XML RegEx Problem

 



hurz
Novice

Jun 18, 2014, 11:51 AM

Post #1 of 9 (29441 views)
XML RegEx Problem

Hi,

I want to use a regex to find all Names that have a Token in the following XML structure.

Result should be:
Name1, Token1, Token2
Name3, Token3

<Test>Name1</Test>
<a>
<b>Something {Token1}</b>
</a>
<c>
</c>
<a>
<b>Something {Token2}</b>
</a>
<c>
</c>
<Test>Name2</Test>
<Test>Name3</Test>
<a>
<b>Something {Token3}</b>
</a>
...


BillKSmith
Veteran

Jun 19, 2014, 12:55 PM

Post #2 of 9 (28761 views)
Re: [hurz] XML RegEx Problem

You have not received an answer because we do not understand your question.

From your example I can guess at an interpretation:

  • 'Name' is the text in a <Test>..</Test> field.

  • 'Token' is the text within {..} in a <b>.....</b> field

  • A Token is 'with' a Name if it appears after the Name, but before the next Name.


    In general, it is a bad idea to process XML with a regex. Use a module to parse it first.

    A regex to implement my interpretation would be very complex. I am not even going to try until you justify using the regex approach and verify my interpretation.
    Good Luck,
    Bill


    hurz
    Novice

    Jun 20, 2014, 3:07 AM

    Post #3 of 9 (28643 views)
    Re: [BillKSmith] XML RegEx Problem

    Hello everyone,

    Bill's interpretation is correct. In the meantime I've played a little bit with the "RegEx Coach":

    1. Names can be found with: <Test>(.*?)</Test>
    2. Tokens can be found with: \{(.*?)\}

    Problem: Is it possible to combine both expressions (1 AND 2) so that a Name (e.g. Name1) is found together with its Tokens (e.g. Token1, Token2), then the next Name (e.g. Name3) with its Tokens (e.g. Token3), and so on?

    Thanks in advance.

    hurz

    (This post was edited by hurz on Jun 20, 2014, 4:14 AM)


    Laurent_R
    Veteran / Moderator

    Jun 20, 2014, 4:26 AM

    Post #4 of 9 (28621 views)
    Re: [hurz] XML RegEx Problem

    Sure, you just need to store the name and the tokens in memory and print them out only when you get to the next name and only if some tokens have been found.
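The approach described above could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name `names_with_tokens` is hypothetical, and it assumes the input can be read line by line with each `<Test>` element on its own line, as in the sample.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one "Name, Token, ..." string for every Name
# that is followed by at least one Token before the next Name.
sub names_with_tokens {
    my ($xml) = @_;
    my (@results, $name, @tokens);

    # Emit the remembered Name only if Tokens were collected for it.
    my $flush = sub {
        push @results, join(', ', $name, @tokens) if defined $name && @tokens;
        @tokens = ();
    };

    for my $line (split /\n/, $xml) {
        if (my ($found) = $line =~ m{<Test>(.*?)</Test>}) {
            $flush->();        # a new Name closes the previous group
            $name = $found;
        }
        push @tokens, $line =~ /\{(.*?)\}/g;    # Tokens found on this line
    }
    $flush->();                # close the last group at end of input
    return @results;
}

my $xml = <<'XML';
<Test>Name1</Test>
<a>
<b>Something {Token1}</b>
</a>
<a>
<b>Something {Token2}</b>
</a>
<Test>Name2</Test>
<Test>Name3</Test>
<a>
<b>Something {Token3}</b>
</a>
XML

print "$_\n" for names_with_tokens($xml);
```

On the sample data this prints `Name1, Token1, Token2` and `Name3, Token3`. Name2 is dropped because no Token was stored before the next Name arrived.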


    BillKSmith
    Veteran

    Jun 20, 2014, 5:38 AM

    Post #5 of 9 (28603 views)
    Re: [hurz] XML RegEx Problem

    I cannot think of a way to do the whole job with one regex. (It seems to require a variable length look-ahead assertion which is not available in perl) Another approach is to use a regex to extract each Name and all the text that goes 'with' it. Use a second regex to extract the Token(s) from each block of text.
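A minimal sketch of the two-regex approach described above, assuming the whole document is held in one string. The helper name `extract_groups` and both patterns are illustrative choices, not something given in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: stage 1 isolates each Name together with the text
# that follows it, stage 2 pulls the Tokens out of that text.
sub extract_groups {
    my ($xml) = @_;
    my @groups;

    # Stage 1: a Name, then everything up to the next <Test> or end of input.
    while ($xml =~ m{<Test>(.*?)</Test>(.*?)(?=<Test>|\z)}gs) {
        my ($name, $block) = ($1, $2);

        # Stage 2: every {...} inside this block is a Token for $name.
        my @tokens = $block =~ /\{(.*?)\}/g;
        push @groups, [$name, @tokens] if @tokens;    # skip Names without Tokens
    }
    return @groups;
}

my $xml = <<'XML';
<Test>Name1</Test>
<a>
<b>Something {Token1}</b>
</a>
<a>
<b>Something {Token2}</b>
</a>
<Test>Name2</Test>
<Test>Name3</Test>
<a>
<b>Something {Token3}</b>
</a>
XML

print join(', ', @$_), "\n" for extract_groups($xml);
```

The captures are copied into `$name` and `$block` before the second match runs, because the inner regex would otherwise overwrite `$1` and `$2`.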

    UPDATE: Better Solution

    An even better solution is to use Perl's I/O to break your input into logical blocks, by setting the input record separator $/ to '<Test>':


    Code
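    use strict;
    use warnings;

    # Treat '<Test>' as the record separator so each read returns one Name block.
    local $/ = '<Test>';
    <DATA>;    # discard the leading record, which holds only the first '<Test>'
    while (<DATA>) {
        my ($name) = /(.+?)<\/Test>/;    # the Name is the text up to </Test>
        my @tokens = /\{(.+?)\}/g;       # every {...} in the block is a Token
        print "$name: @tokens\n";
    }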

    __DATA__
    <Test>Name1</Test>
    <a>
    <b>Something {Token1}</b>
    </a>
    <c>
    </c>
    <a>
    <b>Something {Token2}</b>
    </a>
    <c>
    </c>
    <Test>Name2</Test>
    <Test>Name3</Test>
    <a>
    <b>Something {Token3}</b>
    </a>


    Output

    Code
    Name1: Token1 Token2 
    Name2:
    Name3: Token3

    Good Luck,
    Bill

    (This post was edited by BillKSmith on Jun 20, 2014, 7:53 AM)


    hurz
    Novice

    Jun 20, 2014, 8:56 AM

    Post #6 of 9 (28260 views)
    Re: [BillKSmith] XML RegEx Problem

    First of all thank you very much for your support.

    But the tool I am using supports only Perl-style regular expressions, not Perl itself.


    Laurent_R
    Veteran / Moderator

    Jun 20, 2014, 10:01 AM

    Post #7 of 9 (28140 views)
    Re: [hurz] XML RegEx Problem

    I think that a single pure regex is really not the appropriate tool for this job.

    Don't you have a programming language available (even just a scripting or macro language)?

    I'll try later to see if I can come up with a single regex, but that seems quite difficult.
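For what it is worth, one candidate for a single pattern relies on the `\G` anchor, which continues a match where the previous one ended. The sketch below is an assumption, not something proposed in the thread: it yields one Token per match and captures the Name only on the first Token of each group, so some code is still needed to stitch the groups together. It also only helps if the tool supports `\G`, which is not known here. The helper name `collect` is hypothetical.

```perl
use strict;
use warnings;

# Assumed pattern: each match yields one Token in group 2; group 1 holds the
# Name only when the match starts a new group.
my $re = qr/
    (?: <Test>([^<]*)<\/Test>   # either start a group at a Name ...
      | \G(?!\A)                # ... or continue right after the previous match
    )
    (?: (?!<Test>) . )*?        # skip ahead without crossing the next Name
    \{ ([^{}]*) \}              # the Token itself
/xs;

# Hypothetical helper: folds the per-Token matches back into groups.
sub collect {
    my ($xml) = @_;
    my @groups;
    while ($xml =~ /$re/g) {
        if (defined $1) { push @groups, [$1, $2] }       # new Name, first Token
        else            { push @{ $groups[-1] }, $2 }    # further Token, same Name
    }
    return @groups;
}

my $xml = <<'XML';
<Test>Name1</Test>
<a>
<b>Something {Token1}</b>
</a>
<a>
<b>Something {Token2}</b>
</a>
<Test>Name2</Test>
<Test>Name3</Test>
<a>
<b>Something {Token3}</b>
</a>
XML

print join(', ', @$_), "\n" for collect($xml);
```

A Name with no Token before the next `<Test>` never produces a match, so Name2 is skipped in the sample.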


    hurz
    Novice

    Jul 13, 2014, 9:44 AM

    Post #8 of 9 (5279 views)
    Re: [Laurent_R] XML RegEx Problem

    Unfortunately there's no scripting language. But it is possible to analyse the result of one regular expression with a second, sub-regular expression. Is this somehow feasible here? Thanks.


    BillKSmith
    Veteran

    Jul 14, 2014, 7:49 AM

    Post #9 of 9 (4038 views)
    Re: [hurz] XML RegEx Problem

    We have already told you that a perl regular expression is not the right tool for this job. You are in the position of someone asking how to open a can of beans with a screwdriver. It seems possible, but not easy. There may even be a clever method that works most of the time, but sometimes produces a very undesirable result.

    Now you tell us that we can use a "Sub-Regular expression". This term is not defined in the Perl documentation. How can we help without knowing more about your tools?
    Good Luck,
    Bill

     
     

