Regex, Extract between tags, multiple times

 



Lectrician
New User

Dec 27, 2015, 6:21 AM

Post #1 of 5 (1669 views)
Regex, Extract between tags, multiple times

Hi.

I use the code below to extract the text between the tags in a string:

$string = "<result>101</result>";

$string =~ s~<result\>(.*?)\</result\>~~isg;
$found = $1;

$found in this case would be "101".

I now have a string that contains multiples of the same tag, so:
$string = "<result>101</result><result>202</result><result>303</result>";

Using the same regex, I will only ever retrieve the last entry. I want all 3.

I tried putting them in array, so:

@all = $string =~ s~<result\>(.*?)\</result\>~~isg;

$all[0] gives the last found value ($1).
$all[1] gives the number of occurrences.

How can I find the three values? I would like 101, 202 and 303 to each be an element in an array, or to be joined in a string such as $string = "101|202|303";

Regex tends to drive me mad, lol.

Thanks.
Merry Christmas.


Zhris
Enthusiast

Dec 27, 2015, 8:01 AM

Post #2 of 5 (1664 views)
Re: [Lectrician] Regex, Extract between tags, multiple times

Hi,

Is there any reason you are performing a substitution? A substitution returns the number of substitutions made, or false if none. A match in list context returns the sub-expressions that matched, which is more likely what you need.


Code
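# Original: s///g returns the number of substitutions, not the captures
# @all = $string =~ s~<result\>(.*?)\</result\>~~isg;
# Same pattern as a match: in list context m//g returns every capture
# @all = $string =~ m~<result\>(.*?)\</result\>~isg;
# Tidied: the backslashes before > and < are unnecessary
my @all = $string =~ m~<result>(.*?)</result>~isg;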
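
To make the difference concrete, here is a minimal sketch using the sample string from the first post. The variable names $joined, $copy and $count are only illustrative and are not from the original code. It collects the captures with a match in list context, joins them with "|", and then shows what the substitution returns instead.

```perl
use strict;
use warnings;

my $string = "<result>101</result><result>202</result><result>303</result>";

# m//g in list context returns every capture from every match.
my @all = $string =~ m~<result>(.*?)</result>~isg;

# Join the captures if a single delimited string is wanted.
my $joined = join '|', @all;

print "$_\n" for @all;    # 101, 202, 303 on separate lines
print "$joined\n";        # 101|202|303

# For comparison: s///g returns only the number of substitutions,
# and it removes the matched text from the string it is applied to.
my $copy  = $string;
my $count = ($copy =~ s~<result>(.*?)</result>~~isg);
print "$count\n";         # 3
```

The substitution is run on a copy here so that $string keeps its contents for the match.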


Merry Xmas to you too.

Chris


Lectrician
New User

Dec 27, 2015, 12:24 PM

Post #3 of 5 (1654 views)
Re: [Zhris] Regex, Extract between tags, multiple times

Ah.

So simple, missing the obvious.

No, I have confused myself over s and m. I have had the code running fine for years using s (when it should have been m).

I ended up today splitting the string by new line into an array, then running the regex. Darned fool! lol.

Thanks.


BillKSmith
Veteran

Dec 27, 2015, 12:31 PM

Post #4 of 5 (1654 views)
Re: [Lectrician] Regex, Extract between tags, multiple times

Chris's solution will certainly work for your test case. Beware that mark-up text tends to become more complicated. (e.g. The solution will fail if the value contains a newline.) Such special cases may be rare and relatively easy to fix after they are found, but you will be finding (and fixing) them forever.

The time you spend searching for a CPAN module to do the parsing will not be wasted. Even if you decide to go with a DIY method, your research will probably have alerted you to special cases.
Good Luck,
Bill


Zhris
Enthusiast

Dec 27, 2015, 2:35 PM

Post #5 of 5 (1647 views)
Re: [BillKSmith] Regex, Extract between tags, multiple times

You are absolutely right, and I second your statement. Just noting that the s modifier is in place, so . will also match newlines.
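
A small sketch of that point, assuming a made-up value that spans two lines (the test string and the array names @with_s and @without_s are only for illustration): with the s modifier the value is captured, and without it the same pattern finds nothing because . stops at the newline.

```perl
use strict;
use warnings;

# Hypothetical input: the value between the tags contains a newline.
my $string = "<result>line one\nline two</result>";

# With /s, "." also matches "\n", so the whole value is captured.
my @with_s = $string =~ m~<result>(.*?)</result>~isg;

# Without /s, "." cannot cross the newline, so there is no match.
my @without_s = $string =~ m~<result>(.*?)</result>~ig;

print scalar(@with_s), "\n";       # 1
print scalar(@without_s), "\n";    # 0
```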

Chris

 
 

