Multiline matching problem

 



paxxus
New User

Oct 28, 2012, 5:58 AM

Post #1 of 6 (25660 views)

Hello, I'm progressively matching through a string containing many lines, and I want to skip whitespace-only lines and line comments. Because the match runs across lines, the whitespace to skip includes \n. The following Perl program shows how I achieved this:


Code
$s = <<END; 

// Hello

// World
xxx
END

$s =~ m,\G((//.*)?\s*)*,mgc;

print ">", $s, "<\n";
print ">", substr( $s, pos( $s ) ), "<\n";


Here $s is my string and, as expected, the position is at the 'xxx' after the regexp has eaten through the empty lines and comments. The output of the program is:


Code
> 
// Hello

// World
xxx
<
>xxx
<


If however I change the regexp to:


Code
$s =~ m,\G(\s*|//.*)*,mgc;


Then only the first empty line is eaten and the position is now at the first line-comment, which is not what I want. My problem is that I can't explain why the second version of the regexp doesn't work for me (as you might have guessed this was actually my first failed attempt).

Can anyone help me understand this?


(This post was edited by paxxus on Oct 28, 2012, 6:01 AM)


rovf
Veteran

Jan 22, 2013, 1:14 AM

Post #2 of 6 (16636 views)
Re: [paxxus] Multiline matching problem

In your case, shouldn't the dot (.) also match a \n? If so, you would also need the 's' modifier.


paxxus
New User

Feb 1, 2013, 12:02 PM

Post #3 of 6 (16546 views)
Re: [rovf] Multiline matching problem

Hi rovf, I found out that my misunderstanding was about how | is interpreted. It doesn't look for the longest alternative but simply selects the first branch that matches, and \s* can match the empty string.

This was what caused it to stop after the first line.
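
To illustrate the point, here is a minimal sketch that runs both orderings against the string from the first post. The helper name rest_after and the \s+ variant are not from the thread; they are just one way to show the effect, on the assumption that requiring at least one whitespace character is acceptable.

```perl
use strict;
use warnings;

my $s = "\n// Hello\n\n// World\nxxx\n";

# Hypothetical helper: run one \G-anchored match from offset 0 and
# return whatever the pattern left unconsumed.
sub rest_after {
    my ($re) = @_;
    pos($s) = 0;
    $s =~ /$re/gc;
    return substr( $s, pos($s) );
}

# At "// Hello" the first branch \s* succeeds with an empty match, so
# //.* is never tried, and the zero-length iteration ends the * loop.
my $stuck = rest_after( qr{\G(\s*|//.*)*} );

# \s+ has to consume at least one character, so it fails at "//" and
# the second branch gets its turn.
my $fixed = rest_after( qr{\G(?:\s+|//.*)*} );

print "stuck: >$stuck<\n";
print "fixed: >$fixed<\n";
```

With \s+ in place of \s*, the order of the two branches no longer matters, because neither branch can match the empty string.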

Maybe it would work with the s modifier too - I only recently (re)learned this switch and I will keep it in mind next time I do multi-line matching.
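
A quick way to check the s modifier idea is to run the working regexp with and without it. This is only a sketch on the sample string from the first post; the variable names are made up for the comparison.

```perl
use strict;
use warnings;

my $s = "\n// Hello\n\n// World\nxxx\n";

# Without /s, "." stops at a newline, so each //.* ends with its line.
pos($s) = 0;
$s =~ m{\G((//.*)?\s*)*}gc;
my $rest_plain = substr( $s, pos($s) );

# With /s, "." also matches "\n", so the first //.* runs to the end
# of the string and takes the 'xxx' line with it.
pos($s) = 0;
$s =~ m{\G((//.*)?\s*)*}gcs;
my $rest_dotall = substr( $s, pos($s) );

print "without /s: >$rest_plain<\n";
print "with /s:    >$rest_dotall<\n";
```

On this input the s modifier makes the comment branch swallow everything after the first //, so it does not look like a fit for skipping line comments.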

Thanks.

/p


Kenosis
User

Feb 24, 2013, 2:22 PM

Post #4 of 6 (14819 views)
Re: [paxxus] Multiline matching problem

...I want to skip lines containing white-space or line-comments...

If I understand you correctly, have you considered splitting?


Code
use strict; 
use warnings;

my $s = <<END;

// Hello

// World
xxx

// foo
yyy

// bar
END

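# Split on runs of whitespace or on line comments; what remains are the tokens.
for ( split m!\s+|//.+!, $s ) {
    # Skip the empty fields left between adjacent separators
    # (length, not truth, so that a token of "0" is still printed).
    print $_, "\n" if length $_;
}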


Output:


Quote
xxx
yyy



paxxus
New User

Feb 26, 2013, 4:53 PM

Post #5 of 6 (14355 views)
Re: [Kenosis] Multiline matching problem

I'm parsing through a file using progressive matching, so the split would not work well for me in that situation.
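
For readers unfamiliar with the term, a progressive-matching loop might look roughly like the sketch below. The token rule (\w+), the @tokens array, and the sample text are assumptions for illustration only; the real grammar is not shown in this thread.

```perl
use strict;
use warnings;

my $text = "\n// Hello\n\n// World\nxxx\n\n// foo\nyyy\n";
my @tokens;

pos($text) = 0;
while (1) {
    # Skip any run of whitespace and // line comments. The pattern may
    # match nothing at all, which simply leaves pos where it was.
    $text =~ m{\G(?:\s+|//.*)*}gc;
    last if pos($text) >= length $text;

    # Hypothetical token rule: a run of word characters.
    if ( $text =~ m{\G(\w+)}gc ) {
        push @tokens, $1;
    }
    else {
        die "Unexpected input at offset " . pos($text) . "\n";
    }
}

print join( ",", @tokens ), "\n";
```

Each match picks up at pos() where the previous one stopped, and the c flag keeps pos() intact when a match fails, which is what lets several patterns take turns on the same string.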


(This post was edited by paxxus on Feb 26, 2013, 4:57 PM)


BillKSmith
Veteran

Feb 27, 2013, 10:41 AM

Post #6 of 6 (14186 views)
Re: [paxxus] Multiline matching problem

None of the regexes posted so far matches exactly the way you expect. The following program is a very good test. Each element of the output array is the exact contents of one match. As you can see, its regex comes very close. (It fails to 'eat' the last newline before the xxx.)

I recommend that you use this method to validate any regex before putting it into production.


Code
use strict; 
use warnings;
use Data::Dumper qw(Dumper);
my $s =
"\n"
."// Hello\n"
." \n"
."// World\n"
."xxx\n"
;
print Dumper [$s =~ m{^((?://.*|\s+)?)$}mgc];

print pos($s), "\n";



OUTPUT:

Code
$VAR1 = [ 
'',
'// Hello',
' ',
'// World'
];
20

Good Luck,
Bill

 
 

