greedy regex

 



PGScooter
stranger

Mar 23, 2008, 7:18 PM

Post #1 of 5 (3799 views)
greedy regex

Hi, I'm still trying to get a handle on non-greedy regexes:

In reference to this post

http://www.perlguru.com/gforum.cgi?post=30267;sb=post_latest_reply;so=ASC;forum_view=forum_view_collapsed;guest=2821777

At first I wanted to use a non-greedy regex, but I could not get it to work. Here is a copy of the original question:


Quote


Hello,

I need to traverse a root directory and its subdirectories to grab all .pdf files. I am restricted from installing modules on the Unix box, except for the standard modules, so I am using the following line of code to grab the list of files:

@filelist=`find $source_dir -name "*.pdf"`;

#print @filelist;
foreach $files(@filelist)
{
print "\n $files";
}

Now the problem is that I need to strip all the directory and subdirectory info from the filename.



and my attempt using a non-greedy regex:


Code
$file='blah/blah/blah/wantthis.pdf';  
if ($file=~/(\/\S*?pdf)/) {
print $1;
}

I thought that this regex would just get the 'wantthis.pdf' because of the non-greedy use of the *, but it didn't work. Thanks.
The more you teach me, the more I learn. The more I learn, the more I teach.


KevinR
Veteran


Mar 24, 2008, 10:13 AM

Post #2 of 5 (3794 views)
Re: [PGScooter] greedy regex

Let's look at the string:


Code
$file='blah/blah/blah/wantthis.pdf';




and the regexp:


Code
$file=~/(\/\S*?pdf)/


the search pattern is:

/\S*?pdf

That is a forward slash '/' followed by zero or more non-whitespace characters '\S*', matched as few times as possible '?', until the pattern 'pdf' is found. The regexp engine works from left to right, so it finds the first forward slash and then matches as little as possible up to 'pdf'. It will match:

/blah/blah/wantthis.pdf

Since 'pdf' occurs only at the end of the string, it will match everything between the first forward slash and 'pdf'. If you add more to the end of the string, you will see the regexp is matching in a non-greedy fashion:



Code
$file='blah/blah/blah/wantthis.pdf/blah/blah/blah.pdf';



the regexp you posted will still only match:

/blah/blah/wantthis.pdf
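
As a quick check of the explanation above, here is a minimal sketch that runs the posted pattern against both strings and adds a greedy variant for comparison. The variable names are only illustrative and do not come from the thread.

```perl
use strict;
use warnings;

# The pattern from the original post: '/', then as few non-whitespace
# characters as possible, then 'pdf'.
my $lazy_re   = qr/(\/\S*?pdf)/;
# Same pattern without the '?', so \S* is greedy (added for comparison).
my $greedy_re = qr/(\/\S*pdf)/;

my $short = 'blah/blah/blah/wantthis.pdf';
my $long  = 'blah/blah/blah/wantthis.pdf/blah/blah/blah.pdf';

# In list context a match returns its captures, so each variable gets $1.
my ($short_lazy)  = $short =~ $lazy_re;
my ($long_lazy)   = $long  =~ $lazy_re;
my ($long_greedy) = $long  =~ $greedy_re;

# Both non-greedy matches begin at the FIRST slash and stop at the first 'pdf'.
print "lazy,   short string: $short_lazy\n";   # /blah/blah/wantthis.pdf
print "lazy,   long string:  $long_lazy\n";    # /blah/blah/wantthis.pdf
# The greedy match also begins at the first slash but runs to the LAST 'pdf'.
print "greedy, long string:  $long_greedy\n";  # /blah/blah/wantthis.pdf/blah/blah/blah.pdf
```

The '?' only changes where the match ends. Where it begins is decided by the leftmost position at which the pattern can succeed, which is why the leading directories are still captured.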

To me, the real problem with your regexp is using \S, which matches any non-whitespace character, including slashes; that is too inclusive. You should use \w, which matches word characters [a-zA-Z0-9_], so that no slashes are matched. You should also include the dot in the pattern:


Code
$file='blah/blah/blah/wantthis.pdf.pdf.pdf';   
if ($file=~/\/(\w*\.pdf)/) {
print $1;
}


But notice that you no longer have to use '?' in the match if you include the dot in the search pattern: the dot is not part of the \w character class, so \w* cannot run past the first '.pdf', and matching stops once the first occurrence of '\w*\.pdf' is found.
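
The sketch below compares the \w pattern with a negated character class, [^\/], which is an alternative not mentioned in the thread. It also tries a file name containing a hyphen; 'blah/my-report.pdf' is a made-up example, chosen because \w does not match '-'.

```perl
use strict;
use warnings;

my $file = 'blah/blah/blah/wantthis.pdf.pdf.pdf';

# Pattern from the post: word characters only, then a literal '.pdf'.
my ($word_only) = $file =~ /\/(\w*\.pdf)/;

# Assumed alternative: anything except a slash. Here the '?' matters again,
# because [^\/] can match dots and would otherwise run to the last '.pdf'.
my ($no_slash_lazy)   = $file =~ /\/([^\/]*?\.pdf)/;
my ($no_slash_greedy) = $file =~ /\/([^\/]*\.pdf)/;

print "\\w*:      $word_only\n";        # wantthis.pdf
print "[^/]*?:   $no_slash_lazy\n";     # wantthis.pdf
print "[^/]*:    $no_slash_greedy\n";   # wantthis.pdf.pdf.pdf

# Hypothetical file name with a hyphen: \w* stops at '-', so the
# slash-anchored \w pattern finds no match, while [^\/] still does.
my $hyphen = 'blah/my-report.pdf';
my ($hyphen_word)     = $hyphen =~ /\/(\w*\.pdf)/;
my ($hyphen_no_slash) = $hyphen =~ /\/([^\/]*\.pdf)/;

print "\\w* on hyphenated name:   ",
      (defined $hyphen_word ? $hyphen_word : '(no match)'), "\n";
print "[^/]* on hyphenated name: $hyphen_no_slash\n";   # my-report.pdf
```

Whether \w is strict enough or too strict depends on the file names involved. For names made only of letters, digits and underscores, the pattern from the post is sufficient.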
-------------------------------------------------


PGScooter
stranger

Mar 24, 2008, 8:59 PM

Post #3 of 5 (3789 views)
Re: [KevinR] greedy regex

Thanks Kevin,

I guess I thought that non-greedy regexes always found the SHORTEST possible match, but obviously that is not the case: the match still starts at the leftmost position where it can succeed. Now I understand the difference between greedy and non-greedy better; your example helped.

Is it ever good coding to reverse a string, regex it, and reverse it back? Or is that poor perl-etiquette? Is there always a more elegant, straight-up regex?

In this example, it would be easy to reverse the string, and just do a search matching up to the first forward slash, then reverse the match back.


KevinR
Veteran


Mar 24, 2008, 10:47 PM

Post #4 of 5 (3787 views)
Re: [PGScooter] greedy regex

I'm far from a regexp guru, but reversing a string to find a match is rarely necessary, though it is not unheard of. There is no reason to do it in this case.
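
For comparison, here is a rough sketch of the reverse-and-match idea next to two direct alternatives. The end-anchored pattern and the use of File::Basename are suggestions that were not part of the thread; File::Basename ships with Perl, so it should fit the "standard modules only" restriction in the quoted question.

```perl
use strict;
use warnings;
use File::Basename qw(basename);   # core module, no installation needed

my $file = 'blah/blah/blah/wantthis.pdf';

# 1. Reverse, match up to the first slash, reverse the capture back.
my $reversed      = scalar reverse $file;          # 'fdp.sihttnaw/halb/...'
my ($rev_capture) = $reversed =~ /^([^\/]*)/;      # everything before the first '/'
my $via_reverse   = scalar reverse $rev_capture;

# 2. Direct regex: one or more non-slash characters anchored at the end.
my ($via_regex) = $file =~ /([^\/]+)$/;

# 3. basename() returns the last component of the path.
my $via_basename = basename($file);

print "reverse:  $via_reverse\n";    # wantthis.pdf
print "regex:    $via_regex\n";      # wantthis.pdf
print "basename: $via_basename\n";   # wantthis.pdf
```

All three give the same result for this string. Anchoring at the end with '$' does the same job as the reversal, without the two extra steps.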
-------------------------------------------------


PGScooter
stranger

Mar 25, 2008, 9:53 AM

Post #5 of 5 (3784 views)
Re: [KevinR] greedy regex

OK, thanks for the info.

 
 

