smallest match in regexp

 



g00dlife
New User

Mar 11, 2003, 6:32 PM

Post #1 of 4 (3552 views)
smallest match in regexp

I need some help with a simple regexp.

Say I have a string, for example:

$str = "start start start match1 finish finish finish";

I would like to extract the smallest string that is bounded by "start" and "finish", i.e. in this case I would like to extract the string "match1".

My first guess was to use:

$pat = "start (.*?) finish";

However, this captures the string "start start match1".

I know I can use the pattern:

$pat = ".*start (.*?) finish";

and get what I want; however, there's a performance issue with using the greedy matcher .* at the start of the pattern. (This is not important, but I'm actually tackling a problem caused by the Java package Jakarta ORO: big performance issues when combining greedy and non-greedy matches on large strings.)
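To make the difference between the two patterns concrete, here is a minimal sketch that runs both against the example string. The variable names $lazy and $greedy are only illustrative.

```perl
use strict;
use warnings;

my $str = "start start start match1 finish finish finish";

# Lazy quantifier: the match still begins at the first "start",
# so the capture swallows the later "start" words.
my ($lazy) = $str =~ /start (.*?) finish/;

# A leading greedy .* pushes the match as far right as it can,
# so the last "start " that is followed by a " finish" is used.
my ($greedy) = $str =~ /.*start (.*?) finish/;

print "lazy:   '$lazy'\n";
print "greedy: '$greedy'\n";
```

The lazy quantifier only shortens the right-hand end of the match. It does not move the starting point, which is why the first pattern captures too much.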

Is there another single regular expression that will extract "match1" from the above $str when you only know it's bounded by "start" and "finish"?

Any help would be greatly appreciated.


Jasmine
Administrator

Mar 11, 2003, 8:29 PM

Post #2 of 4 (3550 views)
Re: [g00dlife] smallest match in regexp

Yep. Use a negative lookahead assertion.


Code
$str =~ /start (?!start)(.*?) finish/;


For details, please read "Looking Ahead and Looking Behind" in perlretut: http://www.perldoc.com/perl5.8.0/pod/perlretut.html#Looking-ahead-and-looking-behind
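As a quick check of the two assertion types covered in that section, the sketch below runs a negative lookahead and a negative lookbehind against the example string from the first post. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $str = "start start start match1 finish finish finish";

# Negative lookahead (?!start): after "start ", the next word
# must not be another "start".
my ($ahead) = $str =~ /start (?!start)(.*?) finish/;

# Negative lookbehind (?<!start): the space that opens the match
# must not be preceded by "start".
my ($behind) = $str =~ /(?<!start) (.*?) finish/;

print "lookahead:  '$ahead'\n";
print "lookbehind: '$behind'\n";
```

On this string the lookahead form captures "match1". The lookbehind form first matches at the space after "match1" and so captures "finish", which shows that the direction of the assertion matters here.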


g00dlife
New User

Mar 12, 2003, 5:17 PM

Post #3 of 4 (3544 views)
Re: [Jasmine] smallest match in regexp

Thank you Jasmine. That solution does work with my above example... but for my problem I should have been more specific with my question.

The string should have been something like:

$str = "start start no match start start match1 finish finish";

I'm after 'match1' and all I know is that it is bounded by the strings "start " and " finish" (for the above $str example).

The question rephrased is simply: Is it possible to find the last occurrence of "start (.*?) finish" in any given string without using the .* greedy matcher?
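If a single regular expression is not a hard requirement, one possible sketch avoids regex backtracking entirely by using index and rindex. The helper name innermost is hypothetical. It assumes that the wanted text sits between the first " finish" and the nearest "start " before it, which holds for both example strings in this thread but may not hold for every input.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the first $close
# and the nearest $open before it, or nothing if either is missing.
sub innermost {
    my ($text, $open, $close) = @_;

    my $end = index($text, $close);
    # Covers both "no closing delimiter" (-1) and "no room for an opener".
    return if $end < length $open;

    # Search backwards, starting where an opener could still fit.
    my $begin = rindex($text, $open, $end - length $open);
    return if $begin < 0;

    $begin += length $open;
    return substr($text, $begin, $end - $begin);
}

my $str = "start start no match start start match1 finish finish";
print innermost($str, "start ", " finish"), "\n";
```

Both calls are plain substring searches, so the cost grows linearly with the length of the string.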

Your help is very much appreciated.


jryan
User

Mar 12, 2003, 11:05 PM

Post #4 of 4 (3541 views)
Re: [g00dlife] smallest match in regexp

I'll one-up you... Here's a solution that works without the use of the inaccurate .*? token.


Code
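my $string = "start start no match start start match1 finish finish";

my @matches = $string =~
/
start[ ]              # opening delimiter, including its trailing space
(                     # capture the text between the delimiters
  (?: s(?!tart)       # an s that does not begin another start
    | f(?!inish)      # an f that does not begin finish
    | [^sf]           # any other character
  )*                  # repeated, so the text may contain s and f
)
[ ]finish             # closing delimiter, including its leading space
/x;

print join("\n", @matches), "\n";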


Please refer to the link that Jasmine gave for more details.
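The same per-character idea can be written more compactly by putting one negative lookahead in front of every character. This is only a sketch: the helper name between_delims is hypothetical, and the pattern assumes the delimiters are exactly "start " and " finish" as in the examples in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the last "start "
# that has no further "start " before the next " finish".
sub between_delims {
    my ($text) = @_;
    # (?:(?!start ).) consumes one character at a time, but only
    # where a new "start " does not begin at that position.
    my ($inner) = $text =~ /start ((?:(?!start ).)*?) finish/s;
    return $inner;
}

for my $s (
    "start start start match1 finish finish finish",
    "start start no match start start match1 finish finish",
    "start first start some fast fish finish",
) {
    my $got = between_delims($s);
    print defined $got ? "'$got'\n" : "no match\n";
}
```

The third string is there to check that text containing the letters s and f is still captured whole.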

 
 

