smallest match in regexp

New User

Mar 11, 2003, 6:32 PM

smallest match in regexp

I need some help with a simple regexp.

Say I have a string, for example:

$str = "start start start match1 finish finish finish";

I would like to extract the smallest string that is bounded by "start" and "finish", i.e. in this case I would like to extract the string "match1".

My first guess was to use:

$pat = "start (.*?) finish";

However, this captures the string "start start match1".
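
A quick way to see why this happens: the regex engine commits to the leftmost position where a match can begin, and the non-greedy .*? only shortens what follows that position. A minimal sketch using the string above (the variable name $captured is just for illustration):

```perl
use strict;
use warnings;

my $str = "start start start match1 finish finish finish";

# The match begins at the FIRST "start "; laziness only trims the tail end.
my ($captured) = $str =~ /start (.*?) finish/;

print "captured: '$captured'\n";   # captured: 'start start match1'
```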

I know I can use the pattern:

$pat = ".*start (.*?) finish";

and get what I want; however, there's a performance issue with using the greedy matcher .* at the start of the pattern. (This is not important, but I'm actually tackling a problem caused by the Java package Jakarta ORO: big performance issues when combining greedy and non-greedy matches on large strings.)

Is there another single regular expression that will extract "match1" from the above $str when you only know it's bounded by "start" and "finish"?
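
For reference, a small sketch of that greedy-prefix workaround in Perl. The leading .* first runs to the end of the string and then backtracks to the last "start " that can still be followed by " finish", which is where the extra cost on large strings comes from:

```perl
use strict;
use warnings;

my $str = "start start start match1 finish finish finish";

# .* swallows everything, then gives characters back until "start " fits.
my ($captured) = $str =~ /.*start (.*?) finish/;

print "captured: '$captured'\n";   # captured: 'match1'
```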

Any help would be greatly appreciated.


Mar 11, 2003, 8:29 PM

Re: [g00dlife] smallest match in regexp

Yep. Use a negative lookahead assertion.

$str =~ /start (?!start)(.*?) finish/;

For details, please read "Looking Ahead and Looking Behind" in perlretut: http://www.perldoc.com/perl5.8.0/pod/perlretut.html#Looking-ahead-and-looking-behind
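
To make the suggestion concrete, here is a minimal sketch of one way a negative lookahead could be applied to the original example. The exact pattern is an assumption about the intended form: it accepts a "start " only when it is not immediately followed by another "start".

```perl
use strict;
use warnings;

my $str = "start start start match1 finish finish finish";

# (?!start) rejects any "start " that is directly followed by "start".
my ($captured) = $str =~ /start (?!start)(.*?) finish/;

print "captured: '$captured'\n";   # captured: 'match1'
```

Note that this only guards against a "start" directly after the opening delimiter; a "start" further inside the captured text is not excluded.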

New User

Mar 12, 2003, 5:17 PM

Re: [Jasmine] smallest match in regexp

Thank you Jasmine. That solution does work with my above example... but for my problem I should have been more specific with my question.

The string should have been something like:

$str = "start start no match start start match1 finish finish";

I'm after 'match1' and all I know is that it is bounded by the strings "start " and " finish" (for the above $str example).

The question rephrased is simply: Is it possible to find the last occurrence of "start (.*?) finish" in any given string without using the .* greedy matcher?

Your help is very much appreciated.


Mar 12, 2003, 11:05 PM

Re: [g00dlife] smallest match in regexp

I'll one-up you... Here's a solution that works without the use of the inaccurate .*? token.

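my $string = "start start no match start start match1 finish finish";

my @matches = $string =~
    / start \s
      ( (?: s(?!tart)     # an "s" that does not begin "start"
          | f(?!inish)    # an "f" that does not begin "finish"
          | [^sf]+        # any run of other characters
        )* )
      \s finish /gx;

print join("\n", @matches), "\n";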

Please refer to the link that Jasmine gave for more details.
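
If the delimiters are not fixed strings known in advance, the same idea can be sketched with a per-character negative lookahead instead of the hand-unrolled alternatives. This is a different but related technique, not the pattern from the post above, and the helper name innermost_between is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns every span bounded by $open and $close
# whose contents include neither delimiter.
sub innermost_between {
    my ($text, $open, $close) = @_;
    my ($o, $c) = (quotemeta $open, quotemeta $close);
    # Consume one character at a time, but never step onto a delimiter.
    return $text =~ /$o((?:(?!$o|$c).)*)$c/gs;
}

my @found = innermost_between(
    "start start no match start start match1 finish finish",
    "start ", " finish",
);

print join("\n", @found), "\n";   # match1
```

Because the lookahead is checked at every character, this is usually slower than the unrolled version, but it needs no leading .* and works for arbitrary delimiter strings.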