CGI/Perl Guide | Learning Center | Forums | Advertise | Login
Site Search: in

  Main Index MAIN
INDEX
Search Posts SEARCH
POSTS
Who's Online WHO'S
ONLINE
Log in LOG
IN

Home: Perl Programming Help: Regular Expressions:
Match largest or smallest?

 



TheBlackNoodle
New User

Jun 26, 2008, 2:33 PM

Post #1 of 9 (4092 views)
Match largest or smallest? Can't Post

Hi,

So I've organized a regex that uses alternation so that the longest match will come out first. That makes sense.


Code
regex: [aeiou]+


Say I have the word "conscious" and I want to match the "scious" portion. When matched with the regex, it returns "iou", which is correct. However, I would also like to get the smallest possible match, which is just "i". The problem is that I can't change the original regex, but I can append to it.

Is there a good way to go about finding both the largest and smallest match?


meloyelo
User

Jun 26, 2008, 2:44 PM

Post #2 of 9 (4091 views)
Re: [TheBlackNoodle] Match largest or smallest? [In reply to] Can't Post

The smallest match in this case is just one letter, so you can use: m/[aeiou]/

This will find the first vowel in the string.


(This post was edited by meloyelo on Jun 26, 2008, 2:47 PM)


TheBlackNoodle
New User

Jun 26, 2008, 3:58 PM

Post #3 of 9 (4087 views)
Re: [meloyelo] Match largest or smallest? [In reply to] Can't Post

Yeah, I can do that, but I need this to work in any case, for any regular expression, without knowing the contents of the regular expression. (It's hard to explain, but this is necessary because I'm basically feeding regular expressions into an engine.)

I think I figured something out, but it seems really inefficient. I'm appending a negating lookbehind, which includes the last match string, onto the match expression. That's $matchStr .= "(?<!$lastMatch)". I do this until no more matches are possible.

... like I said, though, that's really inefficient. If there's a better way to do it, let me know.


KevinR
Veteran


Jun 26, 2008, 6:45 PM

Post #4 of 9 (4083 views)
Re: [TheBlackNoodle] Match largest or smallest? [In reply to] Can't Post


In Reply To
Hi,

So I've organized a regex that uses alternation so that the longest match will come out first. That makes sense.


Code
regex: [aeiou]+


Say I have the word "conscious" and I want to match the "scious" portion. When matched with the regex, it returns "iou", which is correct. However, I would also like to get the smallest possible match, which is just "i". The problem is that I can't change the original regex, but I can append to it.

Is there a good way to go about finding both the largest and smallest match?


What you have is not alternation, it is a character class, they are two different things. The smallest possible match is the first "o" in the string, not the first "i". The "+" quantifier means one or more so the first "o" matches that since it is a quantity of one. What I think you are loonking for is 2 or more: {2,} but if that is the case the smallest match is "io".


Code
$foo = 'conscious'; 
$foo =~ /([aeiou]{2,}?)/;
print $1;


prints 'io' because the ? makes the match true for the shortest match (non-greedy matching). If you want to skip single vowels (ie: 'con') you have to change your regexp. Maybe this will work:


Code
$foo = 'conscious'; 
$foo =~ /([aeiou])[aeiou]+/;
print $1;

-------------------------------------------------


(This post was edited by KevinR on Jun 26, 2008, 6:48 PM)


meloyelo
User

Jun 27, 2008, 8:47 AM

Post #5 of 9 (4067 views)
Re: [TheBlackNoodle] Match largest or smallest? [In reply to] Can't Post


In Reply To
Yeah, I can do that, but I need this to work in any case, for any regular expression, without knowing the contents of the regular expression. (It's hard to explain, but this is necessary because I'm basically feeding regular expressions into an engine.)

I think I figured something out, but it seems really inefficient. I'm appending a negating lookbehind, which includes the last match string, onto the match expression. That's $matchStr .= "(?<!$lastMatch)". I do this until no more matches are possible.

... like I said, though, that's really inefficient. If there's a better way to do it, let me know.

Maybe you should explain more about what you're doing. I'm sure we can figure out an efficient way to do what you are trying to do.

For starters -- you're passing a regular expression into a subroutine. Can you show us the code for it?


TheBlackNoodle
New User

Jun 27, 2008, 1:51 PM

Post #6 of 9 (4060 views)
Re: [meloyelo] Match largest or smallest? [In reply to] Can't Post

Yeah, sorry about the terminology mix-up, heh. I had been focusing on alternation earlier, but then I realized I'd need to account for character classes as well. Also, I think my example may have been somewhat confusing. I'm going to post the function I've written, although be warned: it's not actually written in Perl. I'm working with Perl-style regular expressions in the Boost library for C++.



Code
// block is the input string 
// matchStr is the variable expression
// position and length are modified for other uses
string GetSmallestMatch(string block, string matchStr, int& position, int& length)
{
using namespace boost;
match_results<string::const_iterator> match;
string result;
if(block.size() == 1)
{
return string();
}

string lastMatch = block;
int size = lastMatch.size();
matchStr += "(?<!" + lastMatch + ")";

while(regex_search(lastMatch, match, regex(matchStr)))
{
lastMatch = match[0];
if(lastMatch.size() < size)
{
size = lastMatch.size();
result = lastMatch;
position = match.position();
length = match.length();
}
matchStr += "(?<!" + lastMatch + ")";
}

return result;
}


Now to try to get a clearer example...

The program has a string "cious". It also has some regular expression that will attempt to match that string; the program can tell what the regular expression is, but it's not sophisticated enough to understand it. I'll call this regular expression R.

As before, let's say R = [aeiou]+. In this example, when R matches "scious", it will match on "iou". That's correct. However, I would also like to be able to have the smallest possible match. As far as I'm concerned here, since [aeiou]+ could match "i" or "io" or "iou", the smallest is "i" and the largest is "iou". Note that I know that R WILL match "iou", but I need to make the assumption that the match could end at any point.

I need to make this distinction because R is not necessarily just a character class or something that can be specified with a quantifier. Say R = (?:ie|i|ey). Now if R matches "thief", then "ie" is the largest match and "i" is the smallest match.

Similarly, if R = (?:eu|ew), then the smallest and largest matches are always the same size.

Thanks for the help, guys. I'm not convinced there's any other way to do this, but if there is, that'd be awesome, haha.


(This post was edited by TheBlackNoodle on Jun 27, 2008, 1:55 PM)


KevinR
Veteran


Jun 27, 2008, 3:08 PM

Post #7 of 9 (4054 views)
Re: [TheBlackNoodle] Match largest or smallest? [In reply to] Can't Post

If you can use capturing groups ():

(([aeiou])[aeiou]+)

the first capturing group $1 will be the longest match and the second capturing group $2 will be the shortest match.

$_ = 'cious';
/(([aeiou])[aeiou]+)/;
print "$1\n$2";

other than that I can't think of how to do it with one regexp, but maybe someone else will, regexps are not exactly my strong point.
-------------------------------------------------


meloyelo
User

Jun 28, 2008, 10:16 AM

Post #8 of 9 (4024 views)
Re: [TheBlackNoodle] Match largest or smallest? [In reply to] Can't Post

Will the question mark modifier work for you?


Code
my $str = "scious"; 
my $re = qr/[aeiou]/;

if ($str =~ m/($re+)/) { print "with +: $1\n"; }
if ($str =~ m/($re+?)/) { print "with +?: $1\n"; }



KevinR
Veteran


Jun 28, 2008, 10:21 AM

Post #9 of 9 (4023 views)
Re: [meloyelo] Match largest or smallest? [In reply to] Can't Post

He seems to want to use one regexp, otherwise I would also recommend using two. He also seems to want to only match if there is more than one consecutive vowel, your regexps will match a single vowel.
-------------------------------------------------


(This post was edited by KevinR on Jun 28, 2008, 10:23 AM)

 
 


Search for (options) Powered by Gossamer Forum v.1.2.0

Web Applications & Managed Hosting Powered by Gossamer Threads
Visit our Mailing List Archives