


TheBlackNoodle
New User

Jun 27, 2008, 1:51 PM


Views: 4982
Re: [meloyelo] Match largest or smallest?

Yeah, sorry about the terminology mix-up, heh. I had been focusing on alternation earlier, but then I realized I'd need to account for character classes as well. Also, I think my example may have been somewhat confusing. I'm going to post the function I've written, although be warned: it's not actually written in Perl. I'm working with Perl-style regular expressions in the Boost library for C++.



Code
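#include <string>
#include <boost/regex.hpp>

using std::string;

// block is the input string
// matchStr is the variable expression (R)
// position and length are output parameters describing the smallest match;
// position is relative to the string searched in that iteration, not to block
string GetSmallestMatch(string block, string matchStr, int& position, int& length)
{
    using namespace boost;
    match_results<string::const_iterator> match;
    string result;
    if(block.size() == 1)
    {
        // A single character cannot be shortened any further.
        return string();
    }

    string lastMatch = block;
    string::size_type size = lastMatch.size();
    // Forbid any match that ends with the text just tried, forcing a shorter one.
    // Note: lastMatch is inserted unescaped, so regex metacharacters in it are not handled.
    matchStr += "(?<!" + lastMatch + ")";

    while(regex_search(lastMatch, match, regex(matchStr)))
    {
        // Read everything out of match before lastMatch is overwritten,
        // because match holds iterators into lastMatch.
        const string current = match.str();
        const int currentPosition = static_cast<int>(match.position());
        const int currentLength = static_cast<int>(match.length());

        lastMatch = current;
        if(lastMatch.size() < size)
        {
            size = lastMatch.size();
            result = lastMatch;
            position = currentPosition;
            length = currentLength;
        }
        matchStr += "(?<!" + lastMatch + ")";
    }

    return result;
}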


Now to try to get a clearer example...

The program has a string "scious". It also has some regular expression that will attempt to match that string; the program can tell what the regular expression is, but it's not sophisticated enough to understand it. I'll call this regular expression R.

As before, let's say R = [aeiou]+. In this example, when R matches "scious", it will match on "iou". That's correct. However, I would also like to be able to have the smallest possible match. As far as I'm concerned here, since [aeiou]+ could match "i" or "io" or "iou", the smallest is "i" and the largest is "iou". Note that I know that R WILL match "iou", but I need to make the assumption that the match could end at any point.

I need to make this distinction because R is not necessarily just a character class or something that can be specified with a quantifier. Say R = (?:ie|i|ey). Now if R matches "thief", then "ie" is the largest match and "i" is the smallest match.

Similarly, if R = (?:eu|ew), then the smallest and largest matches are always the same size.
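
To make the three examples above concrete, here is a rough sketch of a different, brute-force way to get both the smallest and the largest match. It is not the lookbehind approach from the function above. It uses std::regex rather than Boost so that it stands alone, and the name SmallestAndLargest is made up for illustration. It assumes that both matches start where the first normal match starts, that empty matches are not wanted, and that R does not rely on context outside the candidate text (anchors or lookarounds could behave differently).

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper, not from the original post.
// Returns {smallest, largest} match of r in block, both starting at the
// position where regex_search first finds a match.
// Returns two empty strings if r does not match at all.
std::pair<std::string, std::string> SmallestAndLargest(const std::string& block,
                                                       const std::string& r)
{
    const std::regex re(r);
    std::smatch found;
    if (!std::regex_search(block, found, re))
    {
        return {std::string(), std::string()};
    }

    const std::string::size_type start =
        static_cast<std::string::size_type>(found.position(0));
    std::string smallest;
    std::string largest;

    // Try every candidate length from the start position.
    // regex_match only succeeds if the whole candidate matches r.
    for (std::string::size_type len = 1; start + len <= block.size(); ++len)
    {
        const std::string candidate = block.substr(start, len);
        if (std::regex_match(candidate, re))
        {
            if (smallest.empty())
            {
                smallest = candidate;  // first hit is the shortest
            }
            largest = candidate;       // last hit is the longest
        }
    }
    return {smallest, largest};
}
```

With the examples from this post, SmallestAndLargest("scious", "[aeiou]+") should give "i" and "iou", SmallestAndLargest("thief", "(?:ie|i|ey)") should give "i" and "ie", and SmallestAndLargest("few", "(?:eu|ew)") should give "ew" for both. The cost is one regex_match call per candidate length, so it is only reasonable for short strings.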

Thanks for the help, guys. I'm not convinced there's any other way to do this, but if there is, that'd be awesome, haha.


(This post was edited by TheBlackNoodle on Jun 27, 2008, 1:55 PM)



