
TheBlackNoodle
New User
Jun 27, 2008, 1:51 PM
Post #6 of 9
Re: [meloyelo] Match largest or smallest?
Yeah, sorry about the terminology mix-up, heh. I had been focusing on alternation earlier, but then I realized I'd need to account for character classes as well. Also, I think my example may have been somewhat confusing. I'm going to post the function I've written, although be warned: it's not actually written in Perl. I'm working with Perl-style regular expressions in the Boost library for C++.
    #include <string>
    #include <boost/regex.hpp>

    using std::string;

    // block is the input string
    // matchStr is the variable expression
    // position and length are modified for other uses
    string GetSmallestMatch(string block, string matchStr, int& position, int& length)
    {
        using namespace boost;
        match_results<string::const_iterator> match;
        string result;
        if(block.size() == 1)
        {
            return string();
        }
        string lastMatch = block;
        string::size_type size = lastMatch.size();
        // Forbid any match that ends with the previous match.
        // Note: lastMatch is inserted unescaped, so this assumes it
        // contains no regex metacharacters.
        matchStr += "(?<!" + lastMatch + ")";
        while(regex_search(lastMatch, match, regex(matchStr)))
        {
            // Copy everything out of match before lastMatch is overwritten,
            // because match holds iterators into lastMatch.
            // The position is relative to the string searched in this pass.
            string current = match[0];
            int currentPosition = static_cast<int>(match.position());
            int currentLength = static_cast<int>(match.length());
            lastMatch = current;
            if(lastMatch.size() < size)
            {
                size = lastMatch.size();
                result = lastMatch;
                position = currentPosition;
                length = currentLength;
            }
            matchStr += "(?<!" + lastMatch + ")";
        }
        return result;
    }

Now to try to get a clearer example...

The program has a string "scious". It also has some regular expression that will attempt to match that string; the program can tell what the regular expression is, but it's not sophisticated enough to understand it. I'll call this regular expression R.

As before, let's say R = [aeiou]+. In this example, when R matches "scious", it will match on "iou". That's correct. However, I would also like to be able to get the smallest possible match. As far as I'm concerned here, since [aeiou]+ could match "i" or "io" or "iou", the smallest is "i" and the largest is "iou". Note that I know that R WILL match "iou", but I need to make the assumption that the match could end at any point.

I need to make this distinction because R is not necessarily just a character class or something that can be specified with a quantifier. Say R = (?:ie|i|ey). Now if R matches "thief", then "ie" is the largest match and "i" is the smallest match. Similarly, if R = (?:eu|ew), then the smallest and largest matches are always the same size.

Thanks for the help, guys. I'm not convinced there's any other way to do this, but if there is, that'd be awesome, haha.
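For what it's worth, here is a rough sketch of one other way to get both the smallest and the largest candidate without understanding R at all: find where R first matches, then test every substring that starts there and keep the shortest and longest ones that R matches in full. This is only an illustration under some assumptions. It uses std::regex (ECMAScript syntax) rather than Boost so that it compiles on its own, the names MatchExtent and FindMatchExtents are made up for this post, and because each candidate is tested in isolation it will not behave correctly for patterns that look outside the match (lookbehind, \b, ^ or $).

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct MatchExtent {
    bool found;            // false if R matches nothing in the block
    std::size_t position;  // offset in block where the candidates start
    std::string smallest;  // shortest substring at position that R matches in full
    std::string largest;   // longest substring at position that R matches in full
};

// Hypothetical helper: enumerate every candidate length at the leftmost
// match position and record the shortest and longest full matches.
MatchExtent FindMatchExtents(const std::string& block, const std::string& pattern)
{
    MatchExtent out;
    out.found = false;
    out.position = 0;

    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(block, m, re)) {
        return out;
    }

    const std::size_t start = static_cast<std::size_t>(m.position(0));
    const std::string::const_iterator first =
        block.begin() + static_cast<std::ptrdiff_t>(start);

    // Try lengths 1, 2, ... up to the end of the block (empty matches are skipped).
    for (std::size_t len = 1; start + len <= block.size(); ++len) {
        const std::string::const_iterator last =
            first + static_cast<std::ptrdiff_t>(len);
        if (std::regex_match(first, last, re)) {
            if (!out.found) {
                out.found = true;
                out.position = start;
                out.smallest.assign(first, last);
            }
            out.largest.assign(first, last);
        }
    }
    return out;
}
```

With the examples from this post, FindMatchExtents("scious", "[aeiou]+") should report "i" as the smallest and "iou" as the largest, and FindMatchExtents("thief", "(?:ie|i|ey)") should report "i" and "ie". The cost is one regex_match per candidate length, so it is only reasonable for short blocks.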
(This post was edited by TheBlackNoodle on Jun 27, 2008, 1:55 PM)