matching conundrum

 



pebreo
Novice

May 25, 2005, 5:33 PM

Post #1 of 3 (4041 views)
matching conundrum

Hello all,

It seems that the seemingly simple problems are the ones that get you. I'm still a newbie so this is probably peanuts for the gurus here.

Background:
First, I am using ActivePerl 5.8 on a WinXP machine. I have been coding in Perl for about a week now so I'm still very green. I am trying to extract a pattern from a long string.

I have a long string made up of repeated patterns such as:
junk [foo] fred 6,000 [/bar] junk [foo] wilma betty [/bar]

When the characters between [foo] and [/bar] are homogeneous, I have no problem extracting the pattern. But when they include any numbers, the regex doesn't put just that block in the $1 capture; it gives me more than I want. Here's an example:

SCENARIO 1 CODE:

Code
###Version 1 - homogeneous characters between [foo][/bar]
# this string has homogeneous characters in between [foo][/bar]
$string = "xxx[foo]aaaaaaaa[/bar]xxx[foo]bbbbbbb[/bar]xxx";

# match the string that has alpha characters
while ($string =~ m/(\[foo]\w+\[\/bar])/sig)
{
    print $1, "\n";
}


SCENARIO 1 RESULTS:

Code
# everything prints as expected
# perl extracted my match pattern
[foo]aaaaaaaa[/bar]
[foo]bbbbbbb[/bar]


The Problem:
But now here's my conundrum. I want my pattern to match only when the meat (the characters) between the [foo] and [/bar] delimiters is a mixture of letters and numbers with commas, NOT just letters. I want to recognize and accept only things like:


Code
[foo] - fred 6,000 blah barney  69  >= [/bar]

But NOT:

Code
  
[foo] blah betty blah wilma [/bar]


My problem is that whenever the characters between [foo] and [/bar] aren't of one homogeneous type, such as strictly alphabetical (abc) or strictly numeric (123), Perl extracts more than it should.

Let me show you what I mean.
SCENARIO 2 CODE:

Code
###Version 2 - extracting a comma'ed number surrounded by weird characters
# this string has a comma'ed number surrounded by all sorts of crazy characters
$string = "xxx[foo]a>a1,300a=}[/bar]xxx[foo]bbbbbbb[/bar]xxx";

# trying to match a string that has a comma and anything around it, as long as it's within the [foo][/bar] delimiters
# here I use the . pattern character because we have crazy characters surrounding the ,
while ($string =~ m/(\[foo].*\,+.*\[\/bar])/sig)
{
    print $1, "\n";
}


SCENARIO 2 RESULTS:

Code
# perl seems to have extracted more than my match pattern
# Why does it extract more than it should?! This appended string doesn't even have a , in it!
[foo]a>a1,300a=}[/bar]xxx[foo]bbbbbbb[/bar]


And then when I try to extract based on a number character it gives me the same results.

SCENARIO 3 CODE:

Code
$string = "xxx[foo]aa1,300aaa[/bar]xxx[foo]bbbbbbb[/bar]xxx";

# same idea, but keyed on a digit instead of a comma
while ($string =~ m/(\[foo].*\d+.*\[\/bar])/sig)
{
    print $1, "\n";
}


SCENARIO 3 RESULTS:


Code
# the same as Scenario 2 results 
[foo]aa1,300aaa[/bar]xxx[foo]bbbbbbb[/bar]


Summary
So basically, whenever I try to match a pattern within the [foo][/bar] string, I get more than I want. What complicates matters is that the meat between the [foo] and [/bar] wrappers can be any character, but I only want to extract the blocks that contain numbers with commas and some words, NOT the ones with words only.

I know this is a very long post but this problem has overwhelmed my psyche. Ahhh! I think writing it out helped me really understand the problem. I'd appreciate any suggestions.

Thanks for reading.

Best,
Paul


davorg
Thaumaturge / Moderator

May 26, 2005, 7:02 AM

Post #2 of 3 (4031 views)
Re: [pebreo] matching conundrum

It's because quantifiers like * and + are, by default, greedy. That is, they eat up as many characters as they can - including the next [/bar] marker.

To make them non-greedy, add a ? after the quantifier.

See "perldoc perlre" for more details.
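
As a minimal sketch of the difference, the snippet below runs the Scenario 2 string through the original greedy pattern and through the same pattern with a ? added after each *. The variable names @greedy and @lazy are illustrative and do not come from the original posts.

```perl
use strict;
use warnings;

# Same test string as Scenario 2 in the original post.
my $string = "xxx[foo]a>a1,300a=}[/bar]xxx[foo]bbbbbbb[/bar]xxx";

# Greedy: each .* grabs as much as it can, so the match only ends
# at the LAST [/bar] in the string.
my @greedy = $string =~ m/(\[foo\].*,+.*\[\/bar\])/sig;

# Non-greedy: each .*? grabs as little as it can, so the match ends
# at the first [/bar] that follows the comma.
my @lazy = $string =~ m/(\[foo\].*?,+.*?\[\/bar\])/sig;

print "greedy: $_\n" for @greedy;   # greedy: [foo]a>a1,300a=}[/bar]xxx[foo]bbbbbbb[/bar]
print "lazy:   $_\n" for @lazy;     # lazy:   [foo]a>a1,300a=}[/bar]
```

One caveat, offered as an observation and not as part of the answer above: a non-greedy quantifier only shortens the match from the right. If a words-only [foo]...[/bar] block came before the one with the comma, .*? could still run across the first [/bar] to reach that comma.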

--
Dave Cross, Perl Hacker, Trainer and Writer
http://www.dave.org.uk/
Get more help at Perl Monks


pebreo
Novice

May 26, 2005, 8:36 PM

Post #3 of 3 (4017 views)
Re: [davorg] matching conundrum

 
Thanks for the reply. That did the trick! I also used quantifiers to limit the number of characters around those brackets.
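
The exact pattern I ended up with isn't shown here, so the following is only a sketch of what limiting the character count could look like. The bound of 10 characters ($max) and the reordered test string are assumptions made for illustration. The last statement shows a different, two-step alternative (grab every block non-greedily, then keep the ones containing a comma'ed number), which is also an assumption and not something taken from this thread.

```perl
use strict;
use warnings;

# Hypothetical input: the words-only block comes FIRST this time.
my $string = "xxx[foo]bbbbbbb[/bar]xxx[foo]a>a1,300a=}[/bar]xxx";

# Plain non-greedy: starts at the first [foo] and runs across the
# first [/bar] to reach the comma, so it still grabs too much.
my @lazy = $string =~ m/(\[foo\].*?,.*?\[\/bar\])/sig;

# Bounded: at most $max characters on either side of the comma
# (an assumed limit), so the first [foo] is too far from the comma.
my $max = 10;
my @bounded = $string =~ m/(\[foo\].{0,$max}?,.{0,$max}?\[\/bar\])/sig;

# Alternative: capture every block, then filter on digit-comma-digit.
my @filtered = grep { /\d,\d/ } $string =~ m/(\[foo\].*?\[\/bar\])/sig;

print "lazy:     $_\n" for @lazy;      # lazy:     [foo]bbbbbbb[/bar]xxx[foo]a>a1,300a=}[/bar]
print "bounded:  $_\n" for @bounded;   # bounded:  [foo]a>a1,300a=}[/bar]
print "filtered: $_\n" for @filtered;  # filtered: [foo]a>a1,300a=}[/bar]
```

The bounded version only works while the real content stays shorter than the limit, so the two-step filter is probably the less fragile of the two.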

Another neat trick I learned, too, was to comment out whole blocks using:

if(0) {
commented
}

This helped me a lot with trying out different code and building on existing stuff that worked. I also used the EditPlus text editor, which makes comments easier to read and makes variables stick out more.
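
A small sketch of the if(0) trick in a complete script, with the @log array made up for the example. Note that the disabled block is skipped at run time, but Perl still compiles it, so whatever is inside has to be valid Perl.

```perl
use strict;
use warnings;

my @log;
push @log, "before";

# Disabled block: never executed, but it must still compile.
if (0) {
    push @log, "disabled";
}

push @log, "after";
print join(",", @log), "\n";   # before,after
```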


(This post was edited by pebreo on May 26, 2005, 8:38 PM)

 
 

