Pattern Matching multiple things



Aug 15, 2000, 3:47 PM

Post #1 of 2
Pattern Matching multiple things

In cases like:

$_ = "I am a very model of a modern major general";
Does $1 match "I", "general", or something else? And is there a way of trapping multiply matched subexpressions through $2, $3, etc.?

For example, with HTML:
and I wanted to match the list of subdirectories (the 3rd set of parens). Is there a way to do that directly?


Aug 16, 2000, 12:58 PM

Post #2 of 2
Re: Pattern Matching multiple things [In reply to]

This should really be in the "Regular Expressions" forum, since your question (and my explanation) is far from beginner level.

When you use a regular expression like:

```perl
"a12b34c56d78" =~ /([a-z](\d+))+/;
```

Perl's regex engine compiles it into the following internal program (a debugging dump of the compiled regex):

```
Compiling REx `([a-z](\d+))+'
size 24 first at 5
1: CURLYX {1,32767}(23)
3: OPEN1(5)
5: ANYOF[a-z](14)
14: OPEN2(16)
16: PLUS(18)
17: DIGIT(0)
18: CLOSE2(20)
20: CLOSE1(22)
22: WHILEM[1/1](0)
23: NOTHING(24)
24: END(0)
```
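A listing like this comes from Perl's `re` pragma. The sketch below turns it on for a single match; `$matched` is just an illustrative name. The dump goes to STDERR, and the node names, offsets and layout vary between Perl versions, so the output on a modern perl will not match the listing above byte for byte.

```perl
use strict;
use warnings;

# From here to the end of the file, every regex that gets compiled
# dumps its compiled program (and a trace of each match attempt) to STDERR.
use re 'debug';

my $matched = "a12b34c56d78" =~ /([a-z](\d+))+/;
```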

The most important thing is OPEN1/CLOSE1 and OPEN2/CLOSE2: they mark where capture groups 1 and 2 start and end, and the text between each pair is what ends up in $1 and $2. Since these directives sit inside the CURLYX ... WHILEM loop (the regex equivalent of a while loop), they are overwritten on every pass, so in the end they hold only what the last pass matched. In my example, $1 is "d78" and $2 is "78".
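A minimal sketch that confirms this behaviour; `$outer` and `$inner` are only illustrative names. A match without //g in list context returns its capture groups, which makes it easy to see what survives the loop:

```perl
use strict;
use warnings;

# The trailing + repeats the whole group: the match walks through all
# four letter/number pairs, but every pass overwrites the same two
# capture slots, so only the values from the last pass survive.
my ($outer, $inner) = "a12b34c56d78" =~ /([a-z](\d+))+/;

print "\$1 was '$outer', \$2 was '$inner'\n";   # $1 was 'd78', $2 was '78'
```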

If you want to get all values of $1 and $2, you use m//g in list context:

```perl
@pairs = "a12b34c56d78" =~ /([a-z](\d+))/g;
# @pairs is ("a12", 12, "b34", 34, ... )
```

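Notice that this regex has no trailing +. Rather than one match repeating the group, //g repeats the whole match, and in list context it returns the captures from every pass.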
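Two sketches building on that, with illustrative variable names: rebuilding the pairs from the flat list that list-context m//g returns, and walking the matches one at a time with m//g in scalar context (a standard Perl idiom, not something from the reply above).

```perl
use strict;
use warnings;

my $str = "a12b34c56d78";

# List context: every pass of //g contributes ($1, $2), so the result
# is a flat list alternating whole pair, digits, whole pair, digits, ...
my @pairs = $str =~ /([a-z](\d+))/g;

# Take that list two elements at a time to rebuild the pairs.
my %digits_for;
while (my ($pair, $num) = splice(@pairs, 0, 2)) {
    $digits_for{$pair} = $num;
}

# Scalar context: each evaluation resumes where the previous match
# ended (tracked by pos($str)), so the loop body sees one pair per pass.
my @seen;
while ($str =~ /([a-z](\d+))/g) {
    push @seen, "$1=$2";
}

print join(", ", @seen), "\n";   # a12=12, b34=34, c56=56, d78=78
```

The scalar-context loop is handy when each match needs processing before the next one, since $1 and $2 are fresh on every iteration.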

An alternative solution is:

```perl
$str = "alpha:beta!gamma/delta/epsilon/";
@parts = $str =~ m{([^:]+):([^!]+)!(([^/]+/)+)};
# @parts is ('alpha', 'beta', 'gamma/delta/epsilon/', 'epsilon/')
# can you see why? hint: $4

# or you could do

$str = "alpha:beta!gamma/delta/epsilon/";
@parts = $str =~ m{([^:]+):([^!]+)!((?:[^/]+/)+)};
# @parts is ('alpha', 'beta', 'gamma/delta/epsilon/')
# because (?: ... ) groups without setting a $DIGIT variable
```
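To get the individual subdirectories the original question asked about, one option (a sketch, not part of the reply above) is to capture the whole repeated path as a single group and then break it apart, either with split or with a second m//g match. The variable names are illustrative.

```perl
use strict;
use warnings;

my $str = "alpha:beta!gamma/delta/epsilon/";

# Capture the whole run of "name/" segments as one group...
my ($first, $second, $path) = $str =~ m{([^:]+):([^!]+)!((?:[^/]+/)+)};

# ...then split it; split discards the empty trailing field after the last "/".
my @subdirs = split m{/}, $path;

# Equivalent: list-context m//g returns one capture per segment.
my @also = $path =~ m{([^/]+)/}g;

print "@subdirs\n";   # gamma delta epsilon
```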

Further reading: perlre

Jeff "japhy" Pinyan -- accomplished author, consultant, hacker, and teacher

