Pattern Matching multiple things

 



wuntvor
Deleted

Aug 15, 2000, 3:47 PM

Post #1 of 2 (528 views)

In cases like:

$_ = "I am a very model of a modern major general";
(/(\w+\s)*/g)
Does $1 match "I", "general", or something else? And is there a way of trapping multiply matched subexpressions through $2, $3, etc.?

For example, with a URL:
m#http://([^/]+)(/([^/]+/)*([^/]*))?#
Suppose I wanted to capture a list of the subdirectories (the third set of parens). Is there a way to do that directly?


japhy
Enthusiast

Aug 16, 2000, 12:58 PM

Post #2 of 2 (528 views)
Re: Pattern Matching multiple things

This should really be in the "Regular Expressions" forum, since your question (and my explanation) are far from beginner level.

When you use a regular expression like:

<BLOCKQUOTE><font size="1" face="Arial,Helvetica,sans serif">code:</font><HR><pre>
"a12b34c56d78" =~ /([a-z](\d+))+/;
</pre><HR></BLOCKQUOTE>

the regular expression engine compiles it into the following internal program (this listing is the kind of output you get with use re 'debug'):

<BLOCKQUOTE><font size="1" face="Arial,Helvetica,sans serif">code:</font><HR><pre>
Compiling REx `([a-z](\d+))+'
size 24 first at 5
1: CURLYX {1,32767}(23)
3: OPEN1(5)
5: ANYOF[a-z](14)
14: OPEN2(16)
16: PLUS(18)
17: DIGIT(0)
18: CLOSE2(20)
20: CLOSE1(22)
22: WHILEM[1/1](0)
23: NOTHING(24)
24: END(0)
</pre><HR></BLOCKQUOTE>

The most important thing is that OPEN1 and OPEN2 mean "set what matches to the corresponding $DIGIT variable". Because these directives sit inside the equivalent of a while loop (the CURLYX ... WHILEM pair), they are overwritten on every pass through the loop, so in the end they hold whatever the last pass matched. In my example, $1 is "d78" and $2 is "78".
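A minimal sketch of that behaviour, using the same string as above plus the sentence from the original question. The variable names are only for illustration:

```perl
use strict;
use warnings;

my $str = "a12b34c56d78";
my ($outer, $inner);

# The quantified group repeats four times; each pass overwrites $1 and $2,
# so only the captures from the final pass survive.
if ($str =~ /([a-z](\d+))+/) {
    ($outer, $inner) = ($1, $2);
}
print "\$1 = $outer, \$2 = $inner\n";   # $1 = d78, $2 = 78

# Same effect with the pattern from the question: the last repetition that
# fits is "major " ("general" has no trailing whitespace, so it is not captured).
my $sentence = "I am a very model of a modern major general";
my ($last_word) = $sentence =~ /(\w+\s)*/;
print "last capture: '$last_word'\n";   # last capture: 'major '
```

So for the question's first pattern, $1 ends up as neither "I" nor "general".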

If you want to get all values of $1 and $2, you use m//g in list context:

<BLOCKQUOTE><font size="1" face="Arial,Helvetica,sans serif">code:</font><HR><pre>
@pairs = "a12b34c56d78" =~ /([a-z](\d+))/g;
# @pairs is ("a12", 12, "b34", 34, ... )
</pre><HR></BLOCKQUOTE>

Notice how the regex doesn't have the ending + on it.
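Here is a rough sketch of the same idea in both contexts, assuming you want each word of the sentence from the original question and each letter/number pair from my example string. The names @words and @pairs are just placeholders:

```perl
use strict;
use warnings;

my $sentence = "I am a very model of a modern major general";

# List context with /g: one element per match of the single capture group.
my @words = $sentence =~ /(\w+)/g;

# Scalar context with /g: step through the matches one at a time,
# reading $1 and $2 after each successful match.
my $str = "a12b34c56d78";
my @pairs;
while ($str =~ /([a-z](\d+))/g) {
    push @pairs, [$1, $2];
}

print scalar(@words), " words, last is $words[-1]\n";
print join(", ", map { "$_->[0] => $_->[1]" } @pairs), "\n";
```

The while form is handy when you want to keep each $1 together with its $2 instead of getting one flat list back.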

An alternative is to wrap the whole repeated part in one outer set of parentheses, so that the complete run is captured:

<BLOCKQUOTE><font size="1" face="Arial,Helvetica,sans serif">code:</font><HR><pre>
$str = "alpha:beta!gamma/delta/epsilon/";
@parts = $str =~ m{([^:]+):([^!]+)!(([^/]+/)+)};
# @parts is ('alpha', 'beta', 'gamma/delta/epsilon/', 'epsilon/')
# can you see why? hint: $4

# or you could do

$str = "alpha:beta!gamma/delta/epsilon/";
@parts = $str =~ m{([^:]+):([^!]+)!((?:[^/]+/)+)};
# @parts is ('alpha', 'beta', 'gamma/delta/epsilon/')
# because (?: ... ) groups without setting a $DIGIT variable
</pre><HR></BLOCKQUOTE>
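For the URL in the original question, one possible approach (a sketch, not the only way) is to capture the whole directory run with a non-capturing repeat and then split it. The URL and the variable names below are made up for illustration, and the pattern is a tightened variant of the one in the question, not a general URL parser:

```perl
use strict;
use warnings;

# Hypothetical URL, chosen only for illustration.
my $url = "http://www.example.com/docs/perl/regex/index.html";

# $2 captures the entire run of "dir/" pieces; the inner (?: ... ) only groups,
# so it does not clobber a capture variable on each repetition.
my ($host, $dir_part, $file) =
    $url =~ m{^http://([^/]+)/((?:[^/]+/)*)([^/]*)$}
    or die "URL did not match\n";

# split drops the trailing empty field left by the final slash.
my @dirs = split m{/}, $dir_part;

print "host: $host\n";   # host: www.example.com
print "dirs: @dirs\n";   # dirs: docs perl regex
print "file: $file\n";   # file: index.html
```

A single match cannot hand back every repetition of a quantified group, which is why the list is built in a second step here.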

To read: perlre

------------------
Jeff "japhy" Pinyan -- accomplished author, consultant, hacker, and teacher


 
 

