Perl regular expressions

 



nmretd
stranger

Oct 23, 2001, 2:57 AM

Post #1 of 9 (650 views)
Perl regular expressions

I am trying to use a regular expression on the following string:

my $string = "abc#ccd#egg#hij#klm#nop#rst#luv#wxyx#";

The string is not a fixed length; it may be longer than this. I want to drop all characters before the 5th hash and end up with:

#nop#rst#luv#wxyx#

Can someone help me, please?



mhx
Enthusiast / Moderator

Oct 23, 2001, 3:01 AM

Post #2 of 9 (649 views)
Re: Perl regular expressions

This should do the job:

Code
$string =~ s/(?:[^#]*#){5}/#/;
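
As a minimal sketch, here is that substitution applied to the sample string from the first post. The variable name $trimmed is only illustrative; the copy is made so the original string stays untouched.

```perl
use strict;
use warnings;

my $string = "abc#ccd#egg#hij#klm#nop#rst#luv#wxyx#";

# Copy the string, then run the substitution on the copy.
(my $trimmed = $string) =~ s/(?:[^#]*#){5}/#/;

print "$trimmed\n";   # prints "#nop#rst#luv#wxyx#"
```

If the string contains fewer than five hashes, the pattern does not match and the string is left unchanged.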

Hope it helps.

-- Marcus


Code
s$$ab21b8d15c3d97bd6317286d$;$"=547269736;split'i',join$,,map{chr(($*+= 
($">>=1)&1?-hex:hex)+0140)}/./g;$"=chr$";s;.;\u$&;for@_[0,2];print"@_,"



nmretd
stranger

Oct 23, 2001, 4:11 AM

Post #3 of 9 (646 views)
Re: Perl regular expressions

Thanks Marcus, that's excellent. But when I change the delimiter in the string from a hash (#) to a pipe (|), it no longer works. Do you know why?

eg:

my $string = "abc|ccd|egg|hij|klm|nop|rst|luv|wxyx|";

$string =~ s/(?:[^|]*|){5}/|/;

print $string;





mhx
Enthusiast / Moderator

Oct 23, 2001, 5:39 AM

Post #4 of 9 (643 views)
Re: Perl regular expressions

Because the pipe is a metacharacter in regexes (it separates alternatives). You have to escape it whenever you use it outside a character class:

Code
$string =~ s/(?:[^|]*\|){5}/|/;
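
To see the difference side by side, here is a small sketch that runs the unescaped and the escaped pattern on the same string. The names $broken, $fixed, $generic and $delim are made up for this example. The last variant assumes the delimiter is held in a variable and uses \Q...\E so that it is escaped automatically.

```perl
use strict;
use warnings;

my $string = "abc|ccd|egg|hij|klm|nop|rst|luv|wxyx|";

# Unescaped: "|" separates two alternatives, "[^|]*" and the empty pattern,
# so the group never has to consume a pipe.
(my $broken = $string) =~ s/(?:[^|]*|){5}/|/;

# Escaped: "\|" is a literal pipe, so each repetition ends at a delimiter.
(my $fixed = $string) =~ s/(?:[^|]*\|){5}/|/;

# Delimiter held in a variable (hypothetical $delim): \Q...\E escapes it.
my $delim = '|';
(my $generic = $string) =~ s/(?:[^\Q$delim\E]*\Q$delim\E){5}/$delim/;

print "broken:  $broken\n";
print "fixed:   $fixed\n";     # fixed:   |nop|rst|luv|wxyx|
print "generic: $generic\n";   # generic: |nop|rst|luv|wxyx|
```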

-- Marcus





nmretd
stranger

Oct 23, 2001, 5:54 AM

Post #5 of 9 (641 views)
Re: Perl regular expressions

Thanks again.

Because I am new to Perl, I don't really understand how you've achieved this little trick. I get the bit about escaping the metacharacters, but I'm not quite sure what's going on inside the regular expression:

s/(?:[^\|]*\|){5}/\|/;

Can you explain in words how this substitution expression works, please? For example, what the question mark (?), the colon (:), etc. are doing.

Many thanks.



mhx
Enthusiast / Moderator

Oct 23, 2001, 6:19 AM

Post #6 of 9 (640 views)
Re: Perl regular expressions

Sure!

Code
s/(?:[^|]*\|){5}/|/;

The (?: ... ) construct groups the contained items without capturing them. Normal parentheses ( ... ) would capture the contents and make them available through the $1 regex variable. We don't need the contents, only the grouping, and (?: ... ) is slightly more efficient in that case. But you could also write

Code
s/([^|]*\|){5}/|/;

without a change in the result. This may be more readable to the beginner.
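
A short sketch of the difference, using made-up variable names ($last, $plain, $caught): with capturing parentheses under a quantifier, $1 ends up holding only the last repetition, while the substitution result is the same either way.

```perl
use strict;
use warnings;

my $string = "abc|ccd|egg|hij|klm|nop|rst|luv|wxyx|";

# In list context a match returns its captures; a repeated capturing
# group keeps only what its final repetition matched.
my ($last) = $string =~ /([^|]*\|){5}/;
print "last repetition: $last\n";   # last repetition: klm|

# Both forms of grouping give the same substitution result.
(my $plain  = $string) =~ s/(?:[^|]*\|){5}/|/;
(my $caught = $string) =~ s/([^|]*\|){5}/|/;

print "same result: $plain\n" if $plain eq $caught;
```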
Inside the (?: ... ) is the expression describing what we want to match:

Code
s/(?:[^|]*\|){5}/|/;
     ^^^^^^^

This starts with a negated character class [^ ... ]

Code
s/(?:[^|]*\|){5}/|/;
     ^^^^

which matches any single character that is not a pipe.
The character class is immediately followed by a quantifier:

Code
s/(?:[^|]*\|){5}/|/;
         ^

The * quantifier tells the regex engine to match zero or more of the preceding item. So

Code
s/(?:[^|]*\|){5}/|/;
     ^^^^^

means "match zero or more characters that are not pipes". After having matched this, we want to match a single pipe. Since the pipe is a regex metacharacter that separates alternatives, we have to escape it:

Code
s/(?:[^|]*\|){5}/|/;
          ^^

Now, all of that is grouped together using (?: ... ) and followed by another quantifier:

Code
s/(?:[^|]*\|){5}/|/;
             ^^^

The {n} quantifier tells the regex engine to match the preceding item exactly n times. So the whole regex means:
Match five times the following: "zero or more non-pipe characters followed by a single pipe character".
Since we're doing a search-and-replace operation ( =~ s/search/replace/ ), we can replace what we just matched with a single pipe

Code
s/(?:[^|]*\|){5}/|/;
                 ^

and this way we put back the fifth pipe, which we didn't want to remove. As you can see, the pipe only needs escaping in the pattern itself, outside a character class. Inside a character class it is an ordinary character (character classes have their own metacharacters, such as ^, - and ]), and the replacement part is treated as a string rather than a regex, so no escaping is needed there either.
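
The same walkthrough can be written into the code itself. This is a sketch using the /x modifier, which lets you spread a pattern over several lines and add comments; $result is just an illustrative name.

```perl
use strict;
use warnings;

my $string = "abc|ccd|egg|hij|klm|nop|rst|luv|wxyx|";

(my $result = $string) =~ s/
    (?:          # group without capturing
        [^|]*    # zero or more characters that are not a pipe
        \|       # one literal pipe, escaped because it is a metacharacter
    ){5}         # repeat the group exactly five times
/|/x;            # replace the whole match with a single pipe

print "$result\n";   # prints "|nop|rst|luv|wxyx|"
```

Note that /x only affects the pattern; the replacement part is still taken literally.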

Hope this makes the regex a bit clearer.
For more information, have a look at the manpages perlretut and perlre.

-- Marcus





nmretd
stranger

Oct 23, 2001, 7:01 AM

Post #7 of 9 (639 views)
Re: Perl regular expressions

Thanks, that's well explained. I hope it is OK if I ask you one more question.

I want to take the same string:
my $string = "abc|ccd|egg|hij|klm|nop|rst|luv|wxyx|";

and

keep the first part, ignore everything up until the 4th pipe, so I am left with:

|abc|rst|luv|wxyx|

The |abc| section may contain whitespace!

Can this be done using a regexp, or do I have to use join etc.?

thanks.



mhx
Enthusiast / Moderator

Oct 23, 2001, 8:10 AM

Post #8 of 9 (638 views)
Re: Perl regular expressions

I don't quite see how the transition from

Code
abc|ccd|egg|hij|klm|nop|rst|luv|wxyx|

to

Code
|abc|rst|luv|wxyx|

matches your description. So perhaps you can review this.

However, using split and join may be helpful. If you have, for example

Code
abc|ccd|egg|hij|klm|nop|rst|luv|wxyx

and you want to take out elements 1 to 5 (numbering starts at 0), so you end up with

Code
abc|rst|luv|wxyx

Then you could use:

Code
#!/bin/perl -w
use strict;

my $string = "abc|ccd|egg|hij|klm|nop|rst|luv|wxyx";
# Split on literal pipes, keep element 0 and elements 6..8, then rejoin.
my $newstr = join '|', (split /\|/, $string)[0,6..8];

print "$newstr\n";

or, which might be more readable:

Code
#!/bin/perl -w
use strict;

my $string = "abc|ccd|egg|hij|klm|nop|rst|luv|wxyx";
my @elem = split /\|/, $string;   # nine elements, indices 0..8
splice @elem, 1, 5;               # remove five elements starting at index 1
my $newstr = join '|', @elem;

print "$newstr\n";

This uses the splice function to remove five elements from @elem starting at offset 1.
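
Since the question was whether a regex alone can do this: here is one possible sketch, assuming the goal is the same as in the split examples (keep the first field, drop the next five). It captures the first field together with its pipe and puts it back via $1. The names $by_regex and $by_splice are made up for the comparison.

```perl
use strict;
use warnings;

my $string = "abc|ccd|egg|hij|klm|nop|rst|luv|wxyx";

# Capture the first field with its pipe, then match (and drop) the next five.
(my $by_regex = $string) =~ s/^([^|]*\|)(?:[^|]*\|){5}/$1/;

# Same result via split and splice, for comparison.
my @elem = split /\|/, $string;
splice @elem, 1, 5;
my $by_splice = join '|', @elem;

print "$by_regex\n";    # prints "abc|rst|luv|wxyx"
print "$by_splice\n";   # prints "abc|rst|luv|wxyx"
```

Because [^|]* matches anything except a pipe, whitespace inside the first field is not a problem. One thing to watch with the split version: if the string ends in a pipe, split drops the trailing empty field by default, so pass -1 as the third argument (split /\|/, $string, -1) if the trailing pipe has to survive the join.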

Hope this helps.

-- Marcus





nmretd
stranger

Oct 23, 2001, 8:22 AM

Post #9 of 9 (637 views)
Re: Perl regular expressions

that's excellent, thanks for all your help.


