Regex Newbie



Pro_4
User

Jul 6, 2001, 7:51 AM


Regex Newbie

Hey,

I understand the VERY basics of regexes, but I was wondering if you could point out some good tutorials on what certain things do (*, +, ?, etc.).

Anyway, down to my question. I have been trying to make it so that when users type [col=green] text[/col], the text turns that color (note I used col instead of color because I wasn't sure if this forum would translate that). The problem is keeping green there while still changing the [col= ] around it.

Well, thanks for the help :)


P.S. For those of you who don't know HTML that well, color is done like this:

Code
<font color=green> </font>

@letters = ('A'..'Z', 'a'..'z', '1'..'5', '_');
@i = ( '15', '43', '40');
print @letters [ @i, -1, 55];


Pro_4
User

Jul 6, 2001, 8:22 AM


Re: Regex Newbie

Oh, never mind, I figured it out. Whether this is the best way or not, I don't know:

Code
$msg = "[color=green] [/color]"; 
$msg =~ s/\[color=(\S+?)]/<font color=$1>/isg;
print $msg;



mhx
Enthusiast

Jul 6, 2001, 10:51 AM


Re: Regex Newbie

Hi Pro_4!


In Reply To
I understand the VERY basics of regexes, but I was wondering if you could point out some good tutorials on what certain things do (*, +, ?, etc.).

If you really want to learn regexes, I recommend Mastering Regular Expressions by Jeffrey Friedl (O'Reilly). It's sometimes hard to understand, but always fun to read.
If you're searching for an online regex course, try Japhy's book. I've read an excerpt and it seems really good to me.
And as always, there's some good material in the perlre manpage.

-- Marcus



Pro_4
User

Jul 17, 2001, 6:56 PM


Re: Regex Newbie

Hey,

In my database for my forum, replies are separated by +!+ and each field for a reply is separated by |. The thing is, with the forum as it is, if someone puts a ton of +!+ in their post it messes up my forum. So I was wondering how I would go about substituting that with something else, and then turning it back into +!+ when they go to view it, so it appears nothing had been changed in their post.

I tried this:
$msg =~ s/\+\!\+/@/isg;
$msg =~ s/|/0/isg;

But when I translate it back for viewing, it shows up like so...
|h|i| |f|o|r| |s|o|m|e| |r|e|a|s|o|n| |i|t| |p|u|t|s| ||| |e|v|e|r|y|w|h|e|r|e|

but the +!+ gets translated back to normal. Why does it put a | between every character? Maybe the 0 has something to do with it, or my regex is wrong...

¿Help?



mhx
Enthusiast

Jul 18, 2001, 1:20 AM


Re: Regex Newbie

Hi Pro_4,

yes, there's a problem with your regex. The vertical bar | is the regex alternation operator, so your pattern means "nothing OR nothing", and nothing, of course, always matches. The regex engine finds an empty match, advances one character, finds another empty match, and so on, so a 0 is inserted at every position in $msg, and those 0's are of course translated back to vertical bars later. To fix this, escape the vertical bar:

Code
$msg =~ s/\|/0/isg;
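
To see the difference side by side, here is a minimal sketch. The sample string is made up for illustration and is not taken from the forum database:

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only to show the effect.
my $text = "a|b";

# Unescaped: the pattern is "empty OR empty", which matches at every
# position, so a 0 is inserted before each character and at the end.
(my $unescaped = $text) =~ s/|/0/g;

# Escaped: only literal vertical bars are replaced.
(my $escaped = $text) =~ s/\|/0/g;

print "$unescaped\n";   # 0a0|0b0
print "$escaped\n";     # a0b
```

Note that the original | survives in the first result, which is why the bars in the translated-back post appear in addition to the ones that were already there.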

But all in all, you're going to have a problem with your solution: What if someone uses @'s or 0's in their posts? You're going to store them as they are, but translate them back to +!+'s and |'s.
One thing that comes to mind right now (I guess there are plenty of better solutions) is to escape some of the special characters in your database. Test this piece of code:

Code
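$msg = <<'ENDMSG';
Hello Pro_4 | whoever!
Here's my +!+ message +!+ for you!
== \Marcus\ ==
ENDMSG

# Encode: put a backslash in front of every !, = and \ ...
$msg =~ s/([!=\\])/\\$1/g;
# ... then store each | as a bare (unescaped) =
$msg =~ tr/|/=/;
print $msg;

# Decode: an escaped character loses its backslash, a bare = becomes |
$msg =~ s/\\(.)|=/$1||'|'/ge;
print $msg;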

The first part will replace all occurrences of !, = or \ with \!, \= or \\. So all possible +!+ sequences will be converted to +\!+ and you don't have problems with these. It's not so easy with the |'s however. But since we have escaped all ='s, we can safely use = as a replacement for |. This is done with the tr/// line. The text now looks like this:

Code
Hello Pro_4 = whoever\! 
Here's my +\!+ message +\!+ for you\!
\=\= \\Marcus\\ \=\=

The next regex reverses these changes so you get the original message back. It looks for an escaped character OR an equal sign =. If it finds an escaped character, it removes the backslash; if it finds an equal sign, it replaces it with a vertical bar.
I'm more than sure there are better solutions, but this one was the first that came to my mind.
Hope this helps anyway.
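
For use in a script, the two steps could be wrapped up roughly like this. It is only a sketch: the sub names encode_post and decode_post are made up, and the test string is invented. The only things taken from the thread are the separators +!+ and | and the three substitutions themselves.

```perl
use strict;
use warnings;

# Hypothetical helper names; the logic is the escape/unescape pair from above.
sub encode_post {
    my ($msg) = @_;
    $msg =~ s/([!=\\])/\\$1/g;   # escape !, = and \ with a backslash
    $msg =~ tr/|/=/;             # a bare = now always stands for |
    return $msg;
}

sub decode_post {
    my ($msg) = @_;
    # An escaped character loses its backslash; a bare = becomes | again.
    $msg =~ s/\\(.)|=/defined $1 ? $1 : '|'/ge;
    return $msg;
}

# Invented sample containing every special character at least once.
my $original = "a | b +!+ c = d \\ e";
my $stored   = encode_post($original);

print "stored:   $stored\n";
print "restored: ", decode_post($stored), "\n";

# The stored form must contain neither separator, and decoding must undo encoding.
print "round trip ok\n"
    if decode_post($stored) eq $original
    and index($stored, '+!+') < 0
    and index($stored, '|') < 0;
```

The decode step uses defined $1 rather than $1||'|' so that it stays correct even if the set of escaped characters is later extended to include 0.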

-- Marcus


Code
s$$ab21b8d15c3d97bd6317286d$;$"=547269736;split'i',join$,,map{chr(($*+= 
($">>=1)&1?-hex:hex)+0140)}/./g;$"=chr$";s;.;\u$&;for@_[0,2];print"@_,"



Pro_4
User

Jul 21, 2001, 6:52 AM


Re: Regex Newbie

Thanks mhx, that worked great... now for another thing.
OK, ^ indicates the first character in a word, correct? (Or was it $? I'm pretty sure that's the last character, though...) Anyway, how would I go about sorting a list of words alphabetically and then writing each word to a file according to its first character? Basically, I want to put all the a's in one file and all the b's in a different file, but they need to be in alphabetical order inside those files.

Thanks again :)



mhx
Enthusiast

Jul 21, 2001, 8:20 AM


Re: Regex Newbie

Hi Pro_4,


In Reply To
OK, ^ indicates the first character in a word, correct? (Or was it $? I'm pretty sure that's the last character, though...)

No, ^ is the anchor for the beginning of the string, and $ is for the end of the string (or just before a newline at the very end). If you use the /m modifier, ^ and $ refer to the beginning and end of each line, respectively. This is all pretty well described in perldoc perlre.
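
A small sketch of the anchors in action, using a made-up three-line string:

```perl
use strict;
use warnings;

# Invented sample: three lines in one string.
my $text = "apple\nbanana\ncherry\n";

# Without /m, ^ matches only at the very start of the string.
my @starts_plain = $text =~ /^(\w)/g;

# With /m, ^ also matches right after every newline.
my @starts_multi = $text =~ /^(\w)/mg;

# Without /m, $ matches at the very end or just before a final newline.
my ($last_plain) = $text =~ /(\w)$/;

print "@starts_plain\n";   # a
print "@starts_multi\n";   # a b c
print "$last_plain\n";     # y
```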
For your sorting problem, I'd use a hash of arrays:

Code
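#!/bin/perl -w
use strict;

my @words = qw(and or Anyone not Nervous whatever Car);
my %whash;

# group the words by their lowercased first letter
push @{$whash{lc substr $_, 0, 1}}, $_ for @words;

# write each group, sorted, to its own file (a.dat, c.dat, ...)
for my $letter ( keys %whash ) {
    open my $fh, '>', "$letter.dat" or die "cannot open $letter.dat: $!\n";
    print $fh join "\n", sort @{$whash{$letter}};
    close $fh;
}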

Note that this will sort asciibetically, so uppercase letters will be sorted before lowercase letters. Anyway, all a's will be written to a.dat and so on.
Hope this helps.
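
If the uppercase-first ordering is not wanted, a case-insensitive sort is one option. The following is a sketch only: the sub name sort_ignoring_case is made up, and it prints the groups instead of writing files so it can be tried without side effects.

```perl
use strict;
use warnings;

my @words = qw(and or Anyone not Nervous whatever Car);

# Group by lowercased first letter, as in the code above.
my %whash;
push @{ $whash{ lc substr $_, 0, 1 } }, $_ for @words;

# Hypothetical helper: compare lowercased copies first, and fall back to a
# plain comparison so that words differing only in case have a fixed order.
sub sort_ignoring_case {
    return sort { lc $a cmp lc $b or $a cmp $b } @_;
}

for my $letter (sort keys %whash) {
    my @sorted = sort_ignoring_case(@{ $whash{$letter} });
    print "$letter: @sorted\n";
}
```

With the sample words, "and" now comes before "Anyone", whereas a plain sort puts "Anyone" first.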

-- Marcus





dsb
stranger

Jul 22, 2001, 11:25 AM


Re: Regex Newbie


In Reply To
Oh, never mind, I figured it out. Whether this is the best way or not, I don't know:


Code
$msg = "[color=green] [/color]"; 
$msg =~ s/\[color=(\S+?)]/<font color=$1>/isg;


This regex only converts the opening '[col]' tag. You'd have to modify it a bit to make it work on both the opening and closing tags.

This works:

Code
$msg = "[color=green]blah[/color]"; 
$msg =~ s%\[color=(\S+?)\](.+?)\[/color]%<font color=$1>$2</font>%isg;

Hope that helps.
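
A slightly more defensive sketch of the same idea follows. The sub name color_tags_to_html is made up, and it assumes that a color is either a name or a hex code (so only word characters and # are accepted) and that the enclosed text may contain spaces or newlines.

```perl
use strict;
use warnings;

# Hypothetical helper; [color=...] is the tag syntax discussed in this thread.
sub color_tags_to_html {
    my ($msg) = @_;
    # [#\w]+ : the color, restricted to word characters and # (an assumption)
    # .*?    : the enclosed text, as short as possible; /s lets . match newlines
    $msg =~ s{\[color=([#\w]+)\](.*?)\[/color\]}{<font color="$1">$2</font>}gis;
    return $msg;
}

print color_tags_to_html("[color=green]some text[/color] and [color=red]more[/color]"), "\n";
```

Because opening and closing tags are matched as a pair, an opening tag that is never closed is left in the post as typed rather than turned into an unclosed font element.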

dan ;)


Pro_4
User

Jul 22, 2001, 6:24 PM


Re: Regex Newbie

Well, I just made the closing tag a separate substitution:

$msg =~ s/\[\/color]/<\/font>/isg;

I am not sure if that is exactly what I used (I just kind of assembled it in my head), but that is basically what I did.



dsb
stranger

Jul 26, 2001, 2:29 PM


Re: Regex Newbie


Code
use Benchmark;

timethese(100000, {
    one => sub {
        $str = "[col=green]text[/col]";
        $str =~ s%^\[col=([^\]]+)\]([^[]+)\[/col\]$%<font color="$1">$2</font>%;
    },
    two => sub {
        $str = "[col=green]text[/col]";
        $str =~ s%^\[col=([^\]]+)\]%<font color="$1">%;
        $str =~ s%\[/col\]%</font>%;
    },
});

You have to be careful with regular expressions, since they can eat up time and processing power if they aren't well written. If you run the code above, you should see that the first option comes out ahead in time (in seconds) and other system expenses.

In the second example, two substitutions must be executed on every iteration, as opposed to only one in the first. Granted, the first regular expression is more involved and takes longer to process than either of the two in example 2, but you still lose time, since each of the two makes its own pass over the string. (Perl compiles a pattern that contains no interpolated variables only once, so recompilation is not the cost here.)
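
Before trusting a timing comparison, it is worth checking that both variants really produce the same result. The sketch below wraps the two approaches from the benchmark above in subs (the names one_pass and two_pass are made up) and uses cmpthese from the Benchmark module, which prints a comparison table. The sub-call overhead is added to both variants equally, and the actual numbers will depend on the machine and the Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical wrappers around the two variants from the benchmark above.
sub one_pass {
    my ($str) = @_;
    $str =~ s{^\[col=([^\]]+)\]([^\[]+)\[/col\]$}{<font color="$1">$2</font>};
    return $str;
}

sub two_pass {
    my ($str) = @_;
    $str =~ s{^\[col=([^\]]+)\]}{<font color="$1">};
    $str =~ s{\[/col\]}{</font>};
    return $str;
}

my $input = "[col=green]text[/col]";

# Make sure the comparison is fair: both variants must agree on the output.
die "variants disagree\n" unless one_pass($input) eq two_pass($input);

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    one => sub { one_pass($input) },
    two => sub { two_pass($input) },
});
```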

dan ;)


Pro_4
User

Jul 26, 2001, 7:35 PM


Re: Regex Newbie

Ahh, thanks a lot. I will change that if it makes a big difference, although my script runs very fast as is.

Thanks :)
