CGI/Perl Guide
Pro_4
User

Jul 6, 2001, 7:51 AM

Post #1 of 11
Regex Newbie

Hey,

I understand the VERY basics of regexes, but I was wondering if you could point me to some good tutorials on what certain things do (*, +, ?, etc.).

Anyway, down to my question. I have been trying to make it so that when users type [col=green]text[/col], the text turns that color. (Note: I used col instead of color because I wasn't sure whether this forum would translate that.) The problem is keeping green there while still replacing the [col= ] around it.

Well, thanks for the help :)


P.S. For those of you who don't know HTML that well, color is done like this:

Code
<font color=green> </font>

@letters = ('A'..'Z', 'a'..'z', '1'..'5', '_');
@i = ( '15', '43', '40');
print @letters [ @i, -1, 55];


Pro_4
User

Jul 6, 2001, 8:22 AM

Post #2 of 11
Re: Regex Newbie

Oh, never mind, I figured it out, though I don't know whether this is the best way:

Code
$msg = "[color=green] [/color]"; 
$msg =~ s/\[color=(\S+?)]/<font color=$1>/isg;
print $msg;



mhx
Enthusiast

Jul 6, 2001, 10:51 AM

Post #3 of 11
Re: Regex Newbie

Hi Pro_4!


In Reply To
I understand the VERY basics of regexes, but I was wondering if you could point me to some good tutorials on what certain things do (*, +, ?, etc.).

If you really want to learn regexes, I recommend Mastering Regular Expressions by Jeffrey Friedl (O'Reilly). It's sometimes hard to understand, but always fun to read.
If you're looking for an online regex course, try Japhy's book. I've read some excerpts and they seem really good to me.
And as always, there's good material in the perlre manpage.

-- Marcus
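Since the first post asked what *, + and ? do, here is a small sketch (the sample words and tags are made up for illustration). Each quantifier applies to the item just before it, and a ? placed after another quantifier makes it non-greedy, which is what the \S+? in Pro_4's substitution relies on.

```perl
use strict;
use warnings;

# Each quantifier applies to the item just before it (here the letter "o").
my @words = qw(fd fod food);

for my $w (@words) {
    my @hits;
    push @hits, '*' if $w =~ /^fo*d$/;   # zero or more o's
    push @hits, '+' if $w =~ /^fo+d$/;   # one or more o's
    push @hits, '?' if $w =~ /^fo?d$/;   # zero or one o
    print "$w matches: @hits\n";
}

# Greedy vs. non-greedy: a trailing ? makes the quantifier match as little as possible.
my $tag = '[b]one[/b] and [b]two[/b]';
my ($greedy) = $tag =~ /\[b\](.+)\[\/b\]/;
my ($lazy)   = $tag =~ /\[b\](.+?)\[\/b\]/;
print "greedy: $greedy\n";   # one[/b] and [b]two
print "lazy:   $lazy\n";     # one
```

The greedy .+ runs to the end of the string and backs off only as far as the last [/b], while .+? stops at the first one.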



Pro_4
User

Jul 17, 2001, 6:56 PM

Post #4 of 11
Re: Regex Newbie

Hey,

In my forum's database, replies are separated by +!+ and the fields within a reply are separated by |. The problem is that if someone types a lot of +!+ in their post, it messes up my forum. So I was wondering how I would go about substituting something else for it when the post is saved, and turning it back into +!+ when it is viewed, so it looks as if nothing in their post had changed.

I tried this:

Code
$msg =~ s/\+\!\+/@/isg;
$msg =~ s/|/0/isg;

But when I translate it back for viewing, it shows up like this:
|h|i| |f|o|r| |s|o|m|e| |r|e|a|s|o|n| |i|t| |p|u|t|s| ||| |e|v|e|r|y|w|h|e|r|e|

The +!+ gets translated back to normal, though. Why does it put a | between every character? Maybe the 0 has something to do with it, or my regex is wrong...

¿Help?



mhx
Enthusiast

Jul 18, 2001, 1:20 AM

Post #5 of 11
Re: Regex Newbie

Hi Pro_4,

Yes, there's a problem with your regex. The vertical bar | is the regex alternation operator, so /|/ means "match the empty pattern or the empty pattern", and an empty pattern always matches. It matches at every position in the string: before the first character, between each pair of characters, and at the end. With /g, a 0 is inserted at each of those positions, and when you translate the 0's back they all turn into vertical bars. To fix this, escape the vertical bar:

Code
$msg =~ s/\|/0/isg;
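To see the difference concretely, here is a minimal sketch comparing the unescaped and escaped patterns on a made-up string. It also shows \Q...\E (Perl's quotemeta escape), which backslashes every metacharacter in an interpolated string; that is an alternative to escaping +, ! and | by hand, not something from the original post.

```perl
use strict;
use warnings;

my $text = 'hi | there';

# Unescaped: /|/ is "empty OR empty", which matches the empty string at
# every position, so /g inserts a 0 before, between and after every character.
(my $broken = $text) =~ s/|/0/g;
print "$broken\n";   # 0h0i0 0|0 0t0h0e0r0e0

# Escaped: \| matches only a literal vertical bar.
(my $fixed = $text) =~ s/\|/0/g;
print "$fixed\n";    # hi 0 there

# \Q...\E escapes every metacharacter in the interpolated string,
# which is handy for a separator such as +!+.
my $sep = '+!+';
(my $swapped = 'a+!+b') =~ s/\Q$sep\E/:/g;
print "$swapped\n";  # a:b
```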

But all in all, your approach has another problem: what if someone uses @'s or 0's in their posts? They are stored as they are, but when the post is viewed they get translated into +!+'s and |'s.
One idea that comes to mind right now (I guess there are plenty of better solutions) is to escape some of the special characters in your database. Test this piece of code:

Code
$msg = <<'ENDMSG'; 
Hello Pro_4 | whoever!
Here's my +!+ message +!+ for you!
== \Marcus\ ==
ENDMSG

$msg =~ s/([!=\\])/\\$1/g;
$msg =~ tr/|/=/;
print $msg;

$msg =~ s/\\(.)|=/$1||'|'/ge;
print $msg;

The first substitution replaces every !, = or \ with \!, \= or \\, so every +!+ sequence becomes +\!+ and can no longer be mistaken for a separator. The |'s are trickier, but since all ='s are now escaped, a bare = can safely stand in for |; that is what the tr/// line does. The text now looks like this:

Code
Hello Pro_4 = whoever\! 
Here's my +\!+ message +\!+ for you\!
\=\= \\Marcus\\ \=\=

The second substitution reverses these changes so you get the original message back. It looks for either an escaped character or a bare equal sign. For an escaped character, $1 holds the character, so the backslash is dropped; for an equal sign, $1 is undefined, so the /e expression $1||'|' evaluates to a vertical bar.
I'm quite sure there are better solutions, but this was the first one that came to mind.
Hope this helps anyway.

-- Marcus


Code
s$$ab21b8d15c3d97bd6317286d$;$"=547269736;split'i',join$,,map{chr(($*+= 
($">>=1)&1?-hex:hex)+0140)}/./g;$"=chr$";s;.;\u$&;for@_[0,2];print"@_,"
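The escape/unescape pair from Marcus's post can be wrapped in two subroutines and checked with a round trip. This is a sketch: encode_field and decode_field are hypothetical names, the record layout (fields joined by |, replies joined by +!+) is taken from Pro_4's description, and the sample replies are made up. It uses defined $1 instead of $1||'|' so the decode step does not depend on which characters happen to be escaped.

```perl
use strict;
use warnings;

# Hypothetical helpers using the escaping scheme from the post:
# backslash-escape !, = and \, then store every | as a bare =.
sub encode_field {
    my ($text) = @_;
    $text =~ s/([!=\\])/\\$1/g;   # ! -> \!, = -> \=, \ -> \\
    $text =~ tr/|/=/;             # a bare = now unambiguously means |
    return $text;
}

sub decode_field {
    my ($text) = @_;
    # \X -> X (drop the backslash); bare = -> |
    $text =~ s/\\(.)|=/defined $1 ? $1 : '|'/gse;
    return $text;
}

# Assumed record layout: fields joined by |, replies joined by +!+.
my @replies = (
    [ 'Pro_4', 'Why does +!+ break | my forum?' ],
    [ 'mhx',   'Escape it: a=b, C:\\temp\\' ],
);

my $stored = join '+!+',
    map { join '|', map { encode_field($_) } @$_ } @replies;
print "$stored\n";

# Splitting is safe: encoded fields never contain a bare | or an unescaped +!+.
my @restored = map { [ map { decode_field($_) } split /\|/, $_, -1 ] }
               split /\+!\+/, $stored, -1;

for my $r (@restored) {
    print join(' / ', @$r), "\n";
}
```

The -1 limit on split keeps trailing empty fields, so an empty last field in a reply survives the round trip.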



Pro_4
User

Jul 21, 2001, 6:52 AM

Post #6 of 11
Re: Regex Newbie

Thanks, mhx, that worked great. Now for another thing.
OK, ^ indicates the first character in a word, correct? (Or was it $? I'm pretty sure that one is the last character, though...) Anyway, how would I go about sorting a list of words alphabetically and then writing each word to a file chosen by its first character? Basically, I want to put all the a's in one file and all the b's in a different file, but the words need to be in alphabetical order inside each file.

Thanks again :)



mhx
Enthusiast

Jul 21, 2001, 8:20 AM

Post #7 of 11
Re: Regex Newbie

Hi Pro_4,


In Reply To
OK, ^ indicates the first character in a word, correct? (Or was it $? I'm pretty sure that one is the last character, though...)

No, ^ is the anchor for the beginning of the string, and $ is the anchor for the end of the string (it also matches just before a newline at the very end). If you use the m modifier, ^ and $ match at the beginning and end of each line within the string. This is all described pretty well in perldoc perlre.
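A small sketch of the anchor rules on a made-up two-line string. It also shows \b, the word-boundary assertion, which is the closest thing Perl regexes have to "first character of a word".

```perl
use strict;
use warnings;

my $text = "apple pie\nbanana split\n";

# Without /m, ^ matches only at the very start of the string.
my @no_m = $text =~ /^(\w+)/g;

# With /m, ^ also matches right after each newline inside the string.
my @with_m = $text =~ /^(\w+)/mg;

# Without /m, $ matches at the end of the string or just before a final "\n".
my ($last_word) = $text =~ /(\w+)$/;

# \b (word boundary) is the closest thing to "start of a word".
my @word_starts = $text =~ /\b(\w)/g;

print "no /m:   @no_m\n";          # apple
print "with /m: @with_m\n";        # apple banana
print "last:    $last_word\n";     # split
print "starts:  @word_starts\n";   # a p b s
```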
For your sorting problem, I'd use a hash of arrays:

Code
#!/usr/bin/perl -w
use strict;

my @words = qw(and or Anyone not Nervous whatever Car);
my %whash;

# Group the words by their lowercased first letter: a => [and, Anyone], ...
push @{ $whash{ lc substr $_, 0, 1 } }, $_ for @words;

for my $letter ( keys %whash ) {
    open my $fh, '>', "$letter.dat" or die "cannot open $letter.dat: $!\n";
    print $fh join( "\n", sort @{ $whash{$letter} } ), "\n";
    close $fh;
}

Note that this sorts ASCIIbetically, so uppercase letters are sorted before lowercase letters. In any case, all words starting with a or A will be written to a.dat, and so on.
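If a case-insensitive order is wanted instead, the sort block can compare lowercased words. The sketch below uses a hypothetical group_by_initial sub and only builds the hash; writing each group to its .dat file would work as in the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: group words by lowercased first letter, then sort each
# group case-insensitively (ties broken ASCIIbetically so the order is stable).
sub group_by_initial {
    my @words = @_;
    my %groups;
    push @{ $groups{ lc substr $_, 0, 1 } }, $_ for @words;
    for my $list (values %groups) {
        @$list = sort { lc($a) cmp lc($b) or $a cmp $b } @$list;
    }
    return %groups;
}

my %g = group_by_initial(qw(and or Anyone not Nervous whatever Car apple));
for my $letter (sort keys %g) {
    print "$letter.dat: @{ $g{$letter} }\n";
}
```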
Hope this helps.

-- Marcus





dsb
stranger

Jul 22, 2001, 11:25 AM

Post #8 of 11
Re: Regex Newbie


In Reply To
Oh, never mind, I figured it out, though I don't know whether this is the best way:


Code
$msg = "[color=green] [/color]"; 
$msg =~ s/\[color=(\S+?)]/<font color=$1>/isg;


This regex only converts the opening [color=...] tag. You'd have to modify it a bit to make it work on both the opening and closing tags.

This works:

Code
$msg = "[color=green]blah[/color]";
# (.+?) plus the s flag lets the text between the tags contain spaces and newlines
$msg =~ s%\[color=(\S+?)\](.+?)\[/color\]%<font color=$1>$2</font>%isg;

Hope that helps.

dan ;)
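A one-pass sketch for the [col=...] tags from the original question, assuming the color value should be limited to letters, digits and # so that a user cannot smuggle extra HTML attributes in through it. col_to_html is a hypothetical name, and the text between the tags is not HTML-escaped here; a forum would still need to handle that separately.

```perl
use strict;
use warnings;

# Hypothetical helper: convert every [col=...]...[/col] pair in one pass.
# [#\w]+ restricts the color to letters, digits, _ and #;
# .*? with /s lets the text span spaces and newlines; /i accepts [COL=...].
sub col_to_html {
    my ($msg) = @_;
    $msg =~ s{\[col=([#\w]+)\](.*?)\[/col\]}{<font color="$1">$2</font>}gis;
    return $msg;
}

print col_to_html('[col=green]some text[/col] and [COL=#ff0000]red[/col]'), "\n";
# <font color="green">some text</font> and <font color="#ff0000">red</font>
```

A tag whose color contains quotes or spaces simply fails to match and is left as plain text.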


Pro_4
User

Jul 22, 2001, 6:24 PM

Post #9 of 11
Re: Regex Newbie

Well, I just made the closing tag a separate substitution:

Code
$msg =~ s/\[\/color\]/<\/font>/ig;

I'm not sure that's exactly what I used (I just kind of assembled it in my head), but that's basically what I did.



dsb
stranger

Jul 26, 2001, 2:29 PM

Post #10 of 11
Re: Regex Newbie


Code
use Benchmark;

timethese(100000, {
    one => sub {
        my $str = "[col=green]text[/col]";
        $str =~ s%^\[col=([^\]]+)\]([^[]+)\[/col\]$%<font color="$1">$2</font>%;
    },
    two => sub {
        my $str = "[col=green]text[/col]";
        $str =~ s%^\[col=([^\]]+)\]%<font color="$1">%;
        $str =~ s%\[/col\]%</font>%;
    },
});

You have to be careful with regular expressions, since badly written ones can eat up time and processing power. If you run the code above, you should see that the first option has an advantage in time (in seconds) and CPU usage.

In the second example, two substitutions must be executed on every call, as opposed to only one in the first. Granted, the first regular expression is more involved and takes longer to run than either of the two in example 2, but you still lose time by invoking the regex engine twice. (The patterns contain no variables, so Perl compiles each of them only once, when the script is compiled; the extra cost comes from executing two of them, not from recompiling them.)

dan ;)
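Timing results depend on the machine and Perl version, so it is worth checking that both versions produce the same output before comparing them. This sketch wraps the two substitutions from the benchmark above in subs that return their result, and uses the Benchmark module's cmpthese, which prints a table of relative rates. Note that version one only handles a string consisting of exactly one tag pair, because of its ^ and $ anchors.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $input = '[col=green]text[/col]';

# The two approaches from the post, wrapped so each returns its result.
my %subs = (
    one => sub {
        my $str = $input;
        $str =~ s%^\[col=([^\]]+)\]([^[]+)\[/col\]$%<font color="$1">$2</font>%;
        return $str;
    },
    two => sub {
        my $str = $input;
        $str =~ s%^\[col=([^\]]+)\]%<font color="$1">%;
        $str =~ s%\[/col\]%</font>%;
        return $str;
    },
);

# Before timing anything, make sure both versions produce the same HTML.
my %out = map { $_ => $subs{$_}->() } keys %subs;
die "results differ\n" unless $out{one} eq $out{two};
print "both produce: $out{one}\n";

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, \%subs);
```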


Pro_4
User

Jul 26, 2001, 7:35 PM

Post #11 of 11
Re: Regex Newbie

Ahh, thanks a lot. I'll change that if it makes a big difference, although my script already runs very fast as is.

Thanks :)


 
 

