CGI/Perl Guide

Substitution within tags, multiple times

dmittner
Novice

Jul 18, 2002, 3:14 PM

Post #1 of 4
Substitution within tags, multiple times

I'm working on a modification to a BBS, and I want to add [code]...[/code] tags that stop other tags inside them from being substituted. This is what I have so far:


Code
while ( $$text =~ /\[code\].*?\[*?\[\/code\]/i || $$text =~ /\[code\].*?\].*?\[\/code\]/i ) {
    $$text =~ s!\[code\](.*?)\[(.*?)\](.*?)\[/code\]!\[code\]$1\&\#091;$2\&\#093;$3\[/code\]!i
}



This seems to work fine when there is only one pair of tags, but with several pairs, everything between the very first opening tag and the very last closing tag is ignored, including the other opening and closing tags in between.

Suggestions?


(This post was edited by dmittner on Jul 18, 2002, 3:18 PM)


jryan
User

Jul 20, 2002, 2:31 PM

Post #2 of 4
Re: [dmittner] Substitution within tags, multiple times

First of all, that use of symrefs is terrible. Symrefs are evil; they'll cause you nothing but problems down the road. You probably want to use a hash instead. Please see http://perl.plover.com/varvarname.html for more details.
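Here is a minimal sketch of the "hash plus real reference" style, using made-up names (`%posts`, `bold_to_html`). It also shows what `use strict` does when `${...}` is applied to a plain string, which is the symbolic-reference case. Dereferencing an actual reference such as `\$posts{welcome}` is allowed:

```perl
use strict;
use warnings;

# Hypothetical data: a hash keyed by post name instead of one
# package variable per post reached through a symbolic reference.
my %posts = ( welcome => 'Try [b]this[/b]' );

# Takes a real (hard) reference to a scalar and edits it in place.
sub bold_to_html {
    my ($text_ref) = @_;
    $$text_ref =~ s{\[b\](.*?)\[/b\]}{<b>$1</b>}gis;
    return;
}

bold_to_html( \$posts{welcome} );
print "$posts{welcome}\n";    # prints: Try <b>this</b>

# A symbolic reference: dereferencing a plain string. Under
# "use strict" this dies at run time instead of quietly reading
# a global variable named $welcome.
my $name      = 'welcome';
my $symref_ok = eval { my $value = ${$name}; 1 } ? 1 : 0;
print "symref allowed: $symref_ok\n";    # prints: symref allowed: 0
```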

Next, your code above is waaaaay too complex in some respects and barely working in others. Let's take a look:


Code
while ( $$text =~ /\[code\].*?\[*?\[\/code\]/i || $$text =~ /\[code\].*?\].*?\[\/code\]/i ) {
    $$text =~ s!\[code\](.*?)\[(.*?)\](.*?)\[/code\]!\[code\]$1\&\#091;$2\&\#093;$3\[/code\]!i
}


Notice that you are matching against the same text twice: once in the loop condition and again in the substitution. That's too much work, if you ask me. Let the /g modifier do most of the work for you.
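For example, a single match with /g in list context returns the body of every block in one pass (the sample text is made up). The non-greedy `.*?` stops each match at the nearest `[/code]`, so two blocks are not merged into one:

```perl
use strict;
use warnings;

# Two [code] blocks in one post (sample data).
my $text = "[code]a[b]c[/code] and [code]d[/code]";

# In list context m//g returns the capture from every match in one
# pass; the non-greedy .*? ends each match at the nearest [/code].
my @blocks = $text =~ m{\[code\](.*?)\[/code\]}gis;

print scalar(@blocks), " blocks: ", join( ' | ', @blocks ), "\n";
# prints: 2 blocks: a[b]c | d
```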

You are also trying to do too much at once. The general gist of what you need to do is: find the code, then escape the brackets. That's 2 steps, and you are trying to do them in one. That's going to lead to some pretty confusing code. Let's break it up a bit: first we'll find the code, and then we'll do the substitutions on that code.

But first, let's add a few regex definitions to make your code more readable:


Code
my $header = qr { \[    code \] }xi; 
my $footer = qr { \[ \/ code \] }xi;
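A quick check of what these two patterns accept, using made-up tag samples. The /x flag lets the pattern contain spaces for readability without matching them, /i makes the tag name case-insensitive, and both flags stay attached to the qr// object when it is interpolated into another pattern:

```perl
use strict;
use warnings;

my $header = qr{ \[ code \] }xi;
my $footer = qr{ \[ \/ code \] }xi;

# Classify each sample tag; the spaces inside the qr// patterns are
# ignored because of /x, so "[ code ]" is not a header.
my %kind;
for my $tag ( '[code]', '[CODE]', '[/Code]', '[ code ]' ) {
    $kind{$tag} = $tag =~ /\A$header\z/ ? 'header'
                : $tag =~ /\A$footer\z/ ? 'footer'
                :                         'neither';
    print "$tag => $kind{$tag}\n";
}
```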


Now let's work on structuring your regex. What we really want to work with is the data inside the code tags, so let's start with that. This "code" is described the following way:


Code
Captured code 
(preceded by a header)
(followed by a footer)

or

(header) <-- (captured code) --> (footer)


translated into a regex:


Code
$text =~ s/ (?<= $header ) # preceded by a header 
($code) # captured code
(?= $footer ) # followed by a footer
/process_code($1)/gex;
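To see the lookaround part in isolation, here is a sketch where a plain non-greedy `.*?` stands in for `$code` and `uc` stands in for `process_code` (both swaps are only for illustration). Because the tags sit inside lookarounds, they are checked but never consumed, so the replacement only has to return the body:

```perl
use strict;
use warnings;

my $header = qr{ \[ code \] }xi;
my $footer = qr{ \[ \/ code \] }xi;

my $text = "[code]a[b][/code] then [code]c[/code]";

# (?<= ...) and (?= ...) only assert that the tags are there, so $1 is
# just the body and the tags are copied through unchanged.
$text =~ s/ (?<= $header ) (.*?) (?= $footer ) /uc($1)/gexs;

print "$text\n";    # prints: [code]A[B][/code] then [code]C[/code]
```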


Next, we need to describe what the code really is. Code is:


Code
A string of: 
non-brackets or
backslashed brackets or
bracketed text that's not [/code]


or translated into a regex


Code
my $code   = qr {
(?:
# string of non-brackets (at least one, so no pass of the loop matches an empty string)
(?> [^\[]+ )
|
# backslashed brackets
(?: (?<= \\) . )
|
# bracketed text that's not [/code]
(?: (?! $footer ) \[ )
)*
}x;
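One way to check the pattern is to match a body followed by a footer against a few sample strings. This sketch uses `[^\[]+` (one or more) in the first alternative so that no pass of the outer loop can match an empty string. With that, the body stops at the first unescaped `[/code]` but runs past a backslashed one:

```perl
use strict;
use warnings;

my $footer = qr{ \[ \/ code \] }xi;

# Body of a code block: every pass through the outer loop consumes
# at least one character.
my $code = qr{
    (?:
        (?> [^\[]+ )             # characters other than an opening bracket
      | (?: (?<= \\ ) . )        # any character preceded by a backslash
      | (?: (?! $footer ) \[ )   # an opening bracket that does not start a footer
    )*
}x;

# Match a body followed by a footer at the start of each sample string.
my %body;
for my $input ( 'a[b]c[/code] tail', 'x[/code]y[/code]', 'p\[/code]q[/code]' ) {
    ($body{$input}) = $input =~ /\A($code)$footer/;
    print "$input => $body{$input}\n";
}
```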


The only thing left is to process the text. That's pretty trivial: just substitute the escaped values for the brackets.


Code
sub process_code
{
my($subst) = @_;
# /x only affects the pattern, so the replacement carries no extra spaces
$subst =~ s! \[ !&#91;!gx;
$subst =~ s! \] !&#93;!gx;
return $subst;
}
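One thing to watch in this step: /x relaxes whitespace in the pattern only, while the replacement text is used exactly as written. A small comparison on a made-up string:

```perl
use strict;
use warnings;

# /x makes the PATTERN ignore its whitespace; the replacement text is
# used exactly as written, spaces included.
( my $with_spaces = 'a[b]' ) =~ s! \[ ! &#91 !gx;
( my $tight       = 'a[b]' ) =~ s! \[ !&#91;!gx;

print "<$with_spaces>\n";    # prints: <a &#91 b]>
print "<$tight>\n";          # prints: <a&#91;b]>
```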


To sum that up, we end up with:


Code
# definitions 
my $header = qr { \[ code \] }xi;
my $footer = qr { \[ \/ code \] }xi;
my $code = qr {
(?:
# string of non-brackets (at least one, so no pass of the loop matches an empty string)
(?> [^\[]+ )
|
# backslashed brackets
(?: (?<= \\) . )
|
# bracketed text that's not [/code]
(?: (?! $footer ) \[ )
)*
}x;

# globally substitute our text
$text =~ s/ (?<= $header ) ($code) (?= $footer )
/process_code($1)/gex;

# "processing"; substitute escaped values for brackets
sub process_code
{
my($subst) = @_;
$subst =~ s! \[ !&#91;!gx;
$subst =~ s! \] !&#93;!gx;
return $subst;
}
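Putting the pieces together into a standalone script, as a sketch. The sample post is made up, the first alternative of `$code` uses `+`, and the entities are written as `&#91;` and `&#93;` with no surrounding spaces. Each [code] block is escaped separately, and tags outside the blocks are left alone:

```perl
use strict;
use warnings;

my $header = qr{ \[ code \] }xi;
my $footer = qr{ \[ \/ code \] }xi;
my $code   = qr{
    (?:
        (?> [^\[]+ )             # characters other than an opening bracket
      | (?: (?<= \\ ) . )        # any character preceded by a backslash
      | (?: (?! $footer ) \[ )   # an opening bracket that does not start a footer
    )*
}x;

# Escape the brackets in one code body.
sub process_code {
    my ($subst) = @_;
    $subst =~ s/\[/&#91;/g;
    $subst =~ s/\]/&#93;/g;
    return $subst;
}

my $text = '[b]one[/b] [code]x[i]y[/code] [i]two[/i] [code][b]z[/b][/code]';

# The tags sit in lookarounds, so only the body of each block is replaced.
$text =~ s/ (?<= $header ) ($code) (?= $footer ) /process_code($1)/gex;

print "$text\n";
# prints: [b]one[/b] [code]x&#91;i&#93;y[/code] [i]two[/i] [code]&#91;b&#93;z&#91;/b&#93;[/code]
```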


Of course, this code doesn't allow for nested [code] tags. Your markup language seems very similar to an SGML-type language; you might be better off using something like HTML::Parser or one of its clones.
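If nesting does matter and a parser module isn't an option, one hand-rolled alternative (this is not HTML::Parser, and the helper name is hypothetical) is to split the text on the code tags and keep a depth counter:

```perl
use strict;
use warnings;

# Hypothetical helper: escapes brackets inside [code]...[/code], allowing
# nested [code] tags. The text is split into tag tokens and the text between
# them, and a depth counter tracks how many code blocks are open.
sub escape_nested_code {
    my ($text) = @_;
    my $depth  = 0;
    my $out    = '';

    for my $piece ( split m{(\[/?code\])}i, $text ) {
        if ( $piece =~ /\A\[code\]\z/i ) {
            # Only the outermost opening tag stays a real tag.
            $out .= $depth ? '&#91;code&#93;' : $piece;
            $depth++;
        }
        elsif ( $depth && $piece =~ m{\A\[/code\]\z}i ) {
            $depth--;
            # Only the closing tag that returns to depth 0 stays a real tag.
            $out .= $depth ? '&#91;/code&#93;' : $piece;
        }
        elsif ($depth) {
            # Ordinary text inside a code block: escape its brackets.
            ( my $escaped = $piece ) =~ s/\[/&#91;/g;
            $escaped =~ s/\]/&#93;/g;
            $out .= $escaped;
        }
        else {
            # Outside any code block (including a stray [/code]): unchanged.
            $out .= $piece;
        }
    }
    return $out;
}

print escape_nested_code('[code]a [code]b[/code] c[/code] [b]d[/b]'), "\n";
# prints: [code]a &#91;code&#93;b&#91;/code&#93; c[/code] [b]d[/b]
```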


dmittner
Novice

Jul 22, 2002, 9:38 AM

Post #3 of 4
Re: [jryan] Substitution within tags, multiple times

Well... I actually managed to figure it out with the help of one of my colleagues, though it differs a bit from your example. Here's what I ended up with:


Code
while ( $$text =~ /\[code\].*?\[\/code\]/i ){
    $$text =~ s!\[code\](.*?)\[/code\]!\%\[\%code\%\]\%\%\%REPLACE\%\%\%\[\%/code\%]\%!i;
    my $replace = $1;
    $replace =~ s!\[!&#091;!gs;
    $replace =~ s!\]!&#093;!gs;
    $replace =~ s! !&nbsp;!gs;
    $$text =~ s!\%\%REPLACE\%\%!$replace!gs;
}
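One thing worth testing with this loop: without the /s modifier, `.` does not match a newline, so `.*?` cannot cross a line break and a [code] block spanning several lines is never found. A quick check on a made-up post:

```perl
use strict;
use warnings;

my $post = "[code]first line\nsecond line[/code]";

# Without /s, "." does not match "\n", so .*? cannot reach [/code].
my $without_s = $post =~ /\[code\].*?\[\/code\]/i  ? 1 : 0;

# With /s, "." matches "\n" as well and the block is found.
my $with_s    = $post =~ /\[code\].*?\[\/code\]/is ? 1 : 0;

print "without /s: $without_s, with /s: $with_s\n";
# prints: without /s: 0, with /s: 1
```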



It might be doing a bit more work, but it's more compact and uses fewer variables. And since it sits in an area with dozens of other substitutions, compact is nice. I can't say it's foolproof, as it hasn't gone through extensive testing, but it seems to be displaying things correctly so far. If it does cause problems, though, I'll probably use your example.



Thanks


jryan
User

Jul 22, 2002, 3:01 PM

Post #4 of 4
Re: [dmittner] Substitution within tags, multiple times

Compact with fewer variables isn't always the best solution. Notice that you are still matching multiple times (3, in fact) to perform something that's really 1 substitution.
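For comparison, here is a sketch of the same escaping rules (brackets as `&#091;`/`&#093;`, spaces as `&nbsp;`) done as one /ge substitution. It keeps plain [code] tags instead of the `%[%code%]%` placeholders, which this sketch assumes are handled elsewhere; `escape_for_display` is a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper applying the same replacements as the loop above.
sub escape_for_display {
    my ($body) = @_;
    $body =~ s/\[/&#091;/g;
    $body =~ s/\]/&#093;/g;
    $body =~ s/ /&nbsp;/g;
    return $body;
}

my $text = '[code]a [b][/code] and [code]c d[/code]';

# One pass: /g visits every block, /e calls the helper on each body,
# /s lets a body span lines, /i accepts [CODE] as well.
$text =~ s{\[code\](.*?)\[/code\]}{'[code]' . escape_for_display($1) . '[/code]'}gise;

print "$text\n";
# prints: [code]a&nbsp;&#091;b&#093;[/code] and [code]c&nbsp;d[/code]
```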

If your massive number of substitutions is getting confusing, perhaps it's time to start partitioning them into categories and making those into subroutines. Even ignoring the symrefs (please read the link I posted above), this code will be a maintenance nightmare. I recommend at least factoring out the symrefs, but it's your code...
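As a rough sketch of that partitioning, with made-up category names and only two sample tags: each category is a subroutine that takes the text and returns it, and the post is run through an ordered list of them:

```perl
use strict;
use warnings;

# Hypothetical categories: each sub takes the post text and returns it
# with one family of substitutions applied.
sub format_inline {
    my ($s) = @_;
    $s =~ s{\[b\](.*?)\[/b\]}{<b>$1</b>}gis;
    $s =~ s{\[i\](.*?)\[/i\]}{<i>$1</i>}gis;
    return $s;
}

sub format_line_breaks {
    my ($s) = @_;
    $s =~ s/\n/<br>\n/g;
    return $s;
}

# The formatting pipeline is an ordered list of code references, so
# adding or reordering a category is a one-line change.
my @formatters = ( \&format_inline, \&format_line_breaks );

sub format_post {
    my ($text) = @_;
    $text = $_->($text) for @formatters;
    return $text;
}

print format_post("[b]hi[/b]\n[i]bye[/i]"), "\n";
```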
