Jul 12, 2001, 11:54 AM
Post #5 of 5
Hi Ernie & BbBoy,
I guess the regex solution will handle many cases.
But you can easily build some valid HTML that will make the regex produce very funny results ;-)
Take the following HTML code:
<BODY><P><BR><!-- This is a comment -->
<IMG SRC="pics/rightarrow.gif" ALT="->" WIDTH=200>
<FONT SIZE=2>Perl Guru</FONT><!-- /A> Ooops! --></A>
It's completely valid, and the filtered output should simply be
<A HREF="www.perlguru.com">Perl Guru</A>
which is also what the HTML::Parser example delivers. But if you feed this through the regex, it will result in
Perl Guru Ooops! -->
which is certainly not the wanted result.
Ok, this is a constructed example, but I just wanted to point out that there is the possibility that the regex may fail even if the HTML is valid. If this doesn't bother you, it's ok.
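To make the failure mode a bit more concrete, here is a minimal sketch using only the IMG line from the sample. The pattern is a hypothetical naive tag matcher of my own, not necessarily the regex posted earlier in this thread; it just shows how a ">" inside a quoted attribute value can end the match too early.

```perl
use strict;
use warnings;

# The IMG line from the sample; the ALT value contains a literal ">".
my $img = '<IMG SRC="pics/rightarrow.gif" ALT="->" WIDTH=200>';

# Hypothetical naive pattern (an assumption, not the thread's regex):
# a tag is "<", then anything that is not ">", then ">".
(my $stripped = $img) =~ s/<[^>]*>//g;

# The match stops at the ">" inside ALT="->", so the tail of the tag survives.
print "$stripped\n";   # prints: " WIDTH=200>
```

A parser that tracks quoted attribute values does not stop at that ">", which is why it copes with this kind of input.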
Just a small hint if you want to keep the regex: add the s modifier, so that the dot also matches newline characters and tags spread over several lines are handled. Alternatively, you can of course strip the newlines before applying the regex.
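A small sketch of both options, assuming a made-up input string and a made-up tag-stripping pattern (neither is taken from the earlier posts):

```perl
use strict;
use warnings;

# Hypothetical input: one tag broken across two lines.
my $html = "<FONT\n SIZE=2>Perl Guru</FONT>";

# Without /s the dot cannot cross the newline, so the opening tag is missed.
(my $without_s = $html) =~ s/<.*?>//g;

# With /s the dot also matches the newline, so both tags are removed.
(my $with_s = $html) =~ s/<.*?>//gs;

# Alternative: turn newlines into spaces first, then use the plain pattern.
(my $flat = $html) =~ tr/\n/ /;
$flat =~ s/<.*?>//g;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
print "flattened:  $flat\n";
```

Note that flattening the newlines changes the text between tags as well, so the s modifier is probably the less intrusive of the two.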