
mhx
Enthusiast
Jul 12, 2001, 11:54 AM
Post #5 of 5
Hi Ernie & BbBoy,

I guess the regex solution will handle many cases. But you can easily build some valid HTML that makes the regex produce very funny results ;-) Take the following HTML code:

<BODY><P><BR><!-- This is a comment --> <IMG SRC="pics/rightarrow.gif" ALT="->" WIDTH=200> <A HREF="www.perlguru.com"> <FONT SIZE=2>Perl Guru</FONT><!-- /A> Ooops! --></A>

It's completely valid, and the filtered output should simply be

<A HREF="www.perlguru.com">Perl Guru</A>

which is also what the HTML::Parser example delivers. But if you feed this through the regex, the result is

" WIDTH=200> Perl Guru Ooops! -->

which is certainly not the wanted result. Judging from that output, the regex treats the first > it sees as the end of a tag, so the > inside ALT="->" and the one inside the second comment trip it up.

OK, this is a constructed example, but I just wanted to point out that the regex can fail even if the HTML is valid. If this doesn't bother you, that's fine.

Just a small hint if you want to keep the regex: add the s modifier to it, so that the dot also matches newline characters and tags can be spread over several lines. Alternatively, you can of course filter out the newlines before applying the regex.

-- Marcus
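P.S. Here is a minimal sketch of the s modifier hint. The pattern <.*?> is only a stand-in that I'm assuming for illustration, since the exact regex from the earlier posts isn't repeated here, and the variable names are made up as well. It strips every tag, so it only shows how the dot behaves across a line break, not the "keep the A tags" logic.

```perl
use strict;
use warnings;

# A tag that is spread over two lines (hypothetical test input).
my $html = "<FONT\nSIZE=2>Perl Guru</FONT>";

# Without /s the dot stops at the newline, so the opening tag survives.
(my $without_s = $html) =~ s/<.*?>//g;

# With /s the dot also matches "\n", so both tags are removed.
(my $with_s = $html) =~ s/<.*?>//gs;

# Alternative: turn the newlines into spaces first, then strip without /s.
(my $flattened = $html) =~ tr/\n/ /;
$flattened =~ s/<.*?>//g;

print "without s: [$without_s]\n";   # [<FONT\nSIZE=2>Perl Guru], with a real line break
print "with s:    [$with_s]\n";      # [Perl Guru]
print "flattened: [$flattened]\n";   # [Perl Guru]
```

Both variants only fix the line-break problem. A > inside an attribute value or a comment will still confuse a pattern like this one.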