HTML stripper...

 



Nila
User


Nov 19, 2010, 11:59 PM

Post #1 of 2 (3288 views)

Hi all,

I am writing a Perl script that strips comments from an HTML file, including the JavaScript embedded in it. It should remove both the HTML comments and the JavaScript comments. The input file looks like this:


Code
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> 
<html>
<!-- testing
test-->
"<!-- test -->"
<body>
<script type="text/javascript">
document.write('<h2>This is a header</h2>');"/* testing */"
document.write('<p>/*hello*/This is a paragraph</p>'); /* sdkfjhsdfhsdfhsdjkfhsjd fhsjdh fdjs sdfdh sfjh sdfhsd jhsdf hsdf*/ /* testing*/
// hello this is a comment line
/* CHEC This too */
"/*test /*test*/test*//*hello*/"


alert("//hello"); '// This is for testing'
alert("hello"); // This is for testing'
"/* gdjkfghdf gdflkg jdfklgjdfkjgdfkl */"
'"/* gdjkfghdf gdflkg jdfk6lgjdfkjgdfkl */'
/* hello this is multiline
multiline
comment */
</script>
<!-- fjghfdj ghjfdghjhg
fgdfgdfgklfj klfg klfd
flkgjhfd jkghf
fgfdlkgjdfg -->
<div align="center">
This is for testing.<br>
Welcome to INDIA<br>
<p> "<!-- hai comment -->" HI TESTING </p>
<strike>this for testing<br>
</strike>
<center><!-- adasdasdasdasdas --> "<!-- aksdjasdjaskdjaks"djaksdj"askd aksdjak -->" centralizing the string</center>
<input type=button name='but' value='check'/>
</body>
</html>


The desired output is:


Code
 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
""
<body>
<script type="text/javascript">
document.write('<h2>This is a header</h2>');"/* testing */"
document.write('<p>/*hello*/This is a paragraph</p>');
"/*test /*test*/test*//*hello*/"
alert("//hello"); '// This is for testing'
alert("hello");
"/* gdjkfghdf gdflkg jdfklgjdfkjgdfkl */"
'"/* gdjkfghdf gdflkg jdfk6lgjdfkjgdfkl */'
</script>
<div align="center">
This is for testing.<br>
Welcome to INDIA<br>
<p> "" HI TESTING </p>
<strike>this for testing<br>
</strike>
<center> "" centralizing the string</center>
<input type=button name='but' value='check'/>
</body>
</html>


Can anyone give me a regular expression that does this?

Thanks in advance....


(This post was edited by Nila on Nov 20, 2010, 12:01 AM)


shawnhcorey
Enthusiast


Nov 20, 2010, 8:15 AM

Post #2 of 2 (3275 views)
Re: [Nila] HTML stripper...

When dealing with HTML, it is best to use a module from CPAN. I use HTML::TreeBuilder.
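An HTML parser will not look inside the script element, so the JavaScript comments need separate handling either way. For that part, here is a rough core-Perl sketch, not a replacement for a real parser. It assumes the script blocks are well formed and that the JavaScript contains no regex literals or template strings. The names strip_js_comments and strip_comments are made up for this example. The trick is to match string literals first and put them back unchanged, so comment markers inside quotes survive. The whitespace cleanup at the end is an assumption taken from the desired output in the question.

```perl
use strict;
use warnings;

# Remove /* ... */ and // ... comments from JavaScript source.
# String literals are matched first and kept as they are.
# Assumption: no regex literals or template strings in the script.
sub strip_js_comments {
    my ($js) = @_;
    $js =~ s{
        ( "(?:\\.|[^"\\\n])*"     # double-quoted string: keep
        | '(?:\\.|[^'\\\n])*'     # single-quoted string: keep
        )
      | /\*.*?\*/                 # block comment: drop
      | //[^\n]*                  # line comment: drop
    }{ defined $1 ? $1 : '' }gsex;
    return $js;
}

# Remove HTML comments outside script elements and JavaScript
# comments inside them.
sub strip_comments {
    my ($html) = @_;

    # The capturing group makes split return the script elements too;
    # they land at the odd indexes of @parts.
    my @parts = split m{(<script\b[^>]*>.*?</script\s*>)}is, $html;

    for my $i (0 .. $#parts) {
        if ($i % 2) {
            if ($parts[$i] =~ m{\A(<script\b[^>]*>)(.*)(</script\s*>)\z}is) {
                my ($open, $body, $close) = ($1, $2, $3);
                $parts[$i] = $open . strip_js_comments($body) . $close;
            }
        }
        else {
            $parts[$i] =~ s/<!--.*?-->//gs;
        }
    }

    my $out = join '', @parts;
    $out =~ s/[ \t]+$//mg;    # drop trailing whitespace left behind
    $out =~ s/\n{2,}/\n/g;    # drop lines that are now empty
    return $out;
}
```

To run it on a file, something like `print strip_comments(do { local $/; <STDIN> });` should do. Note that the last two substitutions also touch whitespace outside comments, so leave them out if blank lines matter to you.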

__END__

I love Perl; it's the only language where you can bless your thingy.

Perl documentation is available at perldoc.perl.org. The list of standard modules and pragmatics is available in perlmodlib.

Get Markup Help. Please note the markup tag of "code".

 
 

