
Nila
User

Nov 19, 2010, 11:59 PM
Post #1 of 2
HTML stripper...
Hi all, I am writing a Perl script that strips comments from HTML, including the embedded JavaScript. It should remove the comments in both languages. The input file looks like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<!-- testing test-->
"<!-- test -->"
<body>
<script type="text/javascript">
document.write('<h2>This is a header</h2>');"/* testing */"
document.write('<p>/*hello*/This is a paragraph</p>');
/* sdkfjhsdfhsdfhsdjkfhsjd fhsjdh fdjs sdfdh sfjh sdfhsd jhsdf hsdf*/
/* testing*/
// hello this is a comment line
/* CHEC This too */
"/*test /*test*/test*//*hello*/"
alert("//hello");
'// This is for testing'
alert("hello"); // This is for testing'
"/* gdjkfghdf gdflkg jdfklgjdfkjgdfkl */"
'"/* gdjkfghdf gdflkg jdfk6lgjdfkjgdfkl */'
/* hello this is multiline multiline comment */
</script>
<!-- fjghfdj ghjfdghjhg fgdfgdfgklfj klfg klfd flkgjhfd jkghf fgfdlkgjdfg -->
<div align="center">
This is for testing.<br>
Welcome to INDIA<br>
<p>
"<!-- hai comment -->"
HI TESTING
</p>
<strike>this for testing<br>
</strike>
<center><!-- adasdasdasdasdas -->
"<!-- aksdjasdjaskdjaks"djaksdj"askd aksdjak -->"
centralizing the string</center>
<input type=button name='but' value='check'/>
</body>
</html>

The desired output is:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
""
<body>
<script type="text/javascript">
document.write('<h2>This is a header</h2>');"/* testing */"
document.write('<p>/*hello*/This is a paragraph</p>');
"/*test /*test*/test*//*hello*/"
alert("//hello");
'// This is for testing'
alert("hello");
"/* gdjkfghdf gdflkg jdfklgjdfkjgdfkl gjkdfjgdkfgjdkfgjdfjgdfg dfg fdg */"
'"/* gdjkfghdf gdflkg jdfklgjdfkjgdfkl gjkdfjgdkfgjdkfgjdfjgdfg dfg fdg */'
</script>
<div align="center">
This is for testing.<br>
Welcome to INDIA<br>
<p>
""
HI TESTING
</p>
<strike>this for testing<br>
</strike>
<center>
""
centralizing the string</center>
<input type=button name='but' value='check'/>
</body>
</html>

Can anyone give me a regular expression that meets this requirement? Thanks in advance.
(This post was edited by Nila on Nov 20, 2010, 12:01 AM)