How to remove HTML CODE sections.



Jan 23, 2000, 7:48 PM

Post #1 of 3 (1954 views)
How to remove HTML CODE sections.

Hello again.

Well, I'm at it again. I did some more reading on CGI security and found a little piece of code that removes HTML from user input, with some modifications anyway.


Say you have a forum such as this one, and you want to ensure that a user does not submit anything markup-related, such as <a href="">.

Solution (a basic one, anyway):

code:

use CGI;
$CGI = CGI->new;

# Read the submitted form field named "input".
$input = $CGI->param('input');

# Removes all HTML code, or rather anything within <...>
$input =~ s/<[^>]+>//g;


If a user doubles up the brackets, like <<a href="">>, a > is left at the beginning of the input. However many extra >'s the user submits, that many will show.
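To see where the leftover > comes from, here is a minimal check. The sample string is made up for illustration; the pattern is the same one as above.

```perl
use strict;
use warnings;

# Hypothetical sample input: a tag wrapped in an extra pair of angle brackets.
my $input = '<<a href="">>hello';

# The match starts at the first "<" and ends at the first ">",
# so it consumes '<<a href="">' and leaves the second ">" behind.
(my $stripped = $input) =~ s/<[^>]+>//g;

print "$stripped\n";   # prints ">hello"
```

The class [^>] happily matches the second <, which is why the whole doubled opening is eaten but only one closing > goes with it.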

Work around:

I added the next line like this.
code:

$input =~ s/><//g;
# Removes additional attempts to double up code, e.g. <<h1>>

This works, but it has got me wondering how to combine the two substitutions into one.

I have spent so much time on this that I finally ordered a book on regex. Until it actually arrives, can anyone help explain this, and/or how to do it?


Brian Hayes


Jan 23, 2000, 8:03 PM

Post #2 of 3 (1954 views)
Re: How to remove HTML CODE sections.

One solution is to allow <, >, &, etc. in the user input and simply convert them to HTML-escaped form. CGI.pm provides a function for this.
code:

use CGI;
$q = new CGI;

# Get escaped user input from a
# form field named formfield
$input = $q->escapeHTML($q->param('formfield'));
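For anyone wondering what escaping actually does to the input, here is a rough hand-rolled sketch. The name escape_html is made up for this example and is not part of CGI.pm; it only covers the four most common characters, so the module's own function is still the better choice in a real script.

```perl
use strict;
use warnings;

# escape_html is a hypothetical helper for illustration, not a CGI.pm function.
sub escape_html {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print escape_html('<<a href="">>'), "\n";
# prints: &lt;&lt;a href=&quot;&quot;&gt;&gt;
```

The browser then shows the brackets as literal text instead of treating them as markup, so doubled-up tags stop being a problem at all.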


[This message has been edited by Borderline (edited 01-23-2000).]


Jan 24, 2000, 7:53 AM

Post #3 of 3 (1954 views)
Re: How to remove HTML CODE sections.

There is a class of HTML:: modules out there specifically for parsing HTML correctly, but I'm not too familiar with it.

I've been practicing writing regular expressions to parse complex strings, and I do believe I've come up with a regular expression to match regular HTML tags. I'm still working on matching comment tags, DTD tags, and SSI tags.

This regex will match fake HTML tags, too, like <AAA href="...">, and I'm still trying to find the W3C specification for the format that HTML tag names, both built-in and user-specified, can take. This regex only allows tag names made of letters, numbers, and underscores. The attribute-matching part also allows hyphens.

code:

$text =~ s{

That regex matches and removes normal-looking HTML tags. Again, it's probably safer to use one of the HTML:: modules.
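As for the original question about combining the two substitutions, one possible approach is to repeat a stricter substitution until nothing is left to remove. This is only a sketch and not the regex above: strip_tags is a made-up name, and it assumes you want to drop every angle bracket, so text like "1 < 2" would lose its < as well.

```perl
use strict;
use warnings;

# strip_tags is a hypothetical helper name used only for this sketch.
sub strip_tags {
    my ($text) = @_;
    return '' unless defined $text;
    # Remove innermost <...> pairs repeatedly until none remain,
    # so nested input such as <<h1>> collapses completely.
    1 while $text =~ s/<[^<>]*>//g;
    # Drop any unpaired angle brackets that are left over.
    $text =~ tr/<>//d;
    return $text;
}

print strip_tags('<<h1>>Hi<</h1>> there'), "\n";   # prints "Hi there"
```

Excluding both < and > from the character class means each pass only removes the innermost pair, and the loop takes care of however many layers the user wrapped around it.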

