How to remove HTML CODE sections.

 



brian.hayes
User

Jan 23, 2000, 7:48 PM

Post #1 of 3 (827 views)

Hello again.

Well, I'm at it again. I did some more reading on CGI security and found a small piece of code that, with some modifications, removes HTML from user input.

Example:

Say you have a forum like this one, and you want to ensure that a user cannot submit anything containing web markup, such as <a href="">.

Solution (a basic one, anyway):

code:

    $input = $CGI->param('input');

    # Removes all HTML code, or rather anything enclosed in <...>
    $input =~ s/<[^>]+>//g;

Problem:

If a user doubles up the brackets, as in <<a href="">>, a > is left at the beginning of the input. However many extra >'s the user submits, that many will show.

Workaround:

I added a second substitution, like this:
code:

    # Removes additional attempts to double up code, e.g. <<h1>>
    $input =~ s/><//g;

This works, but now I'm curious how to combine the two substitutions into one.

I have spent so much time on this that I finally ordered a book on regular expressions. Until it actually arrives, can anyone help explain this, or show how to do it?
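
One way the two substitutions might be combined is with an alternation: the first branch removes anything shaped like <...>, and the second removes any single angle bracket that is left over. This is only a sketch under a few assumptions. The helper name `strip_markup` is made up for illustration, and the character class `[<>]` is a swapped-in technique, not the `s/><//g` from the post (which, as written, only removes the two-character sequence `><`).

```perl
use strict;
use warnings;

# Hypothetical helper name; not part of the original post.
sub strip_markup {
    my ($input) = @_;

    # Branch 1: anything of the form <...>, including the empty <>.
    # Branch 2: any stray < or > that branch 1 did not consume.
    $input =~ s/<[^>]*>|[<>]//g;

    return $input;
}

# The single-pass original, s/<[^>]+>//g, would leave ">hello" here.
print strip_markup('<<a href="">>hello'), "\n";   # prints "hello"
print strip_markup('<<h1>>Title'), "\n";          # prints "Title"
```

Because every `<` or `>` is consumed by one branch or the other, none survive in the result. The trade-off is that legitimate text suffers too: in an input such as `1 < 2 and 3 > 2`, everything between the two brackets is deleted along with them.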

Thanks,

Brian Hayes


Borderline
Deleted

Jan 23, 2000, 8:03 PM

Post #2 of 3 (827 views)
Re: How to remove HTML CODE sections.

One solution is to allow <, >, &, etc. in the user input and simply convert them to HTML-escaped form.
CGI.pm provides a function for this.
code:

    use CGI;
    my $q = CGI->new;

    # Get escaped user input from a
    # form field named formfield
    my $input = $q->escapeHTML($q->param('formfield'));
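
To see what escaping does to the input from the first post, here is a minimal hand-rolled sketch. The function `escape_html` is hypothetical and written only for illustration; it is not CGI.pm's implementation, and in real code you would presumably call `escapeHTML` as shown above.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not CGI.pm's own code.
sub escape_html {
    my ($text) = @_;

    $text =~ s/&/&amp;/g;    # must run first, or later entities get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;

    return $text;
}

print escape_html('<<a href="">>'), "\n";
# prints: &lt;&lt;a href=&quot;&quot;&gt;&gt;
```

Nothing is deleted, so the doubled-bracket trick has nothing to exploit: the browser shows the user's text literally instead of interpreting it as markup.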

Scott

[This message has been edited by Borderline (edited 01-23-2000).]


japhy
Enthusiast

Jan 24, 2000, 7:53 AM

Post #3 of 3 (827 views)
Re: How to remove HTML CODE sections.

There is a class of HTML:: modules out there specifically for parsing HTML correctly, but I'm not too familiar with them.

I've been practicing writing regular expressions to parse complex strings, and I do believe I've come up with a regular expression to match regular HTML tags. I'm still working on matching comment tags, DTD tags, and SSI tags.

This regex will match fake HTML tags, too, like <AAA href="...">, and I'm still trying to find the W3C specification for the format that HTML tag names, both built-in and user-specified, can take. This regex only allows tag names made of letters, numbers, and underscores. The attribute-matching part also allows hyphens.

code:

    $text =~ s{
        <
        \w+                    # tag name: letters, digits, underscores
        (
            \s+
            [-\w]+             # attribute name; hyphens allowed
            (
                \s*=\s*
                (
                    "[^"]*"    # double-quoted value
                    |
                    '[^']*'    # single-quoted value
                    |
                    \S+        # unquoted value
                )
            )
        )*
        \s*
        >
    }{}gx;

That regex matches and removes normal-looking HTML tags. Again, it's probably safer to use one of the HTML:: modules.
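
As posted, the pattern requires a word character right after the <, so closing tags such as </a> are left alone, and every attribute must have a value. The unquoted-value branch `\S+` can also run past the tag's own > when no whitespace follows it. The sketch below is one possible variant, not japhy's code: the helper name `strip_simple_tags` and the three marked changes are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): a variant of the
# pattern above with three assumed adjustments, each marked below.
sub strip_simple_tags {
    my ($text) = @_;

    $text =~ s{
        < /?                  # assumption: accept closing tags such as </a>
        \w+                   # tag name: letters, digits, underscores
        (?:
            \s+ [-\w]+        # attribute name; hyphens allowed
            (?:
                \s* = \s*
                (?: "[^"]*"   # double-quoted value
                  | '[^']*'   # single-quoted value
                  | [^\s>]+   # assumption: unquoted value stops at whitespace or >
                )
            )?                # assumption: value is optional, as in "checked"
        )*
        \s*
        >
    }{}gx;

    return $text;
}

print strip_simple_tags('<a href=foo>text</a><b>more</b>'), "\n";   # prints "textmore"
print strip_simple_tags('<!-- note -->'), "\n";                     # prints "<!-- note -->"
```

Comments, DTD declarations, and SSI directives are still untouched, as in the original, so this is no substitute for a real parser from the HTML:: family.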

 
 

