Matching HTML

 



seamanrob2004
Novice


Feb 4, 2007, 3:47 PM

Post #1 of 16 (2415 views)
Matching HTML

Can anyone explain what's going wrong here, please?
When I run the script, the pattern matches are supposed to identify any form fields in a string and remove them.

It finds the <input type tags and the rest of the junk up to the >.


But on one of the tags it takes out the whole line, including my </tr><tr> tags, which consequently knackers the formatting of the page.

Stumped. The HTML is the same throughout the site.

It takes the HTML from $addyprint and is supposed to remove all of that:

$addyprint =~ tr/\r\t\f/ /s;
$addyprint =~ tr/ / /s;
$addyprint =~ s/\|/'/g;
$addyprint =~ s/\<a href=.*>//gi;
$addyprint =~ s/\<\/a\>//gi;
$addyprint =~ s/\<\/select\>//gi;
$addyprint =~ s/\<img.*>//gi;
$addyprint =~ s/<input.*>//gi;

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
veritas vos liberabit ~ The truth shall set you free


KevinR
Veteran


Feb 4, 2007, 7:48 PM

Post #2 of 16 (2412 views)
Re: [seamanrob2004] Matching HTML

Your regexps are "greedy", which means they match as much as possible. To make them non-greedy, add a "?" after the quantifier.


Code
$addyprint =~ s/<input.*?>//gi;[/coe] 
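
To see the difference side by side, here is a minimal sketch. The sample row in $html is made up for illustration and is not the original $addyprint contents; the negated character class [^>]* is shown as a common alternative that can never run past the closing > of the tag.

```perl
use strict;
use warnings;

# Hypothetical sample row; not the original $addyprint contents.
my $html = '<tr><td><input type="text" name="a"></td></tr><tr><td>Name</td></tr>';

(my $greedy  = $html) =~ s/<input.*>//gi;      # .* runs on to the LAST > in the string
(my $lazy    = $html) =~ s/<input.*?>//gi;     # .*? stops at the FIRST > it can
(my $negated = $html) =~ s/<input[^>]*>//gi;   # [^>]* cannot cross a > at all

print "greedy:  $greedy\n";
print "lazy:    $lazy\n";
print "negated: $negated\n";
```

With the greedy version everything after the <input tag disappears, closing </tr> tags included, which matches the symptom described in the first post.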
-------------------------------------------------


ProBulletin
Novice

Feb 5, 2007, 2:31 AM

Post #3 of 16 (2405 views)
Re: [seamanrob2004] Matching HTML

Why have you escaped some < >'s but not others?

Btw, none of them need escaping.


Regards,
Paul Wilson
ProBulletin Board: http://www.probulletin.com/


seamanrob2004
Novice


Feb 5, 2007, 8:30 AM

Post #4 of 16 (2400 views)
Re: [ProBulletin] Matching HTML

I escaped some of them purely as a test on my system.
I was trying to find the best way to make it happen and just copied and pasted all the code together.
I'm sure you all know what it's like bug-chasing: mod, test, mod, test.

Thanks for the answer, Kevin. I did have a sneaking suspicion it was a little greedy when it took everything from the first image tag to the last </html> tag! By the way, what's the [/coe] for at the end?
My Perl for Dummies doesn't mention these...
and the other book I have is far too cumbersome to look through!

I've also been looking at how to make my scripts more secure.
They run in a separate .htaccess-protected cgi-bin folder.
I finally implemented use strict, warnings and perl -w on *MOST* of my scripts, but one script that deals with PayPal doesn't like use strict.
I've predeclared all my local variables in the first two lines of code like this:

my ($first, $second etc....);

Is this an acceptable way to do it? I'm of the old-school BASIC pre-1985 model and am used to using any variable I pull out of the ether, so I normally have about 30 different ones per script!

There's one variable for a two-dimensional array (%) that I can't declare this way; it doesn't like it. How should I declare something like %thisdata as local?

I check all incoming data and remove the metacharacters. The script does access and write to files, but they use randomly generated server-side names and are read and written as flat files. Can hacking still take place? (Probably a stupid question!)

I also use HTTP_REFERER to check incoming requests.

While we're on the subject of security, what more should I be looking at?

Best regards and thanks for the help again... now to see if it works. I have a sneaking suspicion I've tried this one already, but here goes...

Rob

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
veritas vos liberabit ~ The truth shall set you free


KevinR
Veteran


Feb 5, 2007, 10:56 AM

Post #5 of 16 (2395 views)
Re: [seamanrob2004] Matching HTML


Quote
I've also been looking at how to make my scripts more secure.


I'm not sure what "secure" you are referring to. Do you mean like authentication (name/password) or secure in a broader sense (protection from hacking/cracking/etc)?
-------------------------------------------------


seamanrob2004
Novice


Feb 5, 2007, 11:08 AM

Post #6 of 16 (2393 views)
Re: [KevinR] Matching HTML

More secure against hacking attempts or attempts to gain access to root.

I've implemented the less greedy method and it's working well. So thanks very much for solving that obvious yet very elusive matter!

Rob

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
veritas vos liberabit ~ The truth shall set you free


KevinR
Veteran


Feb 5, 2007, 12:25 PM

Post #7 of 16 (2388 views)
Re: [seamanrob2004] Matching HTML

Using taint mode is one of the first lines of defense for Perl CGI scripts. Really, all Perl CGI scripts should be able to run under strict and taint mode. You enable taint mode by adding the -T switch to the shebang line:

#!/usr/bin/perl -T

Reading material:

http://perldoc.perl.org/perlsec.html

You might want to use a more tightly integrated authentication scheme too. If you rely on .htaccess alone for authentication you might be vulnerable. I personally don't know much about getting around .htaccess-protected files/directories, but some people claim they can walk right through it. Name/password authentication that is coded into the structure of your program should be more secure and certainly more versatile. Avoid storing names/passwords as plain text files.
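
As a rough sketch of what taint mode asks of you: under -T, data from outside the program cannot be used in things like file names until it has been extracted through a regex capture, as perlsec describes. The helper name clean_username and its allowed character set are my own assumptions for illustration, not anything from Rob's scripts. The sketch runs without -T; the switch only adds enforcement.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value if it matches an allow-list pattern,
# otherwise undef. Under -T, the captured $1 is what counts as untainted.
sub clean_username {
    my ($raw) = @_;
    return unless defined $raw;
    return $raw =~ /\A([A-Za-z0-9_]{1,32})\z/ ? $1 : undef;
}

for my $input ('rob_2004', '../../etc/passwd') {
    my $name = clean_username($input);
    print defined $name ? "accepted: $name\n" : "rejected: $input\n";
}
```

Matching what you expect to see, rather than stripping what you think is dangerous, tends to be the safer habit.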
-------------------------------------------------


seamanrob2004
Novice


Feb 5, 2007, 2:49 PM

Post #8 of 16 (2385 views)
Re: [KevinR] Matching HTML

Thanks for that info. I've been searching for a more detailed explanation of the switches for perl.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
veritas vos liberabit ~ The truth shall set you free


davorg
Thaumaturge / Moderator

Feb 9, 2007, 8:17 AM

Post #9 of 16 (2367 views)
Re: [seamanrob2004] Matching HTML


In Reply To
My Perl for Dummies doesn't mention these...
and the other book I have is far too cumbersome to look through!


Perl for Dummies is a terrible book. You will be picking up all sorts of bad habits. Please throw it away and buy a copy of "Learning Perl" instead.


In Reply To
I've also been looking at how to make my scripts more secure.
They run in a separate .htaccess-protected cgi-bin folder.
I finally implemented use strict, warnings and perl -w on *MOST* of my scripts,


-w and "use warnings" are effectively the same thing. If you have "use warnings", then you don't need -w.


In Reply To
but one script that deals with PayPal doesn't like use strict.


What errors does it give you?


In Reply To
I've predeclared all my local variables in the first two lines of code like this:

my ($first, $second etc....);

Is this an acceptable way to do it?


Well, it works. But it's a much better idea to declare your variables where you're using them. And to limit them to as small a scope as possible.
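
A small sketch of what "as small a scope as possible" can look like. The data and variable names here are invented for the example.

```perl
use strict;
use warnings;

my @prices = (3, 11, 8);        # hypothetical data

my $total = 0;                  # needed after the loop, so declared outside it
for my $price (@prices) {       # $price exists only inside the loop
    my $doubled = $price * 2;   # so does $doubled
    $total += $doubled;
}

print "total: $total\n";
# Mentioning $price or $doubled down here would be a compile-time error
# under "use strict", which is exactly the kind of slip it is there to catch.
```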


In Reply To
I'm of the old-school BASIC pre-1985 model and am used to using any variable I pull out of the ether, so I normally have about 30 different ones per script!


That's (one of the reasons) why you put "use strict" in your program. It stops you doing that.


In Reply To
There's one variable for a two-dimensional array (%) that I can't declare this way; it doesn't like it. How should I declare something like %thisdata as local?


That's not a two-dimensional array. That's a hash. You should be able to declare that with


Code
my %thisdata;


If that doesn't work, please show us a short example, and tell us what error you are getting.
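
For comparison, a minimal sketch of a plain hash next to something closer to "two-dimensional". The keys and values are made up, and %orders is a hypothetical name; the second form is a hash whose values are hash references.

```perl
use strict;
use warnings;

my %thisdata;                   # a hash: each key maps to one value
$thisdata{name}  = 'Rob';
$thisdata{items} = 3;

# Hypothetical two-level layout: a hash of hash references.
my %orders;
$orders{1001}{email} = 'rob@example.com';
$orders{1001}{total} = 11;

for my $id (sort keys %orders) {
    print "$id: $orders{$id}{email} ($orders{$id}{total})\n";
}
```

Both declarations are ordinary "my" declarations, so they sit happily under use strict.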

--
Dave Cross, Perl Hacker, Trainer and Writer
http://www.dave.org.uk/
Get more help at Perl Monks


seamanrob2004
Novice


Feb 14, 2007, 3:27 AM

Post #10 of 16 (2349 views)
Re: [davorg] Matching HTML

Throw Dummies away...?! I couldn't exist without the plain-English version of events. But I also use Core Perl as a reference.

I've sorted the PayPal script out now that you've explained hashes. But I have another question, about bulk checking of incoming variables.

The form I have on a web page has about 10-15 different fields, some of which are arrays (@c = param 'c1' for example).

Substituting blank spaces for all the special characters in every one of them would require a long list of s/// es!

I thought about putting the incoming variables into an array, something like @array = ('$c1', '$email_address', '$other_incomer'); and checking through them all in a foreach statement. That's OK, but obviously I then need to copy the cleaned values back to the appropriate $ variables.

Has anyone got a better solution? At this rate I may as well plod on with individual checking of each input.

Many thanks.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
veritas vos liberabit ~ The truth shall set you free


davorg
Thaumaturge / Moderator

Feb 14, 2007, 3:44 AM

Post #11 of 16 (2347 views)
Re: [seamanrob2004] Matching HTML


In Reply To
Throw Dummies away...?! I couldn't exist without the plain-English version of events. But I also use Core Perl as a reference.


I'm serious. "Perl for Dummies" is a terrible book. It reads like it was written by someone who didn't know very much about Perl. You will learn bad habits from that book. Did I show you Mark Dominus' review of it?

"Perl for Dummies" might be in easier to understand language, but what use is that if you can't rely on what it is telling you?

"Core Perl" is better. But I really recommend that you get a copy of "Learning Perl". Or read the free online copy of "Beginning Perl".


In Reply To
But I have another question, about bulk checking of incoming variables.

The form I have on a web page has about 10-15 different fields, some of which are arrays (@c = param 'c1' for example).

Substituting blank spaces for all the special characters in every one of them would require a long list of s/// es!

I thought about putting the incoming variables into an array, something like @array = ('$c1', '$email_address', '$other_incomer'); and checking through them all in a foreach statement. That's OK, but obviously I then need to copy the cleaned values back to the appropriate $ variables.

Has anyone got a better solution? At this rate I may as well plod on with individual checking of each input.


I'm not entirely sure that I understand what you're asking. But I think this might help.


Code
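use CGI qw(param);   # param() comes from CGI.pm

my $p1 = param('param1');
my $p2 = param('param2');
my @pa = param('multi_valued_param');
my $p3 = param('param3');

# foreach aliases $_ to each variable in turn, so s/// changes the originals
foreach ($p1, $p2, @pa, $p3) {
    s/\W/ /g; # or whatever cleanup you want to do
}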
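
The reason nothing has to be copied back afterwards is that foreach aliases $_ to each element of the list rather than working on a copy. A small sketch with hard-coded stand-ins for the form values (the variable names and sample strings are invented, so it runs without CGI.pm):

```perl
use strict;
use warnings;

# Hypothetical stand-ins for values that would normally come from param().
my $email   = 'rob@example.com';
my $name    = 'Rob;DROP';
my @choices = ('a|b', 'c&d');

for ($email, $name, @choices) {
    s/\W/ /g;    # $_ is an alias, so this edits the original variable
}

print "$email\n$name\n@choices\n";
```

After the loop the original variables themselves hold the cleaned values, array elements included.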


--
Dave Cross, Perl Hacker, Trainer and Writer
http://www.dave.org.uk/
Get more help at Perl Monks


seamanrob2004
Novice


Feb 14, 2007, 4:11 AM

Post #12 of 16 (2345 views)
Re: [davorg] Matching HTML

Many thanks, Dave.

That's exactly what I needed! This forum is great. Every time I visit I learn something new!

Regards

Rob

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
veritas vos liberabit ~ The truth shall set you free


seamanrob2004
Novice


Feb 16, 2007, 3:11 AM

Post #13 of 16 (2328 views)
Re: [seamanrob2004] Matching HTML

OK, one more question!

I've just found yet another issue I need to contend with, and it's one that's always plaguing me in many of the things I write.

If I want to compare two numbers to find a match, I would use $first == $second, right?
Well, if I have 11 stored in $first and 3 stored in $second, it returns true. How can this be? Am I comparing wrongly?

The method works fine with single-digit entries: 3 == 3 is true, 5 == 5 is true, and 3 == 8 is correctly false.

If I do 11 == 3 it comes back true.
I have tried comparing with eq, and using unless ... ne instead of if.
And I have tried padding the single-digit number with a 0, i.e. 03 in the above example.

It just doesn't like double digits!

Has anyone come across a similar problem? Any solution? I will crack on with it for now.


[edit] Sorry guys... just been debugging. Found an lt comparison where the < symbol should have been. Sorted it for now! Many thanks anyway.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
veritas vos liberabit ~ The truth shall set you free


(This post was edited by seamanrob2004 on Feb 16, 2007, 3:40 AM)


davorg
Thaumaturge / Moderator

Feb 16, 2007, 5:12 AM

Post #14 of 16 (2324 views)
Re: [seamanrob2004] Matching HTML

Rob,

Three small points.

1/ If you're asking a new question, then it's a good idea to start a completely new thread.

2/ When asking a question, the best approach is to supply a short, but complete program that we can run, together with an explanation of the output that you expect to see.

3/ Perl has two completely different sets of logical comparison operators. ==, !=, >, >=, <, <= and <=> all treat the two operands as numbers. eq, ne, gt, ge, lt, le and cmp all treat the two operands as strings. See perldoc perlop (the sections on relational operators and equality operators) for more details.
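
A quick sketch of the two operator families applied to the same pair of values, using the 11 and 3 from the question:

```perl
use strict;
use warnings;

my ($first, $second) = (11, 3);

my $numeric_less = $first <  $second;   # compared as numbers: 11 < 3 is false
my $string_less  = $first lt $second;   # compared as strings: "11" lt "3" is true,
                                        # because "1" sorts before "3"

print "numeric <  : ", ($numeric_less ? "true" : "false"), "\n";
print "string  lt : ", ($string_less  ? "true" : "false"), "\n";

# The same split applies to equality.
print "'1.0' == 1 : ", ('1.0' == 1 ? "true" : "false"), "\n";   # numerically equal
print "'1.0' eq 1 : ", ('1.0' eq 1 ? "true" : "false"), "\n";   # different strings
```

So an lt where a < was intended would make 11 look smaller than 3, which fits the bug found in the [edit] above.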

I assume this is another area where the author of "Perl for Dummies" doesn't really know what he's talking about and therefore writes a confusing explanation :-)

--
Dave Cross, Perl Hacker, Trainer and Writer
http://www.dave.org.uk/
Get more help at Perl Monks


seamanrob2004
Novice


Feb 16, 2007, 5:40 AM

Post #15 of 16 (2322 views)
Re: [davorg] Matching HTML


In Reply To
2/ When asking a question, the best approach is to supply a short, but complete program that we can run, together with an explanation of the output that you expect to see.

That's almost impossible to fulfil. The script is too large to condense and, in my experience, anyone who pastes more than ten lines of code gets ignored anyway!

In Reply To
3/ Perl has two completely different sets of logical comparison operators. ==, !=, >, >=, <, <= and <=> all treat the two operands as numbers. eq, ne, gt, ge, lt, le and cmp all treat the two operands as strings. See perldoc perlop (the sections on relational operators and equality operators) for more details.

See above! I did see what the problem was and corrected it.

In Reply To
I assume this is another area where the author of "Perl for Dummies" doesn't really know what he's talking about and therefore writes a confusing explanation :-)


You know, for a book that actually led me to this site (it's listed in ten good resources near the end), you are very harsh on the author. If anything, it showed me this place to get more information, and, in response to your point, it does mention and explain the differences between < and lt, eq and ==, etc. The information you provide on here is invaluable to me and everyone else who uses this site.

But I would like to share with you a saying that is with me every day that I work at sea: assumption is the mother of all f?*! ups. Please don't assume anything is bad simply because someone writes a bad review or because not all the bits of the book match your ideas of what it should be. For Dummies like me, using it purely for referring back to terminology, it's OK. It's not great, I agree, but it has helped me in the past. And I do refer to CPAN, perldoc and other sites.

I have taken your suggestion to change: I have now put it in the cupboard and rely on my copy of Core. I'm changing the error of my ways! Re: bad author, my limited knowledge of BASIC programming stood me in good stead with Dummies. So when the author runs off into big scripts and doesn't explain them the way others think he should, I can usually decipher and understand them. What works for one person doesn't necessarily work for others. And of course, I thankfully have this place to search through!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
veritas vos liberabit ~ The truth shall set you free


davorg
Thaumaturge / Moderator

Feb 16, 2007, 6:34 AM

Post #16 of 16 (2320 views)
Re: [seamanrob2004] Matching HTML


In Reply To

In Reply To
2/ When asking a question, the best approach is to supply a short, but complete program that we can run, together with an explanation of the output that you expect to see.

That's almost impossible to fulfil. The script is too large to condense


Then you distil the essence of the problem into a new program.

For example, you said that you had trouble comparing variables. You said "If I want to compare two numbers to find a match, I would use $first == $second, right?
Well, if I have 11 stored in $first and 3 stored in $second, it returns true."

So show us that in a small, self-contained program. Something like this:


Code
use strict;
use warnings;

my $first  = 11;
my $second = 3;

# == compares the two values as numbers
if ($first == $second) {
    print "equal\n";
} else {
    print "not equal\n";
}


and then explain what you expect to see. And if it turns out that this program doesn't demonstrate the problem that you thought you had, then the problem must be elsewhere - so you've learnt something about the problem.


In Reply To
and in my experience, anyone who pastes more than ten lines of code gets ignored anyway!


Well, everyone who helps out here is a volunteer. We get to choose which questions we answer. If you don't make your question look appealing then, yes, it's likely to be ignored.


In Reply To

In Reply To
I assume this is another area where the author of "Perl for Dummies" doesn't really know what he's talking about and therefore writes a confusing explanation :-)


You know, for a book that actually led me to this site (it's listed in ten good resources near the end), you are very harsh on the author. If anything, it showed me this place to get more information,


To be honest, that's an example of the book getting it wrong. This has never been one of the best places to get help on Perl and these days there are just two or three of us who answer any questions. If "Perl for Dummies" wanted to recommend good Perl resources then it should have pointed you at Perl Monks or the Perl beginners mailing list.


In Reply To
Please don't assume anything is bad simply because someone writes a bad review or because not all the bits of the book match your ideas of what it should be. For Dummies like me, using it purely for referring back to terminology, it's OK. It's not great, I agree, but it has helped me in the past. And I do refer to CPAN, perldoc and other sites. I have taken your suggestion to change: I have now put it in the cupboard and rely on my copy of Core. I'm changing the error of my ways! Re: bad author, my limited knowledge of BASIC programming stood me in good stead with Dummies. So when the author runs off into big scripts and doesn't explain them the way others think he should, I can usually decipher and understand them. What works for one person doesn't necessarily work for others. And of course, I thankfully have this place to search through!


I'm not relying on the review. I've looked at the book in some detail and I know that it's a bad book.

I'm glad to hear that you've put it in the cupboard. Please resist the urge to get it out again. I know that it often explains things in a way that makes them easier to understand, but the underlying principles that it explains are often very wrong.

Is there a Dummies book available on a subject that you are an expert on? If so, please go into a bookshop and take a close look at it. I'm sure that you'll see there are fundamental errors in the author's understanding of the subject. Then, hopefully, you'll start to understand why I'm so keen to stop people reading this book.

--
Dave Cross, Perl Hacker, Trainer and Writer
http://www.dave.org.uk/
Get more help at Perl Monks

 
 

