


Sep 9, 2000, 2:23 PM

Post #1 of 5 (2398 views)
spider

Does anyone know of a spider script that can be pointed at a URL to see whether the page contains a certain bit of text?

User / Moderator

Sep 11, 2000, 6:51 PM

Post #2 of 5 (2398 views)
Re: spider

Not a spider, but it will check one page for a piece of text ...

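<BLOCKQUOTE><font size="1" face="Arial,Helvetica,sans serif">code:</font><HR><pre>

#!/usr/bin/perl -wl

use strict;
use LWP::Simple qw( get );

# Both arguments are required: the URL to fetch and the text to look for.
my $url  = shift || die "No URL given.\n";
my $text = shift || die "No text given.\n";

# get() returns undef on failure, so test definedness rather than truth.
if ( defined( my $page = get($url) ) ) {
    # Whole-word, case-insensitive match; \Q...\E quotes metacharacters.
    print $page =~ /\b\Q$text\E\b/i
        ? "YES"
        : "NO";
} else {
    print "Could not get $url!";
}
</pre><HR></BLOCKQUOTE>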

[This message has been edited by Kanji (edited 09-12-2000).]


Sep 11, 2000, 7:00 PM

Post #3 of 5 (2398 views)
Re: spider

I don't know much about Perl scripting. Could you tell me how to use this, and possibly how it works, so I can try to make my own?

User / Moderator

Sep 11, 2000, 10:04 PM

Post #4 of 5 (2398 views)
Re: spider

Save it to a file, make sure the file is executable (e.g., "chmod 700", perhaps 755) if your OS warrants it, and then run it from your favourite command-line shell with two arguments: first the URL you want to look at, then the text to look for (kanji, in my example).

Depending on your setup, you may also need to prepend that with "perl" or "/path/to/perl".

It works by loading one of the WWW libraries for Perl (LWP, short for libwww-perl; here its LWP::Simple module) and using its get() subroutine to save the content of the target page to a variable.

We then search that variable to see if it contains the wanted text and report accordingly.

If you're confused by the print expr ? a : b construct, it's functionally the same as ...

<BLOCKQUOTE><font size="1" face="Arial,Helvetica,sans serif">code:</font><HR><pre>

if ( expr ) { print a }
else { print b }</pre><HR></BLOCKQUOTE>
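
To make that equivalence concrete, here is a small sketch you can run on its own. The subroutine names answer_ternary and answer_if_else are made up for this illustration and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: picks the answer with the conditional (?:) operator.
sub answer_ternary {
    my ($found) = @_;
    return $found ? "YES" : "NO";
}

# Hypothetical helper: the same decision written as an explicit if/else.
sub answer_if_else {
    my ($found) = @_;
    if ($found) { return "YES" }
    else        { return "NO" }
}

# Both forms give the same answer for a true and a false value.
for my $found ( 1, 0 ) {
    print answer_ternary($found), " ", answer_if_else($found), "\n";
}
```

The only real difference is that the ?: form is an expression, so it can sit directly inside the argument list of print.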

The regexp breaks down as ...

\b = match a word boundary (so that if you entered the word "all" it won't match "small").

\Q = disable metacharacters, which have special meanings inside regular expressions, so that if you wanted the word ".(" it would work without blowing up your script by making it look for the matching ")". See the perlre documentation page.

\E = re-enables the use of metacharacters.

\b = another word boundary, so that the word "old" won't match "golden".

Finally, /i makes the search case-insensitive.
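
If you want to see each piece of the pattern at work without fetching anything, the following sketch runs the same regexp against a few fixed strings. The contains_word subroutine and the sample strings are assumptions made up for this illustration; only the pattern itself comes from the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern from the script:
# returns 1 if $text occurs in $page as a whole word, ignoring case.
sub contains_word {
    my ( $page, $text ) = @_;
    return $page =~ /\b\Q$text\E\b/i ? 1 : 0;
}

# Each case is [ page content, text to look for ].
my @cases = (
    [ "we sell it all here", "all"  ],    # whole word, so it matches
    [ "a small shop",        "all"  ],    # leading \b rejects "small"
    [ "a golden ring",       "old"  ],    # \b rejects "old" inside "golden"
    [ "ALL CAPS",            "all"  ],    # /i ignores case
    [ "price is 3.50 today", "3.50" ],    # \Q makes the dot literal
    [ "price is 3x50 today", "3.50" ],    # so the dot does not match "x"
    [ "call foo.(bar) now",  ".("   ],    # "(" is quoted, no syntax error
);

for my $case (@cases) {
    my ( $page, $text ) = @$case;
    printf "%-22s %-5s %s\n", $page, $text,
        contains_word( $page, $text ) ? "YES" : "NO";
}
```

One thing to keep in mind: \b only matches between a word character and a non-word character, so search text that starts or ends with punctuation (such as ".(") is only found when a word character sits directly next to it, as in "foo.(bar)".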

Two other things: there was a typo in the code, which I'll fix after this ( /b should have been /\b ), and UBB did its usual trick of inserting a space in between the two |'s of the || operator, so it may not have run beforehand.


Sep 29, 2000, 7:38 AM

Post #5 of 5 (2398 views)
Re: spider

Here, it seems, you will find everything you need :-)


