
 



benlawrie
Deleted

Sep 9, 2000, 2:23 PM

Post #1 of 5 (1004 views)
spider

Does anyone know of a spider script that can be pointed at a URL to see whether the page contains a certain bit of text?


Kanji
User / Moderator

Sep 11, 2000, 6:51 PM

Post #2 of 5
Re: spider

Not a spider, but it will check one page for a piece of text ...

code:

    #!/usr/bin/perl -wl
    # -w enables warnings; -l appends a newline to every print.

    use strict;
    use LWP::Simple qw( get );

    # Both arguments are required: the URL to fetch and the text to find.
    my $url  = shift || die "No URL given.\n";
    my $text = shift || die "No text given.\n";

    # get() returns undef on failure, so test for definedness rather than
    # truth (an empty page is not a failed fetch).
    if ( defined( my $page = get($url) ) ) {
        print $page =~ /\b\Q$text\E\b/i
            ? "YES"
            : "NO";
    } else {
        print "Could not get $url!";
    }

[This message has been edited by Kanji (edited 09-12-2000).]


benlawrie
Deleted

Sep 11, 2000, 7:00 PM

Post #3 of 5
Re: spider

I don't know much about Perl scripting. Could you tell me how to use this, and possibly how it works, so I can try to make my own?


Kanji
User / Moderator

Sep 11, 2000, 10:04 PM

Post #4 of 5
Re: spider

Save it to a file (e.g., "contains.pl"), make sure the file is executable (e.g., "chmod 700", or perhaps 755) if your OS requires it, and then run it from your favourite command-line shell as ...

contains.pl http://www.perlguru.com/ kanji

... where http://www.perlguru.com/ is the URL you want to look at, and kanji is the text to look for.

Depending on your setup, you may also need to prepend that with "perl" or "/path/to/perl".

It works by loading one of the WWW libraries for Perl (LWP, short for libwww-perl) and using its get() subroutine to save the content of the target page to a variable.

We then search that variable to see if it contains the wanted text and report accordingly.
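
To make those two steps concrete, here is a minimal sketch that separates the fetching from the searching. The sub name check_page and the %fake_web stand-in are made up for this example so that it runs without a network connection; in the real script the fetcher is LWP::Simple's get().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# check_page() is a hypothetical helper, not part of LWP.
# $fetch is any code ref that takes a URL and returns the page
# content, or undef on failure (the same contract as LWP::Simple::get).
sub check_page {
    my ( $fetch, $url, $text ) = @_;

    # Step 1: save the content of the target page to a variable.
    my $page = $fetch->($url);
    return "Could not get $url!" unless defined $page;

    # Step 2: search that variable and report accordingly.
    return $page =~ /\b\Q$text\E\b/i ? "YES" : "NO";
}

# Stand-in for the web (an assumption, for offline runs only).
my %fake_web = ( 'http://example.invalid/' => '<html>Hello Kanji</html>' );
my $fake_get = sub { return $fake_web{ $_[0] } };

print check_page( $fake_get, 'http://example.invalid/',     'kanji' ), "\n";
print check_page( $fake_get, 'http://example.invalid/',     'perl' ),  "\n";
print check_page( $fake_get, 'http://example.invalid/nope', 'kanji' ), "\n";
```

With LWP installed, passing \&LWP::Simple::get as the first argument (after "use LWP::Simple;") should give the same behaviour against live pages.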

If you're confused by the print expr ? a : b construct, it's functionally the same as ...

code:

    if ( expr ) { print a }
    else { print b }
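
If you want to convince yourself the two forms agree, a small sketch like this one should do it. The sub names with_ternary and with_if_else are invented for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.
sub with_ternary {
    my ($found) = @_;
    return $found ? "YES" : "NO";
}

sub with_if_else {
    my ($found) = @_;
    if ($found) { return "YES" }
    else        { return "NO" }
}

# Try a true value and a false value through both forms.
for my $found ( 1, 0 ) {
    print with_ternary($found), " ", with_if_else($found), "\n";
}
```

Both columns of the output should read the same on every line.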

Finally, the regexp gets broken down as ...

\b = match at a word boundary (so that if you entered the word "all" it won't match "small").

\Q = disable the metacharacters that have special meanings inside regular expressions, so that the text is matched literally. If you wanted the word ".(" it would work without blowing up your script by making it look for the matching ")". See the perlre documentation page.

\E = re-enables the use of metacharacters.

\b = another word boundary, so that the word "old" won't match "older" or "golden".

And /i makes the search case-insensitive.
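
The pieces of that regexp can be tried out on a few fixed strings without fetching anything. This is only a sketch; contains_word is a hypothetical wrapper around the same pattern, returning 1 for a match and 0 otherwise.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the same regexp the script uses.
sub contains_word {
    my ( $page, $text ) = @_;
    return $page =~ /\b\Q$text\E\b/i ? 1 : 0;
}

# Leading \b: "all" is not found inside "small".
print contains_word( "a small dog", "all" ), "\n";

# /i: case is ignored.
print contains_word( "that is all", "ALL" ), "\n";

# Trailing \b: "old" is not found inside "older".
print contains_word( "the older one", "old" ), "\n";

# \Q...\E: the "." is literal, so "a.b" matches "a.b" but not "axb".
print contains_word( "see a.b here", "a.b" ), "\n";
print contains_word( "see axb here", "a.b" ), "\n";
```

One thing to watch for: \b only matches between a word character and a non-word character (or the edge of the string), so a search text that begins or ends with punctuation, such as ".(", will only match where a word character sits directly next to it.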

Two other things: there was a typo in the code which I'll fix after this ( /b should have been /\b ), and UBB did its usual trick of inserting a space between the two | characters of each ||, so it may not have run beforehand.


robo
Deleted

Sep 29, 2000, 7:38 AM

Post #5 of 5
Re: spider

 http://www.perlarchive.com/guide/Remote_Content_Integration/

Here, it seems, you will find everything you need :-)

robo

 
 

