extracting url from google search

 



gingins
Novice

Aug 9, 2006, 1:01 AM

Post #1 of 6 (3830 views)
extracting url from google search

When I run this program, it doesn't print the URLs from the results page. It's supposed to have the URLs listed.
Steps I took: went to Google (www.google.com) and typed in the word tulips to do a search. I saved the resulting Google page as search.htm.


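#!/usr/bin/perl
use strict;
use warnings;

my $searchresults = readresults();
parseresults($searchresults);

# Read the whole saved results page into one string.
sub readresults
{
    my $file = "/home/xxx/search.htm";
    my $size = (stat($file))[7];
    open(my $fh, "<", $file) or die "Cannot open $file: $!";

    read($fh, my $filedata, $size);
    close($fh);

    return $filedata;
}

# Print the href of every link whose tag starts with <a class=l href="...">.
sub parseresults
{
    my $searchdata = $_[0];
    while ($searchdata =~ /<a class=l href="(.*?)">/ig)
    {
        print "$1\n";
    }
}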


davorg
Thaumaturge / Moderator

Aug 9, 2006, 6:58 AM

Post #2 of 6 (3826 views)
Re: [gingins] extracting url from google search

This kind of work is much easier if you use the right modules. I recommend LWP::Simple for downloading the HTML and HTML::LinkExtor for extracting the URLs.
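
As a rough illustration of the HTML::LinkExtor half of that suggestion, here is a minimal sketch. The helper name extract_hrefs and the sample markup are made up for the example; in a real script the HTML could come from LWP::Simple's get($url) or from a saved file instead. It assumes HTML::LinkExtor (part of the HTML-Parser distribution) is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return the href of every <a> tag in an HTML string.
sub extract_hrefs {
    my ($html) = @_;

    # With no callback, the parser collects the links internally.
    my $parser = HTML::LinkExtor->new;
    $parser->parse($html);
    $parser->eof;

    my @hrefs;
    for my $link ($parser->links) {
        # Each link is an array ref: tag name, then attribute => URL pairs.
        my ($tag, %attrs) = @$link;
        push @hrefs, $attrs{href} if $tag eq 'a' && defined $attrs{href};
    }
    return @hrefs;
}

# Made-up sample markup standing in for a saved results page.
my $sample = '<p><a class=l href="http://example.com/tulips">Tulips</a>'
           . ' <img src="x.png"></p>';

print "$_\n" for extract_hrefs($sample);
```

Because the parser reports every link-like attribute (img src and so on), the sketch filters on the tag name rather than relying on the exact attribute order that a regex would need.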

--
Dave Cross, Perl Hacker, Trainer and Writer
http://www.dave.org.uk/
Get more help at Perl Monks


gingins
Novice

Aug 9, 2006, 12:39 PM

Post #3 of 6 (3820 views)
Re: [davorg] extracting url from google search

I used the LWP::UserAgent module. I had to because of a class assignment.
When I run the program I get the following errors:
Bareword found where operator expected at useragent.pl line 4, near "//www"
(Missing operator before www?)
Bareword found where operator expected at useragent.pl line 4, near "q=tulips&btnG=Google"
syntax error at useragent.pl line 4, near "http:"
Execution of useragent.pl aborted due to compilation errors.

#!/usr/bin/perl
use LWP::UserAgent;
my($connection) = LWP::UserAgent->new;
my($response) = $connection->get(http://www.google.com/search?hl=en&q=tulips&btnG=Google+Search);
if ($response->is_success)
{
my($results) = $response->content;
print $results;
}
else
{
print "Connection error\n";
}


No idea what "Bareword" means.


KevinR
Veteran


Aug 9, 2006, 1:51 PM

Post #4 of 6 (3817 views)
Re: [gingins] extracting url from google search

The URL needs quoting:


Code
my($response) = $connection->get('http://www.google.com/search?hl=en&q=tulips&btnG=Google+Search');


Since there are no variables or escape sequences in the string, single quotes are fine. If the string contained variables to interpolate or escapes such as \n, you would use double quotes.
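
To make the difference concrete, here is a small sketch. The variable $query is invented for the example and is not part of the original script.

```perl
use strict;
use warnings;

my $query = 'tulips';   # hypothetical search term, for illustration only

# Single quotes: the text is taken literally, so $query is NOT replaced.
my $literal = 'http://www.google.com/search?hl=en&q=$query';

# Double quotes: $query is interpolated into the string.
my $interpolated = "http://www.google.com/search?hl=en&q=$query";

print "$literal\n";        # ends in q=$query
print "$interpolated\n";   # ends in q=tulips
```

Either form works for a fixed URL like the one in the script above. Single quotes just make it obvious that nothing in the string is meant to be substituted.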
-------------------------------------------------


gingins
Novice

Aug 9, 2006, 2:14 PM

Post #5 of 6 (3814 views)
Re: [KevinR] extracting url from google search

Thanks, that worked.


davorg
Thaumaturge / Moderator

Aug 10, 2006, 5:17 AM

Post #6 of 6 (3810 views)
Re: [gingins] extracting url from google search

A "bareword" is an unquoted string that Perl doesn't understand (i.e. it's not a function name or something like that). If you're not using "use strict" (and you really should be!) then Perl will treat it as a quoted string.

Your problem is that you have strings that don't have quotes round them (in particular your URLs). As KevinR pointed out, you need to quote them.
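
If you want to see this for yourself, here is a small sketch. The helper compiles is made up for the example: it compiles a snippet with a string eval, so a compile error is reported instead of aborting the whole script. The exact error text depends on your Perl version, so the sketch only reports whether each snippet compiles.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the snippet compiles and runs, false otherwise.
sub compiles {
    my ($code) = @_;
    my $ok = eval "no warnings; $code; 1";
    return $ok ? 1 : 0;
}

my @snippets = (
    [ 'bareword without strict' => q{ no strict 'subs'; my $w = tulips; } ],
    [ 'bareword with strict'    => q{ use strict; my $w = tulips; } ],
    [ 'unquoted URL'            => q{ my $u = http://www.google.com/search; } ],
    [ 'quoted URL'              => q{ my $u = 'http://www.google.com/search'; } ],
);

for my $snippet (@snippets) {
    my ($label, $code) = @$snippet;
    printf "%-24s %s\n", "$label:", compiles($code) ? "compiles" : "fails to compile";
}
```

A single word can slip through as a string when strict is off, but a URL contains characters such as ":" and "/" that Perl tries to read as operators, so it fails either way until you quote it.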


 
 

