Global symbol errors - all along the way... need help with a little script

 



dilbert
User

Nov 9, 2017, 11:27 AM

Post #1 of 11 (2728 views)

hello dear perl-experts,


I'm pretty new to programming, and to OO programming especially.
Nonetheless, I'm trying to write a very simple spider for web crawling.

Here's what I can't get to work:



Code
 
#!C:\Perl\bin\perl

use strict; # You always want to include both strict and warnings
use warnings;

use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use HTML::LinkExtor;

# There was no reason for this to be in a BEGIN block (and there
# are a few good reasons for it not to be)
open my $file1,"+>>", ("links.txt");
select($file1);

#The Url I want it to start at;
# Note that I've made this an array, @urls, rather than a scalar, $URL
my @urls = ('http://www.computersecrets.eu.pn/');

# I'm not positive, but this should only need to be set up once, not
# on every pass through the loop
my $browser = LWP::UserAgent->new('IE 6');
$browser->timeout(10);

#Request and receive contents of a web page;
# Need to use a while loop instead of a for loop because @urls will
# be changing as we go
while (@urls) {
my $url = shift @urls;
my $request = HTTP::Request->new(GET => $URL);
my $response = $browser->request($request);

#Tell me if there is an error;
if ($response->is_error()) {printf "%s\n", $response->status_line;}
my $contents = $response->content();

#Extract the links from the HTML;
my ($page_parser) = HTML::LinkExtor->new(undef, $url);
$page_parser->parse($contents)->eof;
@links = $page_parser->links;

#Print the link to a links.txt file;
foreach $link (@links) {
push @urls, $$link[2]; # Add link to list of urls before printing it
print "$$link[2]\n";
}

sleep 60;
}


well i get lots of errors.


Code
Global symbol "$URL" requires explicit package name at wc1.pl line 32. 
Global symbol "@links" requires explicit package name at wc1.pl line 42.
Global symbol "$link" requires explicit package name at wc1.pl line 45.
Global symbol "@links" requires explicit package name at wc1.pl line 45.
Global symbol "$link" requires explicit package name at wc1.pl line 46.
Global symbol "$link" requires explicit package name at wc1.pl line 47.
Execution of wc1.pl aborted due to compilation errors.
martin@linux-jnmx:~/perl> ^C
martin@linux-jnmx:~/perl>



any idea


dilbert
User

Nov 9, 2017, 12:54 PM

Post #2 of 11 (2719 views)
Re: [dilbert] Global symbol errors - all along the way... need help with a little script [In reply to] Can't Post

hello you guys

fixed this



Code
#!C:\Perl\bin\perl 

use strict; # You always want to include both strict and warnings
use warnings;

use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use HTML::LinkExtor;

# There was no reason for this to be in a BEGIN block (and there
# are a few good reasons for it not to be)
open my $file1, "+>>", "links.txt" or die "Cannot open links.txt: $!";
select($file1);

#The Url I want it to start at;
# Note that I've made this an array, @urls, rather than a scalar, $URL
my @urls = ('https://the url goes in here');
my %visited; # The % sigil indicates it's a hash
my $browser = LWP::UserAgent->new();
$browser->timeout(5);

while (@urls) {
my $url = shift @urls;

# Skip this URL and go on to the next one if we've
# seen it before
next if $visited{$url};

my $request = HTTP::Request->new(GET => $url);
my $response = $browser->request($request);

# No real need to invoke printf if we're not doing
# any formatting
if ($response->is_error()) {print $response->status_line, "\n";}
my $contents = $response->content();

# Now that we've got the url's content, mark it as
# visited
$visited{$url} = 1;

my ($page_parser) = HTML::LinkExtor->new(undef, $url);
$page_parser->parse($contents)->eof;
my @links = $page_parser->links;

foreach my $link (@links) {
print "$$link[2]\n";
push @urls, $$link[2];
}
sleep 60;
}



Laurent_R
Veteran / Moderator

Nov 9, 2017, 11:18 PM

Post #3 of 11 (2710 views)
Re: [dilbert] Global symbol errors - all along the way... need help with a little script [In reply to] Can't Post

Hi Dilbert,
I had seen a couple of errors in your original script that I was going to report (e.g. $url versus $URL, etc.), but you've fixed them in the new post.

Do you still get errors? Please report in which way your new script fails to do what you want.
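
For anyone hitting the same messages: Perl variable names are case-sensitive, and under use strict every variable has to be declared with my before it is used. A minimal sketch of the kind of fixes involved, assuming made-up data ($target and the link list are placeholders for illustration, not part of the original script):

```perl
use strict;
use warnings;

# Hypothetical data, only for illustration
my $url = 'http://www.example.com/';

# $URL and $url are two different variables. Under strict, using the
# undeclared $URL is a compile-time "Global symbol" error, so the
# declared lowercase name is used here.
my $target = $url;

# Arrays and loop variables need a declaration too
my @links = ( [ 'a', 'href', 'http://www.example.com/about' ] );
foreach my $link (@links) {
    print "$$link[2]\n";
}
```

The three declarations (my $url, my @links, foreach my $link) correspond to the three names listed in the error output.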


dilbert
User

Nov 10, 2017, 12:48 AM

Post #4 of 11 (2707 views)
Re: [Laurent_R] Global symbol errors - all along the way... need help with a little script [In reply to] Can't Post

hello dear Laurent,


many thanks for the quick reply.

i want to modify the script a bit - tailoring and tinkering is the way to learn. I want to fetch urls with a certain content in the URL-string



Code
"http://www.foo.com/bar"



In other words, the aim is:
- I need to fetch all the URLs that contain the term "/bar"
- then I want to strip the "/bar" so that only this URL remains: http://www.foo.com


is this doable?

love to hear from you

greetings to Paris,


Chris Charley
User

Nov 10, 2017, 8:29 AM

Post #5 of 11 (2696 views)
Re: [dilbert] Global symbol errors - all along the way... need help with a little script [In reply to] Can't Post

Hello,

Just one problem I see in your fixed script.

the line: next if $visited{$url};

really should be: next if $visited{$url}++;

The first time you see a particular url, it will be undefined (false) so the next won't apply. But the next time you find the same url, it will be defined (value == 1 == true) so the next will execute.

The '++' is a post increment so if the first time a url is found it will test then increment - that's how this works.
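
A small sketch of the test-then-increment idiom on its own, assuming a made-up queue of URLs (nothing is fetched; @queue and @fetched are names invented for the example):

```perl
use strict;
use warnings;

# Hypothetical queue containing one duplicate
my @queue = (
    'http://www.example.com/a',
    'http://www.example.com/b',
    'http://www.example.com/a',
);
my %visited;
my @fetched;

for my $url (@queue) {
    # Post-increment yields the old count (0, i.e. false, the first
    # time) and then stores old count + 1 in the hash.
    next if $visited{$url}++;
    push @fetched, $url;
}

print "$_\n" for @fetched;
```

With this form the hash also ends up holding how often each URL was seen, which a plain assignment of 1 does not record.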


(This post was edited by Chris Charley on Nov 10, 2017, 8:31 AM)


dilbert
User

Nov 10, 2017, 9:36 AM

Post #6 of 11 (2691 views)
Re: [Chris Charley] Global symbol errors - all along the way... need help with a little script [In reply to] Can't Post

hello dear Chris,


many thanks for the reply, great to hear from you.

well - great; I want to fetch all URLs that contain a certain set of characters, "bar", for example:


Code
"http://www.foo.com/bar"


So the code from the start of the thread should pick out all the links that contain "bar". We have to rewrite it a bit...





Chris Charley
User

Nov 10, 2017, 11:50 AM

Post #7 of 11 (2687 views)
Re: [dilbert] Global symbol errors - all along the way... need help with a little script [In reply to] Can't Post

Oops. I now found in your code where the url is set as seen - $visited{$url} = 1;

Ignore my previous post!


(This post was edited by Chris Charley on Nov 10, 2017, 11:51 AM)


dilbert
User

Nov 10, 2017, 12:35 PM

Post #8 of 11 (2681 views)
Re: [Chris Charley] Global symbol errors - all along the way... need help with a little script [In reply to] Can't Post

hello dear chris,


many thanks for the quick answer - great to hear from you.

i need to fetch all the urls that contain this term;

/participants-database/


i take this term - and add this into the line 20:


my @urls = ('here i have to paste the url ....');

or the line 26:

my $url = shift @urls;

in fact - i want to have all the urls that contain this term...

the question is: how to tailor & tweak the script...!?
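
One possible approach, offered as a sketch only: leave @urls as the queue of pages to crawl and filter the extracted links instead. The list @found below is made-up data standing in for what the link extractor returns, and $term and @wanted are names invented for the example.

```perl
use strict;
use warnings;

# Hypothetical links, standing in for the extracted link list
my @found = (
    'http://www.1.com/participants-database/',
    'http://www.1.com/about',
    'http://www.2.com/participants-database/',
);

my $term = '/participants-database/';

# \Q...\E quotes the term so that it is matched literally
my @wanted = grep { /\Q$term\E/ } @found;

print "$_\n" for @wanted;
```

In the crawler, the same grep could be applied to the links inside the foreach loop, so that only matching URLs are printed to links.txt.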


dilbert
User

Nov 11, 2017, 11:33 AM

Post #9 of 11 (2669 views)
Re: [dilbert] Global symbol errors - all along the way... need help with a little script [In reply to] Can't Post

hello dear all

i tried the following thing out...



Code
my $url =~s|/bar$||;


but i left out the "my", The "my" causes a new $url to be created.
What we want is to modify the old $url.
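
A short sketch of that difference, using the example URL from earlier in the thread ($original and $stripped are names made up for the illustration):

```perl
use strict;
use warnings;

my $url = 'http://www.foo.com/bar';

# Without "my": the substitution edits the existing $url in place
$url =~ s|/bar$||;
print "$url\n";

# To keep the original untouched, copy it first and modify the copy
my $original = 'http://www.foo.com/bar';
(my $stripped = $original) =~ s|/bar$||;
print "$original -> $stripped\n";
```

The parenthesised form declares the new variable, assigns the old value to it, and then runs the substitution on the copy.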


The aim: I want to search for all URLs that contain the following term: /participants-database/

But unfortunately this does not work:


Code
  
#!C:\Perl\bin\perl

use strict; # You always want to include both strict and warnings
use warnings;


use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use HTML::LinkExtor;

# There was no reason for this to be in a BEGIN block (and there
# are a few good reasons for it not to be)
open my $file1,"+>>", ("links.txt");
select($file1);

#The Url I want it to start at;
# Note that I've made this an array, @urls, rather than a scalar, $URL
#my @urls = (' $url =~ s$||;');
my $urls =~ ('s|/participants-database$||');
my %visited; # The % sigil indicates it's a hash
my $browser = LWP::UserAgent->new();
$browser->timeout(5);

while (@urls) {
my $url = shift @urls;
# Skip this URL and go on to the next one if we've
# seen it before
next if $visited{$url};

my $request = HTTP::Request->new(GET => $url);
my $response = $browser->request($request);

# No real need to invoke printf if we're not doing
# any formatting
if ($response->is_error()) {print $response->status_line, "\n";}
my $contents = $response->content();

# Now that we've got the url's content, mark it as
# visited
$visited{$url} = 1;

my ($page_parser) = HTML::LinkExtor->new(undef, $url);
$page_parser->parse($contents)->eof;
my @links = $page_parser->links;

foreach my $link (@links) {
print "$$link[2]\n";
push @urls, $$link[2];
}
sleep 60;
}



FishMonger
Veteran / Moderator

Nov 11, 2017, 2:35 PM

Post #10 of 11 (2665 views)
Re: [dilbert] Global symbol errors - all along the way... need help with a little script [In reply to] Can't Post

You've posted 185 questions since Sept 2010, often cross-posting them on other sites, and in a large number of those you were told how to fix Global symbol errors, so you should be able to fix that part on your own.

Please post a short sample "links.txt" file so we can run some tests.

I see several problems in your code but before I comment on them I want to run a few tests using your links file.

Part of the solution will be to add another module. Most likely either URI or URI::Split.
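
As a rough sketch of how the URI module might fit in (not a tested solution for the real links file): the snippet below keeps only links whose path ends in /participants-database/ and reduces each one to scheme://host/. The link list and the names @found, @bases and %seen are assumptions made for the example; URI is a CPAN module that is normally installed alongside LWP.

```perl
use strict;
use warnings;
use URI;

# Hypothetical links, standing in for the contents of links.txt
my @found = (
    'http://www.1.com/participants-database/',
    'http://www.1.com/about',
    'http://www.2.com:80/participants-database/',
    'http://www.1.com/participants-database/',
);

my %seen;
my @bases;

for my $link (@found) {
    my $uri = URI->new($link);

    # Only http/https URIs are handled; other schemes have no host method
    next unless defined $uri->scheme && $uri->scheme =~ /^https?$/;

    # Keep only links whose path ends in the wanted term
    next unless $uri->path =~ m{/participants-database/?$};

    # Rebuild the URL from scheme and host only (port and path dropped)
    my $base = $uri->scheme . '://' . $uri->host . '/';
    push @bases, $base unless $seen{$base}++;
}

print "$_\n" for @bases;
```

Building the base from scheme and host alone is one design choice; if a non-default port matters, $uri->host_port could be used instead of $uri->host.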


dilbert
User

Nov 12, 2017, 10:50 AM

Post #11 of 11 (2653 views)
Re: [FishMonger] Global symbol errors - all along the way... need help with a little script [In reply to] Can't Post

hello dear Fishmonger,



many many thanks for the reply - great to hear from you. You're right.

I have made some attempts in PHP and Perl - for some tasks Perl is the language of choice....

Below is the code that works; it is the base for some further changes. What I want to do now is modify the script a bit (tailoring and tinkering is the way to learn): I want to fetch URLs with a certain content in the URL string.

In other words, the aim is:

- I need to fetch all the URLs that contain the term "/bar" (here: /participants-database/).
- After fetching the URLs I want to strip that term, so that only the base URL of the whole construct http://www.xy.com/participants-database/ remains.


but first of all - here the code that works - the base of my weekend-project:


Code
 
#!C:\Perl\bin\perl

use strict; # You always want to include both strict and warnings
use warnings;

use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use HTML::LinkExtor;

# There was no reason for this to be in a BEGIN block (and there
# are a few good reasons for it not to be)
open my $file1, "+>>", "links.txt" or die "Cannot open links.txt: $!";
select($file1);

#The Url I want it to start at;
# Note that I've made this an array, @urls, rather than a scalar, $URL
my @urls = ('http://www.cems.org/academic-members/our-members/list/');
my %visited; # The % sigil indicates it's a hash
my $browser = LWP::UserAgent->new();
$browser->timeout(5);

while (@urls) {
my $url = shift @urls;

# Skip this URL and go on to the next one if we've
# seen it before
next if $visited{$url};

my $request = HTTP::Request->new(GET => $url);
my $response = $browser->request($request);

# No real need to invoke printf if we're not doing
# any formatting
if ($response->is_error()) {print $response->status_line, "\n";}
my $contents = $response->content();

# Now that we've got the url's content, mark it as
# visited
$visited{$url} = 1;

my ($page_parser) = HTML::LinkExtor->new(undef, $url);
$page_parser->parse($contents)->eof;
my @links = $page_parser->links;

foreach my $link (@links) {
print "$$link[2]\n";
push @urls, $$link[2];
}
sleep 60;
}



the results: i got back more than 200 lines - see below the output sample:



given the following results:



Code
 
http://www.1.com/participants-database/
http://www.2.com/participants-database/
http://www.3.com/participants-database/
http://www.4.com/participants-database/
http://www.5.com/participants-database/
http://www.6.com/participants-database/
http://www.7.com/participants-database/



First of all I have to fetch the URLs;
then I have to strip the port and the path:


Code
 "PORT: ", $uri->port, "\n"; 
"PATH: ", $uri->path, "\n"



so that i have the following results:


Code
http://www.1.com/ 
http://www.2.com/
http://www.3.com/
http://www.4.com/
http://www.5.com/
http://www.6.com/
http://www.7.com/



Well, to achieve this I need to tailor the script a bit. And yes, I think I need to split the URLs: once I have the results (I guess 200 or more lines), I want to extract parts of each URL, for example with regular expressions.

I have to strip each URL of the form scheme://domain:port/participants-database/ down to the domain, so that I get the base URLs alone.

Well - That can be done with Perl like so:

given the general format of a URL is scheme://domain:port/path?query_string#fragment_id

While domain (and possible other parts of the URL) may contain Unicode characters, in the following we assume that only ASCII characters are used. Furthermore, we assume that



Code
 
scheme only consists of letters a-z and A-Z;
domain does not contain :, ?, # or /;
port is a natural number, :port is optional;
path does not contain ? or #, path is optional;
query_string does not contain #, ?query_string is optional;
fragment_id can contain arbitrary characters, #fragment_id is optional.

Here is my code:

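my @urls = (
"http://www.example.com/",
"http://www80.local.com:80/",
"https://www.ex221.ac.uk:442/perl/rulez?all+q#all.time");

foreach (@urls) {
print "URL: $_\n";
# One capture group per URL part, following the assumptions listed above;
# optional parts that are missing come back as undef and are turned into ''
my ($scheme,$domain,$port,$path,$query,$fragment) = map { $_ // '' }
    m{^([a-zA-Z]+)://([^:/?#]+)(?::(\d+))?(/[^?#]*)?(?:\?([^#]*))?(?:#(.*))?$};
print "SCHEME: $scheme, DOMAIN: $domain, PORT: $port\n";
print "PATH: $path\n"; print "QUERY: $query\n";
print "FRAGMENT: $fragment\n\n";
}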



.... well, to achieve that I can use the URI module:



Code
 
use URI;

my @urls = (
"http://www.example.com/",
"http://www80.local.com:80/",
"https://www.ex221.ac.uk:442/perl/rulez?all+q#all.time");

foreach (@urls) {
my $uri = URI->new($_);
print "URL: $_\n";
print "SCHEME: ", $uri->scheme, "\n";
print "DOMAIN: ", $uri->host, "\n";
print "PORT: ", $uri->port, "\n";
print "PATH: ", $uri->path, "\n";
print "QUERY: ", $uri->query, "\n";
print "FRAGMENT: ", $uri->fragment, "\n";
}




Back to the code that works already (see above).

remembering: the basic was: http://www.cems.org/academic-members/our-members/list/

see the results:



Code
http://www.cems.org/sites/all/themes/cems_theme/favicon.ico 
http://www.cems.org/rss/news.xml
http://www.cems.org/sites/default/files/css/css_fbccd6cf1d744a02e3d3c96b13899abc.css
http://www.cems.org/sites/default/files/css/css_cb1d8f9de90605e479255100ae34fad0.css
http://www.cems.org/sites/default/files/js/js_dcc6ca7e3b31340a2b20a3293ea00940.js
https://plus.google.com/112980751747702528942
http://www.cems.org/
http://www.cems.org/sites/all/themes/cems_theme/images/custom/cems-logo.png
http://www.cems.org/
http://www.cems.org/about/contacts
http://www.cems.org/lostpassword
https://cas.cems.org:443/cas/login?service=http://www.cems.org/cas&locale=en
http://www.cems.org/academic-members/our-members/list/
http://www.cems.org/about
http://www.cems.org/about/overview
http://www.cems.org/about/mission
http://www.cems.org/about-cems/overview/key-facts-figures
http://www.cems.org/about/alumni-profiles
http://www.cems.org/about/global
http://www.cems.org/sustainability
http://www.cems.org/sustainability/strategy
http://www.cems.org/sustainability/implementation
http://www.cems.org/sustainability/projects-profiles
http://www.cems.org/about/history
http://www.cems.org/about/organisation
http://www.cems.org/about/organisation/boards
http://www.cems.org/about/organisation/headoffice
http://www.cems.org/about/organisation/committees
http://www.cems.org/academic-members/faculty-groups
http://www.cems.org/about/organisation/student
http://www.cems.org/about/organisation/alumni
http://www.cems.org/about/contacts
http://www.cems.org/about-cems/contacts/head-office
http://www.cems.org/about/contacts/programme-managers
http://www.cems.org/students/student-life/student-board/members
http://www.cems.org/about/contacts/cems-club


well ,,,, i try to get ahead...; now i want to

love to hear from you


(This post was edited by dilbert on Nov 12, 2017, 11:12 AM)

 
 

