
tules
Novice
Sep 1, 2008, 7:27 PM
Post #1 of 4
Robot: storing URLs in multidimensional hash
You should be able to see what I'm trying to do here. I want to generate an anonymous hash reference for each key, where each key is a URL. Each anonymous hash will then contain a key for every URL extracted from that page, and each of those keys in turn opens up into another anonymous hash for the links on that page, and so on ad infinitum. The result should be a tree-like structure mapping out all the URLs crawled by the bot. Please help, this is urgent as I need to impress my new boss :)

    use strict;
    use warnings;
    use Sort::Array qw(Discard_Duplicates);
    use LWP::Simple;
    use HTML::LinkExtor;

    # Seed the tree with the start URL, which maps to an empty anonymous hash
    my %hash = ("http://www.myspace.com" => {});
    my $p = HTML::LinkExtor->new();
    my @collected_stuff;

    while (my ($key, $value) = each(%hash)) {
        # Fetch the page and extract its links
        my $content = get($key);
        $p->parse($content)->eof;
        my @links = $p->links;

        # $$link[2] is the first link attribute's value (e.g. the href);
        # skip media, script, stylesheet and archive files
        my @links1 = ();
        foreach my $link (@links) {
            next if $$link[2] =~ m/\.(js|css|png|wav|mp3|rm|mpg|bmp|jpg|rar|tar|zip|tif|gif|mp4)$/;
            push(@links1, $$link[2]);
        }

        # Strip a trailing slash and keep only absolute http: URLs
        my @links2 = ();
        foreach my $link1 (@links1) {
            $link1 =~ s/\/$//;
            push(@links2, $link1) if $link1 =~ /^http:/;
        }

        @links2 = Discard_Duplicates(
            empty_fields => 'delete',
            data         => \@links2,
        );

        foreach my $link2 (@links2) {
            print "$hash{$key}\n$hash{$key}{$link2}\n$link2\n";
        }
    }
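For reference, here is a minimal sketch of the tree described above written out as a literal nested hash, so the target shape is concrete. The example.com URLs are made-up placeholders and print_tree is a hypothetical helper, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical example of the intended tree: every key is a URL, and its
# value is a reference to an anonymous hash keyed by the URLs found on that page.
my %tree = (
    'http://example.com' => {
        'http://example.com/a' => {
            'http://example.com/c' => {},
        },
        'http://example.com/b' => {},
    },
);

# Hypothetical helper: walk the tree recursively, indenting each level by two spaces.
sub print_tree {
    my ($node, $depth) = @_;
    for my $url (sort keys %$node) {
        print '  ' x $depth, $url, "\n";
        print_tree($node->{$url}, $depth + 1);
    }
}

print_tree(\%tree, 0);
```

Each level is just a hash reference stored as the value of its parent's key, which is why $hash{$key} prints as HASH(0x...).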
$hash{$key} prints out HASH(0x239b94), but $hash{$key}{$link2} prints nothing at all and triggers the warning "Use of uninitialized value". So what's the deal? Do I need to use references? And if so, how?
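One likely explanation: $hash{$key} already is a hash reference, so the $hash{$key}{$link2} syntax is fine, but nothing is ever stored at that key, so it is undefined and interpolating it under use warnings produces the warning. Also, the loop only walks the top-level keys of %hash, and perldoc warns against adding keys to a hash while iterating over it with each. The sketch below assumes a queue of [url, child-hash reference, depth] entries instead. %fake_links and links_on are hypothetical stand-ins for LWP::Simple::get plus HTML::LinkExtor so it runs offline, and $max_depth is an assumed limit so the crawl terminates.

```perl
use strict;
use warnings;

# Hypothetical stand-in for fetching a page and extracting its links.
my %fake_links = (
    'http://example.com'   => ['http://example.com/a', 'http://example.com/b'],
    'http://example.com/a' => ['http://example.com/c', 'http://example.com'],
    'http://example.com/b' => [],
    'http://example.com/c' => [],
);
sub links_on { my ($url) = @_; return @{ $fake_links{$url} || [] } }

my $start     = 'http://example.com';
my %tree      = ($start => {});
my %seen      = ($start => 1);   # each URL is added to the tree only once
my $max_depth = 3;               # assumed limit so the crawl stops

# Each queue entry is [url, reference to that url's child hash, depth].
my @queue = ([$start, $tree{$start}, 0]);
while (my $item = shift @queue) {
    my ($url, $children, $depth) = @$item;
    next if $depth >= $max_depth;
    for my $link (links_on($url)) {
        next if $seen{$link}++;
        # $children points into %tree, so this assignment writes the new
        # anonymous hash straight into the nested structure.
        $children->{$link} = {};
        push @queue, [$link, $children->{$link}, $depth + 1];
    }
}
```

Because %seen skips URLs already placed, links back to earlier pages (such as /a linking to the start URL) are dropped, which keeps the result a tree rather than an endlessly repeating graph.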