mapping a website


Nov 14, 2011, 2:40 PM


Hello Perl people ^^

So I'm kind of lost with this. I need to write a script that will "map" a website and output all the links that have parameters. For example, if I want to map out a site called whatever.com, the output would be something like:


...and so on. I know this is no easy task with my not-so-good Perl skills. I might even end up posting it as a project on a freelancer site...

Thanks for any help.


Nov 15, 2011, 4:00 AM

Re: [slekness] mapping a website

I think you will need to use the LWP module along with an HTML parsing module to get all the links and filter through them.
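
For the filtering part, something like the following might be a starting point. It is only a rough sketch: it assumes the links have already been pulled out of the page, the sample links are made up, and has_params is a hypothetical helper name, not something from LWP. It uses the URI module (installed alongside LWP) to resolve each link against the page URL and check whether it carries a query string with at least one name=value pair.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: returns 1 when the link has a query string
# containing at least one "=" (i.e. something like ?id=3), else 0.
sub has_params {
    my ($link, $base) = @_;
    my $uri   = URI->new_abs($link, $base);  # resolve relative links against the page URL
    my $query = $uri->query;                 # undef when the link has no "?" part
    return (defined $query && $query =~ /=/) ? 1 : 0;
}

# Made-up sample data, standing in for whatever a link extractor returns.
my $base  = 'http://whatever.com/index.html';
my @links = ('about.html', 'view.php?id=3', '/search?q=perl&page=2', 'page.html#top');

# Keep only the links with parameters and print them as absolute URLs.
my @with_params = grep { has_params($_, $base) } @links;
print URI->new_abs($_, $base), "\n" for @with_params;
```

Checking the parsed query rather than the raw string means a "=" that only appears in the #fragment part does not count as a parameter.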


Feb 9, 2012, 12:02 PM

Re: [wickedxter] mapping a website

Start with this.

# Butchered up from:
# http://search.cpan.org/~gaas/HTML-Parser-3.69/lib/HTML/LinkExtor.pm

use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

my @params = ();
my $url = "http://perlguru.com/gforum.cgi?post=59493;sb=post_latest_reply;so=ASC;forum_view=forum_view_collapsed;;page=unread#unread"; # for instance

my $ua = LWP::UserAgent->new();
# Make the parser. Unfortunately, we don't know the base yet
# (it might be different from $url)
my $p = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
my $res = $ua->request(HTTP::Request->new(GET => $url),
                       sub { $p->parse($_[0]) });
$p->eof; # tell the parser the document is complete
die "Request failed: ", $res->status_line, "\n" unless $res->is_success;

# Expand all collected link URLs to absolute ones
my $base = $res->base;
@params = map { url($_, $base)->abs } @params;
# Print them out
print join("\n", @params), "\n";

# Set up a callback that collects links with parameters
sub callback {
    my ($tag, %links) = @_;
    return if $tag ne "a"; # only look at <a> tags
    foreach my $elm (keys %links) {
        # keep href values that contain a query string with at least one "="
        push(@params, $links{$elm}) if $elm eq "href" && $links{$elm} =~ /\?.*=/;