mapping a website



Nov 14, 2011, 2:40 PM

Post #1 of 3 (3697 views)
mapping a website

Hello perl people^^

So I'm kinda lost with this. I need to write a script that will "map" a website and just output all the links that have parameters. So for example, if I want to map out a site, the output would be like:

...yeah, and so on. I know this is no easy task with my not-so-good Perl skills. I might even end up posting it as a project on freelancers...

thx for any help


Nov 15, 2011, 4:00 AM

Post #2 of 3 (3628 views)
Re: [slekness] mapping a website

I think you will need to use the LWP module along with one of the HTML modules to get all the links and filter through them.
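
A minimal sketch of that idea, assuming HTML::LinkExtor as the HTML module and URI for resolving relative links. The sample page, the example.com base URL, and the helper name links_with_params are made up for illustration. It parses a literal HTML string so it runs without network access; with LWP you would pass in the fetched page content instead.

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

# Return the absolute <a href> links in $html that carry query parameters.
# links_with_params is a hypothetical helper name, not part of any module.
sub links_with_params {
    my ($html, $base) = @_;
    my @found;
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless $tag eq 'a' && defined $attr{href};
        # Resolve relative links against the page's base URL
        my $uri   = URI->new_abs($attr{href}, $base);
        my $query = $uri->query;
        # Keep only links whose query string has at least one key=value pair
        push @found, $uri->as_string if defined $query && $query =~ /=/;
    });
    $parser->parse($html);
    $parser->eof;
    return @found;
}

# Made-up sample page (an assumption, for illustration only)
my $html = <<'HTML';
<a href="/index.html">home</a>
<a href="/view.cgi?id=7">item</a>
<a href="search.cgi?q=perl">search</a>
<img src="/logo.png?v=3">
HTML

my @links = links_with_params($html, 'http://example.com/dir/');
print "$_\n" for @links;
```

The image is skipped because only "a" tags are considered, and the plain /index.html link is skipped because it has no query string.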


Feb 9, 2012, 12:02 PM

Post #3 of 3 (3234 views)
Re: [wickedxter] mapping a website

Start with this.

# Butchered up from:

use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

my @params = ();
my $url = ";sb=post_latest_reply;so=ASC;forum_view=forum_view_collapsed;;page=unread#unread"; # for instance

my $ua = LWP::UserAgent->new();
# Make the parser. Unfortunately, we don't know the base yet
# (it might be different from $url)
my $p = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
my $res = $ua->request(HTTP::Request->new(GET => $url),
    sub { $p->parse($_[0]) });
$p->eof;    # tell the parser the document is complete
# Expand all collected link URLs to absolute ones
my $base = $res->base;
@params = map { url($_, $base)->abs } @params;
# Print them out
print join("\n", @params), "\n";

# Set up a callback that collects <a href> links with parameters
sub callback {
    my ($tag, %links) = @_;
    return if $tag ne "a";
    foreach my $elm (keys %links) {
        push(@params, $links{$elm}) if $elm eq "href" && $links{$elm} =~ /\?.*=/;
    }
}
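
The script above only looks at a single page. To "map" a whole site you would repeat the same extraction for every same-host link you find. Here is a rough sketch of that loop, assuming a breadth-first crawl limited to one host. The %site hash stands in for real pages so the sketch runs offline, and fetch_page and map_site are hypothetical names. In real use, fetch_page would call LWP::UserAgent's get method and return the response content.

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

# Stand-in for a real website (made-up pages, so the sketch runs offline)
my %site = (
    'http://example.com/' =>
        '<a href="/a.html">a</a> <a href="/list.cgi?cat=1">list</a>',
    'http://example.com/a.html' =>
        '<a href="/">home</a> <a href="/show.cgi?id=5">show</a> '
      . '<a href="http://other.example/x?y=1">off-site</a>',
    'http://example.com/list.cgi?cat=1' => '<a href="/show.cgi?id=5">show</a>',
    'http://example.com/show.cgi?id=5'  => '<a href="/a.html#top">back</a>',
);

# Hypothetical fetcher: returns the page HTML, or undef if the page is unknown
sub fetch_page { my ($url) = @_; return $site{$url}; }

# Breadth-first crawl of one host; returns the sorted links that carry parameters
sub map_site {
    my ($start, $max_pages) = @_;
    my $host  = URI->new($start)->host;
    my @queue = ($start);
    my %seen  = ($start => 1);
    my %with_params;
    my $fetched = 0;
    while (@queue && $fetched < $max_pages) {
        my $url  = shift @queue;
        my $html = fetch_page($url);
        next unless defined $html;
        $fetched++;
        my $parser = HTML::LinkExtor->new(sub {
            my ($tag, %attr) = @_;
            return unless $tag eq 'a' && defined $attr{href};
            my $uri = URI->new_abs($attr{href}, $url);
            # Stay on http(s) links that belong to the starting host
            return unless $uri->scheme && $uri->scheme =~ /^https?$/;
            return unless $uri->host eq $host;
            $uri->fragment(undef);    # ignore #anchors
            my $abs = $uri->canonical->as_string;
            $with_params{$abs} = 1
                if defined $uri->query && $uri->query =~ /=/;
            push @queue, $abs unless $seen{$abs}++;
        });
        $parser->parse($html);
        $parser->eof;
    }
    return sort keys %with_params;
}

print "$_\n" for map_site('http://example.com/', 50);
```

The %seen hash keeps the crawl from revisiting pages, and the $max_pages limit is a safety cap so a large site cannot keep the loop running indefinitely. A real crawler should also pause between requests and respect robots.txt.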

