
mapping a website

 



slekness
New User

Nov 14, 2011, 2:40 PM

Post #1 of 3 (3048 views)
mapping a website

Hello Perl people,

I'm kind of lost with this. I need to write a script that will "map" a website and output all the links that have parameters. For example, if I wanted to map a site called whatever.com, the output would look like this:

www.whatever.com/page.php?id=1
www.whatever.com/search.php?person=a
www.whatever.com/video.php?id=12

...and so on. I know this is no easy task with my not-so-good Perl skills. I might even end up posting it as a project for freelancers.

Thanks for any help.


wickedxter
User

Nov 15, 2011, 4:00 AM

Post #2 of 3 (2979 views)
Re: [slekness] mapping a website

I think you will need to use the LWP module along with an HTML parsing module to get all the links and filter through them.
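As a rough illustration of the "get all the links and filter through them" step, here is a minimal sketch that works on an HTML string, so it can be tried without any network access. It assumes HTML::LinkExtor (from the HTML-Parser distribution) and URI are installed. The function name links_with_params and the sample HTML are made up for this example, and "has parameters" is taken to mean "contains a ? followed somewhere by an =".

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

# Hypothetical helper: return the absolute URLs of all <a href> links
# in $html that appear to carry query parameters (e.g. ?id=1).
sub links_with_params {
    my ($html, $base) = @_;

    # With no callback, the parser collects links for ->links;
    # passing $base makes every extracted link absolute.
    my $parser = HTML::LinkExtor->new(undef, $base);
    $parser->parse($html);
    $parser->eof;

    my @found;
    for my $link ($parser->links) {
        my ($tag, %attrs) = @$link;
        next unless $tag eq 'a' && defined $attrs{href};
        my $href = "$attrs{href}";               # stringify the URI object
        push @found, $href if $href =~ /\?.*=/;  # "?" followed by "="
    }
    return @found;
}

# Sample input, invented for the example
my $html = '<a href="/page.php?id=1">one</a> <a href="/about.html">about</a>';
print "$_\n" for links_with_params($html, 'http://www.whatever.com/');
```

In a real script the HTML would come from LWP::UserAgent rather than a literal string; only the filtering part is shown here.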


histrung
Novice

Feb 9, 2012, 12:02 PM

Post #3 of 3 (2585 views)
Re: [wickedxter] mapping a website

Start with this.

Code
#!/usr/bin/perl
# Butchered up from:
# http://search.cpan.org/~gaas/HTML-Parser-3.69/lib/HTML/LinkExtor.pm

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTML::LinkExtor;
use URI;

my @params = ();
my $url = "http://perlguru.com/gforum.cgi?post=59493;sb=post_latest_reply;so=ASC;forum_view=forum_view_collapsed;;page=unread#unread"; # for instance

my $ua = LWP::UserAgent->new();
# Make the parser. Unfortunately, we don't know the base yet
# (it might be different from $url)
my $p = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
my $res = $ua->request(HTTP::Request->new(GET => $url),
                       sub { $p->parse($_[0]) });
$p->eof;    # tell the parser there is no more input
die "Could not fetch $url: ", $res->status_line, "\n" unless $res->is_success;

# Expand all collected link URLs to absolute ones
my $base = $res->base;
@params = map { URI->new_abs($_, $base)->as_string } @params;
# Print them out
print join("\n", @params), "\n";

# Callback that collects <a href> links with parameters
sub callback {
    my ($tag, %links) = @_;
    return if $tag ne "a";
    foreach my $attr (keys %links) {
        # Keep only href values that contain "?" followed by "="
        push(@params, $links{$attr}) if $attr eq "href" && $links{$attr} =~ /\?.*=/;
    }
}
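
The script above only looks at a single page. To map a whole site, one possible next step is to follow the links it finds. What follows is only a sketch under a few assumptions that are not in the script above: the name map_site, the same-host restriction, the page limit, and the idea of passing the fetcher in as a code ref (so the crawl logic can be tried on fake pages first) are all choices made for this example.

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

# Hypothetical helper: breadth-first walk over pages on the same host.
# $fetch is a code ref that takes a URL and returns HTML (undef on failure);
# $max_pages caps how many pages are fetched.
sub map_site {
    my ($start, $fetch, $max_pages) = @_;
    my $host  = lc URI->new($start)->host;
    my @queue = ($start);
    my %seen  = ($start => 1);
    my %with_params;
    my $fetched = 0;

    while (@queue && $fetched < $max_pages) {
        my $page = shift @queue;
        my $html = $fetch->($page);
        $fetched++;
        next unless defined $html;

        # Use the page's own URL as the base for relative links
        my $parser = HTML::LinkExtor->new(undef, $page);
        $parser->parse($html);
        $parser->eof;

        for my $link ($parser->links) {
            my ($tag, %attrs) = @$link;
            next unless $tag eq 'a' && defined $attrs{href};
            my $uri = URI->new("$attrs{href}");
            next unless ($uri->scheme // '') =~ /^https?$/;
            next unless lc($uri->host // '') eq $host;    # stay on this site
            $uri->fragment(undef);                        # drop any #anchor
            my $abs = $uri->as_string;
            $with_params{$abs} = 1 if $abs =~ /\?.*=/;
            push @queue, $abs unless $seen{$abs}++;
        }
    }
    return sort keys %with_params;
}

# Fake two-page site, invented for the example
my %pages = (
    'http://www.whatever.com/' =>
        '<a href="/page.php?id=1">p</a> <a href="/list.html">l</a>',
    'http://www.whatever.com/list.html' =>
        '<a href="video.php?id=12">v</a> <a href="http://other.example/x.php?a=1">x</a>',
);
print "$_\n" for map_site('http://www.whatever.com/', sub { $pages{ $_[0] } }, 10);
```

For a real site, the fetcher could be something like sub { my $res = $ua->get($_[0]); $res->is_success ? $res->decoded_content : undef } with $ua being an LWP::UserAgent object. A polite crawler would also want a delay between requests and a robots.txt check, which this sketch leaves out.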


 
 

