
wickedxter
Oct 3, 2012, 1:43 PM
Re: [dilbert] Perl::Mechanize - how to loop within this [example]
This is how I would process the pages at that URL, assuming page3000 through page3004 exist. If you're processing the HTML on each page, use HTML::TokeParser or HTML::TokeParser::Simple.
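#!/usr/bin/perl
## This is how I would go about doing what I understand you're trying to do.
## EXAMPLE only
use 5.014;
use strict;
use warnings;
use WWW::Mechanize;

my $target_url   = 'http://www.google.com/';
my $first_page   = 3000;
my $max_page_num = 4;        # offsets 0 .. 4, i.e. page3000 .. page3004
my $format       = '.html';

# One Mechanize object can be reused for every page.
# autocheck => 0: a failed request is reported below instead of dying.
my $mech = WWW::Mechanize->new( autocheck => 0 );
$mech->agent_alias('Windows Mozilla');    # 'Microsoft Mozilla' is not a known alias

# Loop through the pages in order
for my $offset ( 0 .. $max_page_num ) {
    my $page = $first_page + $offset;

    # This combines to make the URL, e.g. http://www.google.com/page3000.html
    my $url = $target_url . 'page' . $page . $format;

    # Get the page; skip it if the request failed
    $mech->get($url);
    unless ( $mech->success ) {
        warn "Could not get $url: HTTP ", $mech->status, "\n";
        next;
    }

    # Get all links whose URL matches the regex (put your own pattern inside qr//)
    my @links = $mech->find_all_links( url_regex => qr// );

    ### Process the links and follow_link(), or process the page here.
}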
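If you do go the HTML::TokeParser route for the "process the page" step, a minimal sketch could look like the following. The helper name `extract_links` and the sample HTML are made up for illustration; inside the loop you would pass it `$mech->content` instead. It only uses `get_tag` and `get_trimmed_text` from HTML::TokeParser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return a list of [href, link text] pairs from an HTML string.
sub extract_links {
    my ($html) = @_;
    my $p = HTML::TokeParser->new( \$html );    # parse from a scalar reference
    my @links;

    # get_tag('a') returns [ $tag, \%attr, \@attrseq, $text ], or undef at the end
    while ( my $tag = $p->get_tag('a') ) {
        my $href = $tag->[1]{href};
        next unless defined $href;              # skip <a name="..."> anchors
        my $text = $p->get_trimmed_text('/a');  # text up to the closing </a>
        push @links, [ $href, $text ];
    }
    return @links;
}

# Sample input only; in the loop above this would be $mech->content
my $sample = '<p><a href="/page3001.html">Next page</a> <a name="top">x</a></p>';
for my $link ( extract_links($sample) ) {
    print "$link->[0]\t$link->[1]\n";
}
```

Mechanize's own `find_all_links` already covers plain link lists; a token parser is mainly useful when you need other tags or the text around them.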
(This post was edited by wickedxter on Oct 3, 2012, 1:46 PM)