

Jan 30, 2018, 7:29 PM

Views: 7849
Re: [dilbert] a little script that makes use of LWP::Simple


Hardcoding the total number of pages isn't practical, as it could vary. You could either:

- extract the number of results from the first page, divide it by the results per page (21), and round down.
- extract the URL from the "last" link at the bottom of the page, create a URI object, and read the page number from its query string.

Note that I say round down above because the query's page numbering begins at 0, not 1.
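Both approaches could be sketched like this; the numbers and the 'page' parameter name are assumptions, so check them against the real site:

```perl
use strict;
use warnings;
use URI;

# Option 1: derive the last page index from the result count.
# Pages are 0-based, so round down: e.g. 100 results at 21 per page
# means the last page index is int(100 / 21) = 4.
my $total_results = 100;   # would come from scraping the first page
my $per_page      = 21;
my $last          = int( $total_results / $per_page );

# Option 2: read the page number from the "last" link's query string.
# The URL and the 'page' parameter name here are hypothetical.
my $uri            = URI->new('http://example.com/results?page=4&sort=date');
my %query          = $uri->query_form;
my $last_from_link = $query{page};
```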

Looping over the pages could be as simple as:

my $url_pattern = ''; # a sprintf pattern with a %d placeholder for the page number

for my $page ( 0 .. $last ) {
    my $url = sprintf $url_pattern, $page;
    # fetch and scrape $url here
}

Otherwise, I personally would probably try to incorporate paging into the $conf, perhaps as an iterator which, on each call, returns the next node; behind the scenes it automatically advances to the next page when the current one is exhausted, until there are no pages left. But this is probably beyond the scope of what you need, and a basic looping mechanism should be sufficient.
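For what it's worth, such an iterator could be a simple closure. This is only a sketch: fetch_page($page) is a stand-in for whatever scraping code returns the nodes on a page (an empty arrayref once the pages run out):

```perl
use strict;
use warnings;

# Build an iterator over a paged source. Each call returns the next
# node, transparently fetching the next page when the buffer empties,
# and returns undef once no pages are left.
sub make_iterator {
    my ($fetch_page) = @_;
    my ( $page, @buffer ) = (0);
    return sub {
        unless (@buffer) {
            my $nodes = $fetch_page->( $page++ );
            return undef unless @$nodes;   # no pages left
            @buffer = @$nodes;
        }
        return shift @buffer;
    };
}

# Usage with a fake two-page source:
my @pages = ( [ 'a', 'b' ], ['c'] );
my $next  = make_iterator( sub { $_[0] < @pages ? $pages[ $_[0] ] : [] } );
while ( defined( my $node = $next->() ) ) {
    print "$node\n";
}
```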

I checked Web::Scraper in case it had features to handle paging, which it unfortunately doesn't. It is, however, a much more feature-rich replacement for my solution above, and could be used in its place if you preferred.

Finally, if you eventually need to distribute the scrape across multiple concurrent requests, there are various ways this could be incorporated, but you should consider doing it asynchronously via HTTP::Async.
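A minimal HTTP::Async sketch, reusing the sprintf-style URL pattern from above (the pattern and the process() sub are placeholders you would fill in):

```perl
use strict;
use warnings;
use HTTP::Async;
use HTTP::Request;

my $url_pattern = 'http://example.com/results?page=%d';   # placeholder pattern
my $last        = 4;

my $async = HTTP::Async->new( slots => 5 );   # up to 5 requests in flight
$async->add( HTTP::Request->new( GET => sprintf $url_pattern, $_ ) )
    for 0 .. $last;

while ( my $response = $async->wait_for_next_response ) {
    # Responses arrive as they complete, not necessarily in page order.
    # process() stands in for your scraping code.
    process( $response->decoded_content ) if $response->is_success;
}
```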

Let us know how you get on.


(This post was edited by Zhris on Jan 30, 2018, 7:36 PM)

