Perl Mechanize - how to make a script running faster with less overhead

 



dilbert
User

Feb 19, 2012, 3:54 PM

Post #1 of 3 (925 views)

Problem: I have a list of 2500 websites and need to grab a thumbnail screenshot of each. How do I do that?
I could try to render the sites with Perl; WWW::Mechanize::Firefox looks like a good fit.
Note: I only need the results as thumbnails that are at most 240 pixels in the long dimension.

Prerequisites:
the MozRepl Firefox add-on: https://addons.mozilla.org/en-US/firefox/addon/mozrepl/
the module WWW::Mechanize::Firefox
the module Imager: http://search.cpan.org/~tonyc/Imager-0.87/Imager.pm



First Approach: Here is a first Perl solution:


Code
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new();
$mech->get('http://google.com');

my $png = $mech->content_as_png();



Outline: content_as_png() returns the given tab or the current page rendered as a PNG image.
All parameters are optional. $tab defaults to the current tab. If coordinates are given, that rectangle is cut out. The coordinates should be a hash with the four usual entries: left, top, width, height. This method is specific to WWW::Mechanize::Firefox.
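
For illustration, a minimal sketch of the rectangle option as the perldoc text above describes it. The 800x600 rectangle and the output file name are made-up values, and passing undef for $tab to mean "current tab" is my reading of "all parameters are optional". It needs a running Firefox with MozRepl, so treat it as untested here.

```perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();
$mech->get('http://google.com');

# Cut out the top-left 800x600 rectangle of the rendered page.
# undef for $tab is assumed to select the current tab.
my $png = $mech->content_as_png(
    undef,
    { left => 0, top => 0, width => 800, height => 600 },
);

# 'google-crop.png' is a hypothetical file name.
open(my $out, '>', 'google-crop.png') or die "Cannot write file: $!";
binmode $out;                      # PNG data is binary
print {$out} $png;
close $out or die "Cannot close file: $!";
```

Note that this crops the page; it does not scale it down.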

As I understand the perldoc, the coordinates option does not resize the whole page; it just cuts a rectangle out of it. So WWW::Mechanize::Firefox takes care of capturing the screenshots, but not of scaling them.
I forgot to mention that I only need the images as small thumbnails, so there is no need to keep very large files. I searched CPAN for a module that can scale down the $png and found Imager.

WWW::Mechanize::Firefox does not concern itself with resizing images. For that there are various image modules on CPAN, such as Imager [ http://search.cpan.org/~tonyc/Imager-0.87/Imager.pm ].
Imager - Perl extension for Generating 24 bit Images: Imager is a module for creating and altering images. It can read and write various image formats, draw primitive shapes like lines and polygons, blend multiple images together in various ways, scale, crop, render text and more. I have installed the module, but I have not yet extended my basic approach to use it.
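
A small sketch of how the scaling step might look with Imager. The helper name scale_to_fit and the blank 1024x768 stand-in image are made up for the example; a real screenshot would be loaded with $img->read(data => $png, type => 'png') instead, which assumes Imager was built with PNG support.

```perl
use strict;
use warnings;
use Imager;

# Scale an Imager image so that its longer side is at most $max pixels.
# Images that already fit are returned unchanged (no upscaling).
sub scale_to_fit {
    my ($img, $max) = @_;
    return $img if $img->getwidth <= $max && $img->getheight <= $max;

    # type => 'min' picks the smaller of the two possible sizes,
    # so the result fits inside a $max x $max box.
    my $thumb = $img->scale(xpixels => $max, ypixels => $max, type => 'min')
        or die "scale failed: " . $img->errstr;
    return $thumb;
}

# Stand-in for a screenshot: a blank 1024x768 image.
my $page  = Imager->new(xsize => 1024, ysize => 768);
my $thumb = scale_to_fit($page, 240);

printf "%dx%d\n", $thumb->getwidth, $thumb->getheight;   # prints 240x180
```

The aspect ratio is preserved, so a 1024x768 page ends up 240 pixels wide and 180 pixels high.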


What I have tried already:


Code
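#!/usr/bin/perl

use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();

# urls.txt holds one host name per line
open(my $in, '<', 'urls.txt') or die "Cannot open urls.txt: $!";

while (<$in>) {
    chomp;
    print "$_\n";
    $mech->get($_);
    my $png = $mech->content_as_png();

    # Build the file name from the host name, without a leading "www."
    my $name = $_;
    $name =~ s/^www\.//;
    $name .= ".png";

    open(my $out, '>', $name) or die "Cannot write $name: $!";
    binmode $out;                  # PNG data is binary
    print {$out} $png;
    close $out or die "Cannot close $name: $!";
    sleep 5;
}
close $in;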


This does not take care of the image size yet.

Here is the command-line output:


Code
linux-vi17:/home/martin/perl # perl mecha_test_1.pl 
www.google.com
www.cnn.com
www.msnbc.com
command timed-out at /usr/lib/perl5/site_perl/5.12.3/MozRepl/Client.pm line 186
linux-vi17:/home/martin/perl #



This is my input file:


Code
 
urls.txt
www.google.com
www.cnn.com
www.msnbc.com
news.bbc.co.uk
www.bing.com
www.yahoo.com


Question: how can I extend the solution so that it does not stop on a timeout, and so that it only stores small thumbnails?

Note again: I only need the results as thumbnails that are at most 240 pixels in the long dimension.

As a prerequisite, I have already installed the module Imager: http://search.cpan.org/~tonyc/Imager-0.87/Imager.pm
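
One possible direction, as an untested sketch: wrap the page load in eval so that a "command timed-out" die skips the site instead of ending the run, and scale the PNG with Imager before saving. The "http://" prefix, the @failed list and $max_side are my own additions for the example, and I am assuming the MozRepl connection is still usable after a timeout, which may not hold.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;
use Imager;

my $max_side = 240;    # longest side of the thumbnail, in pixels
my $mech     = WWW::Mechanize::Firefox->new();
my @failed;            # hosts that could not be captured

open(my $in, '<', 'urls.txt') or die "Cannot open urls.txt: $!";
while (my $host = <$in>) {
    chomp $host;
    next unless length $host;
    print "$host\n";

    # eval traps a die such as "command timed-out", so one slow
    # site is skipped instead of ending the whole run.
    my $png = eval {
        $mech->get("http://$host");
        $mech->content_as_png();
    };
    unless (defined $png) {
        warn "skipping $host: ", ($@ || "no image data"), "\n";
        push @failed, $host;
        next;
    }

    # Scale the in-memory PNG to fit a $max_side x $max_side box.
    my $img   = Imager->new;
    my $thumb = $img->read(data => $png, type => 'png')
        && $img->scale(xpixels => $max_side, ypixels => $max_side, type => 'min');
    unless ($thumb) {
        warn "cannot scale $host: ", $img->errstr, "\n";
        push @failed, $host;
        next;
    }

    (my $name = $host) =~ s/^www\.//;
    $thumb->write(file => "$name.png", type => 'png')
        or warn "cannot write $name.png: ", $thumb->errstr, "\n";
}
close $in;

print "failed: @failed\n" if @failed;
```

The failed hosts are only printed at the end; writing them to a file would make a second pass over just those sites easy.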

love to hear from you!!

dilbert


budman
User

Feb 20, 2012, 7:09 PM

Post #2 of 3 (874 views)
Re: [dilbert] Perl Mechanize - how to make a script running faster with less overhead

 
What you are looking for is a process manager to run multiple requests at a time, and then use a database backend to keep track of your results.

Parallel::ForkManager lets you define an action for each process and limit the number of processes running at one time (you can tune the limit as needed).

http://search.cpan.org/~dlux/Parallel-ForkManager-0.7.9/lib/Parallel/ForkManager.pm

There are others on CPAN; search for "Parallel".
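
A rough sketch of the Parallel::ForkManager pattern, to show the shape of it. process_url is a hypothetical placeholder for the real screenshot work, the limit of 3 processes is an arbitrary choice, and the %status hash is only there for the demo. For real use, each child would need to create its own WWW::Mechanize::Firefox object after the fork; how well a single Firefox copes with several MozRepl clients at once is something to test.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my @urls      = qw(www.google.com www.cnn.com www.msnbc.com);
my $max_procs = 3;     # at most this many children at one time

my $pm = Parallel::ForkManager->new($max_procs);

# Runs in the parent each time a child exits.
my %status;
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident) = @_;
    $status{$ident} = $exit_code == 0 ? 'ok' : 'failed';
});

for my $url (@urls) {
    # start() forks: the parent gets the child's pid and moves on to
    # the next URL, the child gets 0 and falls through to do the work.
    $pm->start($url) and next;

    my $ok = process_url($url);
    $pm->finish($ok ? 0 : 1);      # child exits here
}
$pm->wait_all_children;

print "$_: $status{$_}\n" for sort keys %status;

# Hypothetical placeholder; the real version would fetch the page
# and save the thumbnail, returning true on success.
sub process_url {
    my ($url) = @_;
    return length $url ? 1 : 0;
}
```

The exit code is the simplest way to report success back to the parent; anything richer needs one of the status-tracking approaches.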

I've also used SSH sessions to spawn processes on several servers, running up to 120 processes across 30 machines.

Keeping status on each child can be tricky.
I've used:

IPC::Shared hash
  limited size and functionality, you need to be creative
  worked, but failed child procs can lead to resource outage
  not good for other people on the machine

Semaphore file using flock
  worked ok for several processes; add many processes and the fun begins
  file system write delays can lead to missed successes,
  causing reruns to occur

DBI table
  works very well under high demand
  database can handle a large amount of requests
  delegates the traffic control to the db server
  you can keep track of as many stats as you need
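
To make the DBI table idea concrete, here is a small sketch. It swaps in SQLite (DBD::SQLite) for the db server so that it runs without any setup; the table name jobs, its columns and the file thumbs.db are invented for the example, and INSERT OR IGNORE is SQLite syntax. With forked workers, each child should open its own database handle.

```perl
use strict;
use warnings;
use DBI;

# 'thumbs.db' is a hypothetical file; a server DSN would go here instead.
my $dbh = DBI->connect('dbi:SQLite:dbname=thumbs.db', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# One row per URL, with its current status.
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS jobs (
        url    TEXT PRIMARY KEY,
        status TEXT NOT NULL DEFAULT 'pending',
        error  TEXT
    )
});

# Register the work list; rows that already exist are left alone.
my $add = $dbh->prepare(q{INSERT OR IGNORE INTO jobs (url) VALUES (?)});
$add->execute($_) for qw(www.google.com www.cnn.com www.msnbc.com);

# A worker records the outcome after each attempt.
my $mark = $dbh->prepare(q{UPDATE jobs SET status = ?, error = ? WHERE url = ?});
$mark->execute('done',   undef,               'www.google.com');
$mark->execute('failed', 'command timed-out', 'www.msnbc.com');

# Everything that is not done yet is a candidate for the next run.
my $todo = $dbh->selectcol_arrayref(
    q{SELECT url FROM jobs WHERE status <> 'done' ORDER BY url});
print "still to do: @$todo\n";
```

Because the status lives in the table, an interrupted run can be restarted and will pick up only the URLs that are still pending or failed.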

This should help you get started.


dilbert
User

Feb 20, 2012, 8:56 PM

Post #3 of 3 (870 views)
Re: [budman] Perl Mechanize - how to make a script running faster with less overhead [In reply to] Can't Post

hi budman


thx for the reply - you are just great!

BTW, what about Image::Magick::Thumbnail ("Produces thumbnail images with ImageMagick")?


Well, I guess that this does not run into timeouts, but I am not sure. What do you think? I will do some investigating.
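
For reference, a sketch of what that would look like, following the synopsis of Image::Magick::Thumbnail as I remember it. The file names are made up, and it assumes a screenshot has already been saved to disk. Note that the timeout in my output above is raised in MozRepl/Client.pm, that is, while talking to Firefox, so the choice of thumbnail module on its own should not change it.

```perl
use strict;
use warnings;
use Image::Magick;
use Image::Magick::Thumbnail;

# Load a screenshot saved earlier (hypothetical file name).
my $src = Image::Magick->new;
my $err = $src->Read('google.com.png');
die "cannot read screenshot: $err" if "$err";

# Create a thumbnail whose biggest side is 240 pixels.
my ($thumb, $x, $y) = Image::Magick::Thumbnail::create($src, 240);

$err = $thumb->Write('google.com_thumb.png');
die "cannot write thumbnail: $err" if "$err";

print "thumbnail is ${x}x${y}\n";
```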


I will come back and report all my findings.

greetings and many many thanks for all you did!

 
 

