Hacking the mozrepl-Timeout-configuration: avoiding the stop of a Mechanize-Job

 



dilbert
User

Feb 20, 2012, 9:53 AM


 
I have a list of 2500 websites and need to grab a thumbnail screenshot of each of them. How do I do that? I could fetch the sites with Perl - Mechanize would be a good fit. Note: I only need the results as thumbnails that are a maximum of 240 pixels in the long dimension. At the moment I have a solution which is slow and does not give back thumbnails. How can I make the script run faster, with less overhead, while spitting out the thumbnails?

Prerequisites: the MozRepl Firefox add-on (addon/mozrepl/), the module WWW::Mechanize::Firefox, and the module Imager.

First approach: here is a basic Perl solution:

use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new();
$mech->get('http://google.com');
my $png = $mech->content_as_png();
What I have tried already - here it is:

#!/usr/bin/perl

use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();

open( my $input, '<', 'urls.txt' ) or die "Cannot open urls.txt: $!";

while (<$input>) {
    chomp;
    print "$_\n";

    # Fetch the page and grab the rendered content as PNG data.
    $mech->get($_);
    my $png = $mech->content_as_png();

    # Build the output file name from the URL.
    my $name = $_;
    $name =~ s/^www\.//;
    $name .= ".png";

    open( my $output, '>', $name ) or die "Cannot write $name: $!";
    binmode $output;    # PNG data is binary
    print {$output} $png;
    close $output;

    sleep(5);
}

close $input;

Well, this does not take care of the size:

See the output commandline:

linux-vi17:/home/martin/perl # perl mecha_test_1.pl
www.google.com
www.cnn.com
www.msnbc.com
command timed-out at /usr/lib/perl5/site_perl/5.12.3/MozRepl/Client.pm line 186
linux-vi17:/home/martin/perl #

This is my source - see a snippet (example) of the sites I have in the URL list.

urls.txt (the list of sources):

www.google.com
www.cnn.com
www.msnbc.com
news.bbc.co.uk
www.bing.com
www.yahoo.com

Question: how can I extend the solution so that it does not stop on a timeout, and so that it only stores small thumbnails? Note again: I only need the results as thumbnails with a maximum of 240 pixels in the long dimension. As a prerequisite, I already have the module Imager installed.

How can I make the script run faster, with less overhead, while spitting out the thumbnails?
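One way that might work for the 240-pixel thumbnails is to scale the PNG data with Imager before writing it to disk. The following is only a minimal sketch, assuming the screenshot bytes are in $png as in the script above and that Imager was built with PNG support; the helper name scale_to_thumbnail is made up for illustration:

use Imager;

# Hypothetical helper: turn raw PNG bytes into an Imager object,
# shrink the longer side to at most 240 pixels (keeping the aspect
# ratio), and return the result as PNG bytes again.
sub scale_to_thumbnail {
    my ($png_bytes) = @_;

    my $img = Imager->new;
    $img->read( data => $png_bytes, type => 'png' )
        or die $img->errstr;

    # type => 'min' picks the smaller scaling ratio, so the whole
    # image fits inside a 240x240 box.
    my $thumb = $img->scale(
        xpixels => 240,
        ypixels => 240,
        type    => 'min',
    ) or die $img->errstr;

    my $data;
    $thumb->write( data => \$data, type => 'png' ) or die $thumb->errstr;
    return $data;
}

In the loop above, print {$output} $png; would then become print {$output} scale_to_thumbnail($png); - untested, but it avoids ever writing the full-size screenshot to disk.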

Love to hear from you! Greetings, zero

Update: Is there a way to specify the Net::Telnet timeout with WWW::Mechanize::Firefox? At the moment my internet connection is very slow and sometimes I get an error like

$mech->get(): command timed-out at /usr/local/share/perl/5.10.1/MozRepl/Client.pm line 186


Perhaps I have to look at the mozrepl timeout configuration!? But after all, this is weird and I don't know where that timeout comes from. Maybe it really is Firefox timing out because it is busy synchronously fetching some result.

If it really is Net::Telnet, then you'll have to dive down:

$mech->repl->repl->client->{telnet}->timeout($new_timeout);
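As a sketch only (not a verified fix): one could combine that call with an eval around each get(), so that a site which still times out is skipped instead of killing the whole job. The 180-second value and the $input handle from the script above are assumptions for illustration:

# Raise the MozRepl/Net::Telnet timeout right after constructing
# the object, using the call chain shown above (180s is arbitrary).
my $mech = WWW::Mechanize::Firefox->new();
$mech->repl->repl->client->{telnet}->timeout(180);

while (<$input>) {
    chomp;
    my $url = $_;

    # If get() or the screenshot dies (e.g. "command timed-out"),
    # warn and move on to the next URL instead of aborting the run.
    my $png = eval {
        $mech->get($url);
        $mech->content_as_png();
    };
    if ( !defined $png ) {
        warn "Skipping $url: $@";
        next;
    }

    # ... scale with Imager and write the thumbnail as before ...
}

Whether raising the timeout actually helps depends on whether the timeout really comes from Net::Telnet, as discussed above; the eval at least keeps one slow site from stopping the whole run.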

 
 

