storing the output of a www::mechanize script

 



dilbert
User

May 23, 2013, 9:47 AM

Post #1 of 15 (924 views)
storing the output of a www::mechanize script

Hello, I have a little script that fetches (or at least should fetch) screenshots of web pages and stores them as images.

But I do not know where it stores them...

I want to store the images in a folder. Is this doable?




Code
#!/usr/bin/perl 

use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();

open(INPUT, "<urls.txt") or die $!;

while (<INPUT>) {
    chomp;
    print "$_\n";
    $mech->get($_);
    my $png = $mech->content_as_png();
    my $name = "$_";
    $name =~ s/^www\.//;
    $name .= ".png";
    open(OUTPUT, ">$name");
    print OUTPUT $png;
    sleep(5);
}



see the output


Code
 


http://www.google.com
print() on closed filehandle OUTPUT at mech10.pl line 20, <INPUT> line 4.
http://www.yahoo.com
print() on closed filehandle OUTPUT at mech10.pl line 20, <INPUT> line 5.
http://www.cnn.com
print() on closed filehandle OUTPUT at mech10.pl line 20, <INPUT> line 6.
http://www.bing.com
print() on closed filehandle OUTPUT at mech10.pl line 20, <INPUT> line 7.
http://www.nbcnews.co
print() on closed filehandle OUTPUT at mech10.pl line 20, <INPUT> line 8.
http://www.msnbc.com
print() on closed filehandle OUTPUT at mech10.pl line 20, <INPUT> line 9.
http://news.bbc.co.uk
print() on closed filehandle OUTPUT at mech10.pl line 20, <INPUT> line 10.
martin@linux-70ce:~/perl>



FishMonger
Veteran / Moderator

May 23, 2013, 10:09 AM

Post #2 of 15 (920 views)
Re: [dilbert] storing the output of a www::mechanize script

Your open call in the while loop failed, but you implicitly told Perl not to report the failure. Add proper error handling to the open call to find out why it failed.

Also, you should use lexical variables for filehandles instead of barewords. You should also use the 3-arg form of open, and the die statement should include the filename.
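
A minimal sketch of those three suggestions together. The helper name `open_for_write` and the demo path are hypothetical, chosen only for illustration; the directory is assumed not to exist so that the failure message is visible.

```perl
use strict;
use warnings;

# Open a file for writing and return a lexical filehandle.
# Dies with the file name and the system error ($!) on failure.
sub open_for_write {
    my ($file) = @_;
    open(my $fh, '>', $file)
        or die "cannot open '$file' for writing: $!\n";
    binmode $fh;    # screenshots are binary data
    return $fh;
}

# Demonstration with a directory that is assumed not to exist:
my $fh = eval { open_for_write('no_such_dir_demo/example.png') };
print "open failed as expected: $@" unless $fh;
```

With the check in place, the script stops at the failing open and names the file, instead of continuing to the later "print() on closed filehandle" warning.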


dilbert
User

May 25, 2013, 4:54 AM

Post #3 of 15 (905 views)
Re: [FishMonger] storing the output of a www::mechanize script

Hello dear FishMonger,

many thanks for the reply. Great to hear from you!



In Reply To
Your open call in the while loop failed, but you implicitly told Perl not to report the failure. Add proper error handling to the open call to find out why it failed.

Also, you should use lexical variables for filehandles instead of barewords. You should also use the 3-arg form of open, and the die statement should include the filename.




Update:

I tried to store the files in the folder images, which is located here:




Code
$path = "~/perl/images";





Code
#!/usr/bin/perl 

use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();

my $path = "~/perl/images";

open(INPUT, "<urls.txt") or die $!;

while (<INPUT>) {
    chomp;
    print "$_\n";
    $mech->get($_);
    my $png = $mech->content_as_png();
    my $name = "$_";
    $name =~ s/^www\.//;
    $name .= ".png";
    open(OUTPUT, ">$path/$name");
    print OUTPUT $png;
    sleep(5);
}



Unfortunately, no image ended up in the images folder.
Note: the Perl script, called mech20.pl, resides in the folder



Code
perl/


the folder called "images" resides in the same folder where the script "mech20.pl" is located



Code
 
perl/images


To sum up what happened: the script runs,
but it does not store anything!?



Code
 
martin@linux-70ce:~/perl> perl mech20.pl

print() on closed filehandle OUTPUT at mech20.pl line 24, <INPUT> line 1.

print() on closed filehandle OUTPUT at mech20.pl line 24, <INPUT> line 2.

print() on closed filehandle OUTPUT at mech20.pl line 24, <INPUT> line 3.
http://www.google.com
print() on closed filehandle OUTPUT at mech20.pl line 24, <INPUT> line 4.
http://www.yahoo.com
print() on closed filehandle OUTPUT at mech20.pl line 24, <INPUT> line 5.
http://www.cnn.com
print() on closed filehandle OUTPUT at mech20.pl line 24, <INPUT> line 6.
http://www.bing.com
print() on closed filehandle OUTPUT at mech20.pl line 24, <INPUT> line 7.
http://www.nbcnews.co
print() on closed filehandle OUTPUT at mech20.pl line 24, <INPUT> line 8.
http://www.msnbc.com
print() on closed filehandle OUTPUT at mech20.pl line 24, <INPUT> line 9.
http://news.bbc.co.uk
print() on closed filehandle OUTPUT at mech20.pl line 24, <INPUT> line 10.
martin@linux-70ce:~/perl>



do you have any idea!?

love to hear from you

many greetings


Laurent_R
Veteran / Moderator

May 25, 2013, 6:39 AM

Post #4 of 15 (903 views)
Re: [dilbert] storing the output of a www::mechanize script

Your script fails to open the output file, so it is no wonder it does not work. The path is probably wrong.

To get a better diagnostic, change the following line:


Code
open(OUTPUT, ">$path/$name");


to


Code
my $file_out = "$path/$name"; 
open OUTPUT, ">", $file_out or die "cannot open $file_out $! \n";


You'll get an explicit error message, and you will probably see something wrong in your path and/or file name.


dilbert
User

May 25, 2013, 5:47 PM

Post #5 of 15 (895 views)
Re: [Laurent_R] storing the output of a www::mechanize script

Hello dear Laurent,

many thanks for the reply. Great to hear from you.

I made the changes you mentioned. See the code below, and also the results:



Code
#!/usr/bin/perl 

use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();

my $path = "~/perl/images";

open(INPUT, "<urls.txt") or die $!;

while (<INPUT>) {
    chomp;
    print "$_\n";
    $mech->get($_);
    my $png = $mech->content_as_png();
    my $name = "$_";
    $name =~ s/^www\.//;
    $name .= ".png";
    my $file_out = "$path/$name";
    open OUTPUT, ">", $file_out or die "cannot open $file_out $! \n";
    print OUTPUT $png;
    sleep(5);
}


see the output in terminal


Code
 
martin@linux-70ce:~/perl> perl mech20.pl
cannot open ~/perl/images/.png file or directory cannot be found
martin@linux-70ce:~/perl>



hmmm what can we do now!?


FishMonger
Veteran / Moderator

May 25, 2013, 6:14 PM

Post #6 of 15 (893 views)
Re: [dilbert] storing the output of a www::mechanize script

Specify the full path.


dilbert
User

May 25, 2013, 11:58 PM

Post #7 of 15 (884 views)
Re: [FishMonger] storing the output of a www::mechanize script

Hello dear FishMonger, many thanks for the quick reply. What do we need?

Note: the Perl script, called mech20.pl, resides in the folder



Code
	 

perl/


the folder called "images" resides in the same folder where the script "mech20.pl" is located




Code
  
perl/imag




Which path do you want? Should I provide more data? Just let me know.

many greetings

dilbert


FishMonger
Veteran / Moderator

May 26, 2013, 1:41 AM

Post #8 of 15 (882 views)
Re: [dilbert] storing the output of a www::mechanize script

Don't use ~/ in the path. Replace that with the actual path that it represents.
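
The reason is that `~` is expanded by the shell, not by Perl's open, so a path such as "~/perl/images" is looked up literally. A small sketch of one way to build the real path, assuming a Unix-like system where the home directory is available from the HOME environment variable or the password database:

```perl
use strict;
use warnings;
use File::Spec;

# Perl's open() does not expand "~"; only the shell does.
# Take the home directory from the environment, falling back to
# the password database entry of the current user.
my $home = $ENV{HOME} || (getpwuid($<))[7]
    or die "cannot determine home directory\n";

# Build the absolute path that "~/perl/images" stands for.
my $path = File::Spec->catdir($home, 'perl', 'images');

print "images directory: $path\n";
print "directory ", (-d $path ? "exists" : "does not exist yet"), "\n";
```

Writing the absolute path directly into the script works just as well; deriving it from the home directory only avoids hard-coding the user name.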


Laurent_R
Veteran / Moderator

May 26, 2013, 2:43 AM

Post #9 of 15 (875 views)
Re: [dilbert] storing the output of a www::mechanize script


In Reply To

martin@linux-70ce:~/perl> perl mech20.pl
cannot open ~/perl/images/.png file or directory cannot be found
martin@linux-70ce:~/perl>



Your file name is most probably wrong: ".png" instead of something like "my_image.png".
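
The empty name most likely comes from blank lines at the top of urls.txt: the earlier run printed three empty lines before the first URL, and for an empty `$_` the script builds just ".png". A sketch of skipping such lines, using an in-memory string as a stand-in for the real file (the sample content is an assumption):

```perl
use strict;
use warnings;

# Stand-in for urls.txt, with the kind of empty lines that would
# produce a bare ".png" file name.
my $text = "\n\n   \nhttp://www.google.com\nhttp://www.yahoo.com\n";
open(my $in, '<', \$text) or die "cannot open in-memory file: $!\n";

my @urls;
while (my $line = <$in>) {
    $line =~ s/^\s+|\s+$//g;    # trim whitespace, including the newline
    next if $line eq '';        # skip empty lines
    push @urls, $line;
}
close $in;

print "$_\n" for @urls;
```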


dilbert
User

May 26, 2013, 4:56 AM

Post #10 of 15 (868 views)
Re: [Laurent_R] storing the output of a www::mechanize script

Hello dear Laurent, hello dear FishMonger,

I think that I messed up my script a bit.





Code
 
#!/usr/bin/perl

use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();

my $path = "images";

open(INPUT, "<urls.txt") or die $!;

while (<INPUT>) {
    chomp;
    print "$_\n";
    $mech->get($_);
    my $png = $mech->content_as_png();
    my $name = "$_";
    $name =~ s/^www\.//;
    $name .= ".png";
    my $file_out = "$path/$name";
    open OUTPUT, ">", $file_out or die "cannot open $file_out $! \n";
    print OUTPUT $png;
    sleep(5);
}



And see the corresponding output. Note that there are some minor changes; now
the script does something,

but nothing is stored. That is funny, isn't it!?

See the output:




Code
 
martin@linux-70ce:~/perl> perl mech20.pl

http://www.google.com
cannot open images/http://www.google.com.png No such file or directory
martin@linux-70ce:~/perl>


Hmmm. I think that the script needs some rework.

any idea!?

greetings


(This post was edited by dilbert on May 26, 2013, 4:57 AM)


Laurent_R
Veteran / Moderator

May 26, 2013, 8:36 AM

Post #11 of 15 (859 views)
Re: [dilbert] storing the output of a www::mechanize script

Quite clearly, "images/http://www.google.com.png" is not a valid address. No wonder nothing is found.

What do you have in the urls.txt file?

Maybe you don't have any ***.png file at the address you are looking at.

Hmm, I don't use WWW::Mechanize::Firefox, so I can't help you very much with this part.
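
On the file name itself: the "/" characters of the URL are treated as directory separators, so open looks for a directory named "images/http:" that does not exist. The substitution `s/^www\.//` in the script also never matches, because each line starts with "http://", not "www.". A sketch of turning a URL into a flat file name; `url_to_filename` is a hypothetical helper, not part of any module, and the second sample URL is made up:

```perl
use strict;
use warnings;

# Turn a URL into a name that is safe to use as a single file name.
sub url_to_filename {
    my ($url) = @_;
    my $name = $url;
    $name =~ s{^[a-z][a-z0-9+.-]*://}{}i;   # drop "http://" and similar
    $name =~ s{^www\.}{}i;                  # drop a leading "www."
    $name =~ s{[^A-Za-z0-9._-]+}{_}g;       # "/", ":", "?" etc. become "_"
    $name =~ s{^_+|_+$}{}g;                 # tidy leading/trailing "_"
    return "$name.png";
}

print url_to_filename($_), "\n"
    for 'http://www.google.com', 'http://news.bbc.co.uk/sport/';
```

With this helper, http://www.google.com becomes google.com.png, which can be appended to the images directory without creating accidental subdirectories.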


dilbert
User

May 26, 2013, 9:32 AM

Post #12 of 15 (857 views)
Re: [Laurent_R] storing the output of a www::mechanize script

Hello to you both,


This is the content of urls.txt:



Code
 
http://www.google.com
http://www.yahoo.com
http://www.cnn.com
http://www.bing.com
http://www.nbcnews.co
http://www.msnbc.com
http://news.bbc.co.uk


Well, WWW::Mechanize::Firefox does one thing: it creates little screenshots after
rendering the whole page in the browser.

So, in my Perl-beginner words: the PNG is something that is created during the process.


Well, the script shown below runs nicely, though it does not store the files where I want them.
I guess it stores them somewhere in /tmp, at root level...



Code
 
#!/usr/bin/perl
use strict;
use warnings;

use WWW::Mechanize::Firefox;

my @urls = qw(
http://www.google.com
http://www.yahoo.com
http://www.cnn.com
http://www.bing.com
http://www.nbcnews.com
);

my $temp = '/tmp';    # screenshots are written here
my $mech = WWW::Mechanize::Firefox->new('create');

foreach my $url (@urls) {
    my ($name) = $url =~ /www\.(\w+)\.com/;
    print "creating $name.png\n";

    $mech->get($url);
    sleep(5);    # wait before taking the screenshot
    my $png = $mech->content_as_png(undef, undef, {width => 240, height => 240});

    my $file = "$temp/$name.png";
    open my $fh, ">", $file or die "couldn't create $file: $!";
    binmode $fh;    # PNG data is binary
    print $fh $png;
    close $fh;
}

print "done\n";



see the output


Code
 
./images:
martin@linux-70ce:~/perl> perl mech3.pl
creating google.png
creating yahoo.png
creating cnn.png
creating bing.png
creating nbcnews.png
done
martin@linux-70ce:~/perl>



Laurent_R
Veteran / Moderator

May 26, 2013, 11:42 AM

Post #13 of 15 (852 views)
Re: [dilbert] storing the output of a www::mechanize script

If this other script runs correctly, why don't you simply clone it?


dilbert
User

May 26, 2013, 12:03 PM

Post #14 of 15 (849 views)
Re: [Laurent_R] storing the output of a www::mechanize script

Hello dear Laurent,

many thanks for the quick reply. Great to hear from you.

Yes, I will clone it, with two minor changes:

a. I want the script to read the URLs from a separate file, such as urls.txt,
so that the whole list of URLs is kept in one place outside the script.

b. I want to make sure that the results are not (!!!!) stored in the root area of the Linux machine. That gives me more control.
Note: if they are stored in /tmp, I lose all the results after switching off the machine.


I will come back and show you all the results of the creation of a clone.

meanwhile - many many greetings

dilbert


Laurent_R
Veteran / Moderator

May 26, 2013, 2:23 PM

Post #15 of 15 (844 views)
Re: [dilbert] storing the output of a www::mechanize script

That's what I meant: cloning the important part that works correctly and changing what you need to change, i.e. reading the URLs from a file instead of having them hard-coded within the program (your other program shows that you know how to do it) and changing the target directory (quite easy).
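
A rough sketch of those two changes, kept independent of the browser so that it runs on its own. The helpers `read_urls` and `save_png`, the temporary directory layout, and the placeholder bytes are all assumptions for illustration; in the real script the bytes would come from `$mech->get($url)` followed by `$mech->content_as_png()`, as in the working program.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Read one URL per line from $file, ignoring empty lines.
sub read_urls {
    my ($file) = @_;
    open(my $in, '<', $file) or die "cannot open '$file': $!\n";
    my @urls;
    while (my $line = <$in>) {
        $line =~ s/^\s+|\s+$//g;            # trim, including the newline
        push @urls, $line if length $line;
    }
    close $in;
    return @urls;
}

# Write image bytes to $dir/$name, creating $dir if needed.
sub save_png {
    my ($dir, $name, $bytes) = @_;
    make_path($dir) unless -d $dir;
    my $file = File::Spec->catfile($dir, $name);
    open(my $out, '>', $file) or die "cannot open '$file': $!\n";
    binmode $out;                           # PNG data is binary
    print {$out} $bytes;
    close $out or die "cannot close '$file': $!\n";
    return $file;
}

# Demonstration in a throwaway directory (hypothetical layout).
my $work = tempdir(CLEANUP => 1);
my $list = File::Spec->catfile($work, 'urls.txt');
open(my $fh, '>', $list) or die "cannot create '$list': $!\n";
print {$fh} "http://www.google.com\n\nhttp://www.yahoo.com\n";
close $fh;

my $images = File::Spec->catdir($work, 'images');
for my $url (read_urls($list)) {
    # Use the host name (without "www.") as the base of the file name.
    my ($name) = $url =~ m{^https?://(?:www\.)?([^/]+)};
    # Placeholder; the real script would use content_as_png() here.
    my $png  = "placeholder bytes for $url";
    my $file = save_png($images, "$name.png", $png);
    print "wrote $file\n";
}
```

To keep the results after a reboot, the target directory would be a fixed absolute path under the home directory rather than the temporary directory used in this demonstration.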

 
 

