about WWW::Mechanize::Firefox

 



limner
Novice

Feb 28, 2014, 6:30 AM

Post #1 of 8 (5293 views)

I wrote this small program:

use strict;
use warnings;
use WWW::Mechanize::Firefox;
use Text::Unaccent;

my $mech = WWW::Mechanize::Firefox->new(
    launch   => 'C:\Program Files (x86)\Mozilla Firefox\firefox.exe',
    activate => 1,    # bring the tab to the foreground
);

my $id_sito = $ARGV[0];    # hotel page, e.g. sleeping-beauty.it.html
my $ch_in   = $ARGV[1];    # check-in date, e.g. 2014-03-02
my $ch_out  = $ARGV[2];    # check-out date, e.g. 2014-03-03

my $dirro = "C:\\app\\Limner\\procedure\\esamina_bookings\\htmlfiles\\";    # output directory
my $tfile = "ora_pagina_booking.temp";    # name of the output file to read later

my $dest = join("", $dirro, $tfile);

my $url_base = "http://www.booking.com/hotel/it/";    # booking.com base URL

my $url = "$url_base$id_sito?checkin=$ch_in&checkout=$ch_out";

# Download the page at $url and write it to $dest
$mech->save_url($url, $dest);


At the end, this program creates a file containing the HTML source of the page.

But if I use Firefox and manually save the HTML source of the same page (with the same variables), I get a slightly different page.

For example, in the page I saved manually I can find these two lines:
var PageLoadTimer = {};
PageLoadTimer.start = (new Date()).getTime();

while in the file created by the program I can't find those two lines.

Those lines are not very important to me, but I want to understand how WWW::Mechanize::Firefox behaves and what differences exist between using the package and getting the HTML source manually.

Thanks in advance to anyone who can shed some light on this mystery.

Limner


Zhris
Enthusiast

Mar 1, 2014, 1:48 PM

Post #2 of 8 (4938 views)
Re: [limner] about WWW::Mechanize::Firefox

I am also interested in what causes this issue. I have read through the docs, FAQ and troubleshooter but could not find anything helpful. Your desire to understand why it's happening is unsurprising, since there is a good chance that in a later project an important section of code goes missing without your knowledge. I don't have WWW::Mechanize::Firefox installed to try out your code, but I can think of a few possible, though unlikely, causes:

- WWW::Mechanize::Firefox may have decided that the section of code isn't useful and therefore removed it.
- WWW::Mechanize::Firefox could be buggy or may not support the JavaScript in question.
- The website you are scraping may generate user-dependent content. There will be subtle differences between you and your script, e.g. the user agent.

It might be worth cross-posting this question on http://www.perlmonks.org/. If all else fails, you could email the module's author.

Apologies that I can't be of much help.

Regards,

Chris


limner
Novice

Mar 2, 2014, 4:12 AM

Post #3 of 8 (4927 views)
Re: [Zhris] about WWW::Mechanize::Firefox

Where could I find the package author's email?


Laurent_R
Veteran / Moderator

Mar 2, 2014, 8:16 AM

Post #4 of 8 (4922 views)
Re: [limner] about WWW::Mechanize::Firefox

In the POD for that package: corion@cpan.org.


limner
Novice

Mar 2, 2014, 8:19 AM

Post #5 of 8 (4920 views)
Re: [limner] about WWW::Mechanize::Firefox

OK

I wrote him an email, attaching the two HTML results (manual and scripted) and the Perl script.

As soon as I receive an answer, I will post it here.

Thanks


limner
Novice

Mar 2, 2014, 9:00 AM

Post #6 of 8 (4914 views)
Re: [limner] about WWW::Mechanize::Firefox

Hi

I would like to share with the forum community the email I sent to the person who wrote the module.

I also attach to this post a .rar file that contains the three files I attached to the email.

This is the text of the email I sent today:


Hi

I have a problem with the WWW::Mechanize::Firefox Perl package.

I wrote a small, simple Perl script to get the HTML source code from a URL, and then I got the same HTML source manually from the same URL.

I compared the two files and saw some differences.

As an example, look at line 183 in the two files (mechanize_page.html and manualli_saved_page.html): the manually saved file contains two lines that do not appear in the mechanized file.

Those two lines:
line 183: var PageLoadTimer = {};
line 184: PageLoadTimer.start = (new Date()).getTime();

appear only in manually_saved_page.html and not in mechanize_page.html.

I attach the following files to this email:

1) test.pl: the Perl script I used to get the example file "mechanize_page.html"

2) mechanize_page.html: the HTML file produced by the "test.pl" Perl script

3) manualli_saved_page.html: the HTML page saved manually using Firefox

To run the test, I used the same Firefox on the same computer to create both files.

In each of the two cases I closed the browser, reopened it, cleared all history and cookies, and launched the script (or manually went to the page and manually saved the HTML).

Can you tell me why there are differences between the HTML source obtained with the Perl script and the one obtained manually?


Thanks in advance
Danilo Rizzo

System
Win7 ultimate x64
Strawberry Perl - latest version
Firefox 27.0.0.1


I will also share with the community any answers I receive.
Attachments: files.rar (116 KB)


limner
Novice

Mar 12, 2014, 3:26 AM

Post #7 of 8 (4630 views)
Re: [limner] about WWW::Mechanize::Firefox

Hi to everyone on the forum

I received an answer from the module's author. He tried the same script (I sent it to him), but he didn't get any differences.

It is possible that the behavior is due to JavaScript. I'm not fully convinced, but I have no more evidence at the moment.

In case someone finds new differences between the result of the mechanize script and the manually saved HTML source, here is the author's email; it may be useful for other people on the forum:

Max Maischein: corion@cpan.org
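
For anyone who wants to look for further differences, here is a minimal sketch that lists the lines present in one saved file but absent from the other. It uses only core Perl and does not touch WWW::Mechanize::Firefox itself. The sub names (read_lines, only_in_first) and the script name compare_html.pl are made up for this example, and the file names in the usage comment are the ones from the email above. It ignores leading and trailing whitespace and blank lines, so it is a rough comparison, not a real diff.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a file and return a reference to its lines,
# with leading and trailing whitespace (including line endings) removed.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close $fh;
    s/^\s+|\s+$//g for @lines;
    return \@lines;
}

# Return [line_number, text] pairs for the non-empty lines of $first
# that do not occur anywhere in $second.
sub only_in_first {
    my ($first, $second) = @_;
    my %seen = map { $_ => 1 } @$second;
    my @missing;
    for my $i (0 .. $#$first) {
        my $line = $first->[$i];
        next if $line eq '' or $seen{$line};
        push @missing, [ $i + 1, $line ];
    }
    return \@missing;
}

# Usage: perl compare_html.pl manualli_saved_page.html mechanize_page.html
if (@ARGV == 2) {
    my ($manual, $scripted) = map { read_lines($_) } @ARGV;
    printf "%5d: %s\n", @$_ for @{ only_in_first($manual, $scripted) };
}
```

Swapping the two arguments shows the lines that exist only in the scripted file. Because the comparison is set-based, a line that merely moved to a different position is not reported.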


Zhris
Enthusiast

Apr 2, 2014, 2:57 PM

Post #8 of 8 (4546 views)
Re: [limner] about WWW::Mechanize::Firefox

Hi,

Thanks for updating the thread with the information you received back.

When I can, I will try running your code to see whether I can also replicate the issue, and I will update the thread if I find anything useful.

Chris

 
 

