CGI/Perl Guide | Learning Center | Forums | Advertise | Login

Extract data from webpage

 



sertomallo
New User

Mar 25, 2013, 2:22 AM

Post #1 of 4 (541 views)
Extract data from webpage

Hello,

I need help extracting data from a website.

I need to extract the daily currency exchange rates from the following webpage:
http://finanza-mercati.ilsole24ore.com/valute/valute-minori/valute-minori.php

In particular, I need the numeric value for the currency
"UIC-DZD-Dinaro Algerino" (today it is 102.179, for example).

I tried using LWP::UserAgent, but with poor results.

Is there a way to extract this data?

The table on the web page seems to be populated via asynchronous jQuery calls.

Here is the code I tried to use:

Code
#!/usr/bin/perl  

$ENV{PERL_LWP_USE_HTTP_10} = 1;

use LWP::UserAgent;
#use LWP::Debug qw(+ -conns);

use HTTP::Request::Common qw(POST);
use HTML::Form;

use URI::Escape;


if ($ENV{PERL_LWP_USE_HTTP_10})
{
print("Set ENV{PERL_LWP_USE_HTTP_10} to TRUE\n");
}

print("PERL_LWP_USE_HTTP_10 = " . $ENV{PERL_LWP_USE_HTTP_10} . "\n\n\n");


# Takes no parameters
if($#ARGV != -1)
{
$temp = $#ARGV + 1;
print ("Wrong syntax\n");
print ("Number of parameters = $temp \n");
print ("Syntax: \n");
print ("cambio \n");
exit();
}

$impronta = "PKPK000A - SOCIETA PUBBLICA ALGERIA|1";
#$start = @ARGV[0];
#$end = @ARGV[1];


$agent = new LWP::UserAgent;
#$agent->proxy('http', 'http://10.0.0.83:8008/');
#$agent->proxy('http', 'http://10.0.0.210:8008/');


#print("Default Headers \n " . $agent->default_headers . "\n\n\n");

#$agent->default_headers(undef);


$src_url = "http://finanza-mercati.ilsole24ore.com/valute/valute-minori/valute-minori.php";
#$traccia = "DZD_FixingUIC\">";
$traccia = "<span id=\"PRFX_!EUR/DZD_FixingUICPrec\">";
#$traccia = "PRFX_!EUR/DZD_FixingUICPrec\">";

print("Marker = " . $traccia . " \n");

print("Full date = " . localtime . "\n");

($temp, $temp, $temp, $giorno, $mese, $anno, $temp, $temp, $temp) = localtime;
# test date would be 11/09/2006
$giorno = $giorno - 1;  # previous day (yields 0 on the 1st of the month)
#$giorno = $giorno + 10; # temporary
#$giorno = 11;
$mese = $mese + 1;
#$mese = 10;
#$mese = $mese - 1;
$anno = 1900 + $anno;
#$anno = 2006;
$data = sprintf("%02d", $giorno) . "/" . sprintf("%02d", $mese) . "/" . sprintf("%02d", $anno % 100);

print("Current date = " . $data . "\n");

# Connect to the Sole 24 Ore site
$request1 = new HTTP::Request("GET", $src_url);
print($request1 . "\n");
$response1 = $agent->request($request1);
#if ($response1->is_success) {
# print $response1->content;
# } else {
# print $response1->status_line . "\n";
#}
print($response1 . "\n");

$pos = index($response1->content, $traccia);
#$pos = index($response->content, $traccia, $pos+1);
print($pos . "\n");

if($pos != -1)
{
$val = substr($response1->content, $pos + length($traccia), 6);
#$val = $val . "00";
substr($val, index($val, "."), 1, ',');
print("Exchange rate = " . $val . "\n");
}
else
{
print("Marker not found \n");
}

# $val = "95.243";
# $val = $val . "00";
# substr($val, index($val, "."), 1, ',');

exit();
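The index/substr approach above copies a fixed six characters after the marker, so it truncates or over-reads if the value's length changes; also, when the value has no ".", `index` returns -1 and the `substr` call replaces the last character instead. A minimal sketch of a slightly sturdier alternative, assuming the value sits directly inside the span named by the marker, is to capture everything up to `</span>`. The helper `extract_span_text` and the sample markup are hypothetical, a regex is no substitute for a real HTML parser, and it only helps once the page that actually contains the span has been fetched.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text inside the <span> with the given id,
# or undef if no such span is found. Regex-based, so it is only a stopgap.
sub extract_span_text {
    my ($html, $id) = @_;
    # \Q...\E escapes regex metacharacters such as "!" and "/" in the id
    return $html =~ m{<span\s+id="\Q$id\E"[^>]*>\s*([^<]*?)\s*</span>}i ? $1 : undef;
}

# Made-up markup shaped like the span the script above searches for
my $html = '<td><span id="PRFX_!EUR/DZD_FixingUICPrec">102.179</span></td>';

my $val = extract_span_text($html, 'PRFX_!EUR/DZD_FixingUICPrec');
if (defined $val) {
    # Swap every "." for "," (Italian decimal separator), whatever the length
    (my $with_comma = $val) =~ tr/./,/;
    print "Exchange rate = $with_comma\n";
}
else {
    print "Marker not found\n";
}
```

Using `tr/./,/` instead of `substr`/`index` also avoids the -1 case, since it simply does nothing when there is no ".".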



Kenosis
User

Mar 25, 2013, 3:23 PM

Post #2 of 4 (531 views)
Re: [sertomallo] Extract data from webpage

Perhaps the following will be helpful:


Code
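use strict; 
use warnings;
use LWP::Simple;
use Mojo::DOM;

# Fetch the frame that actually holds the exchange-rate table
my $url  = 'http://finanza-mercati.ilsole24ore.com/fcxp?page=BodyListinoCambiMinori&cmd=framexplane&chId=70&RatPageName=N24:finanza-e-mercati:valute:valute-minori&RatHier1=N24,finanza-e-mercati,valute,valute-minori&RatType=N24:finanza-e-mercati:default&RatEvents=';
my $html = get($url) or die "Could not fetch $url\n";

# Parse the HTML and read the span for "UIC-DZD-Dinaro Algerino"
my $dom  = Mojo::DOM->new($html);
my $span = $dom->at('#PRFX_!EUR/DZD_FixingUICPrec')
  or die "Rate element not found\n";
print $span->text, "\n";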


Output:


Code
102.179


The table on that page is actually inside a frame, and the frame's contents come from the URL above. Mojo::DOM parses the HTML, and the script reads the text of the span whose id is PRFX_!EUR/DZD_FixingUICPrec, which corresponds to "UIC-DZD-Dinaro Algerino".
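To find a frame's URL without reading the page source by hand, one option is to list the `src` attributes of the `<frame>`/`<iframe>` elements in the outer page and fetch those instead. A rough sketch, assuming quoted `src` values; the helper `frame_sources` and the sample page are made up for illustration, and the real page's markup may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: list the src attributes of every <frame> or <iframe>
# in a page, in document order. Regex-based, so it assumes quoted src values.
sub frame_sources {
    my ($html) = @_;
    my @srcs;
    while ($html =~ m{<i?frame\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1}gis) {
        # Decode the common &amp; entity so the URL can be fetched as-is
        (my $src = $2) =~ s/&amp;/&/g;
        push @srcs, $src;
    }
    return @srcs;
}

# Made-up outer page, not the real site's markup
my $outer = <<'HTML';
<html><body>
  <h1>Rates</h1>
  <iframe name="table" src="/frames/rates.php?x=1&amp;y=2"></iframe>
</body></html>
HTML

print "$_\n" for frame_sources($outer);
```

The returned values may be relative; `URI->new_abs($src, $page_url)` from the URI module (installed with LWP) resolves them against the outer page's URL before passing them to `get`.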


(This post was edited by Kenosis on Mar 25, 2013, 3:26 PM)


sertomallo
New User

Mar 26, 2013, 1:49 AM

Post #3 of 4 (518 views)
Re: [Kenosis] Extract data from webpage

Thank you so much!
It works perfectly.

I didn't know about Mojo::DOM; I checked, and it solves my problem.


Kenosis
User

Mar 26, 2013, 9:21 AM

Post #4 of 4 (515 views)
Re: [sertomallo] Extract data from webpage

You're most welcome, sertomallo! Glad it worked for you.

 
 

