download data from web

 



cjosemaria
New User

Jul 14, 2016, 1:02 PM

Post #1 of 4 (1859 views)
download data from web

Hello;

I have some code that needs fixing in these lines:

$ Html = ~ s / \ n // g;
$ Html = ~ s / table / \ n / g;
my @ids = $ html = ~ m {<b> <a ratings href="/es/film(\d+).html"> * / \ d + \ gif..} g;

I cannot find the error that stops it from writing any data to the file.

I think the problem is that the filmaffinity website has changed its structure, which is now:

----------------------------------

</ Div>
<Div class = "all-films-movie fa-shadow"> <div class = "movie-movie-card card-1" data-movie-id = "980190">
<Div class = "mc-poster">
<a title="X (Serie of TV)" href="/es/film980190.html"> <img width = "100" height = "" src = "http://pics.filmaffinity.com/x_tv_series-902187472 -msmall.jpg "alt =" X (TV Series) "> </a>
</ Div>
<Div class = "mc-info-container">
<Div class = "mc-actions">
</ Div>
<Div class = "mc-title"> <a href="/es/film980190.html" title="X (Serie of TV)"> X (TV Series) </a> (2001) <img src = "/imgs/countries/JP.jpg" alt = "Japan" title = "Japan"> </ div>
<Div class = "mr-rating">
<Div class = "avgrat-box"> 6.6 </ div>
<Div class = "ratcount-box"> 269 <i class = "fa fa fa-simple-user-o-fa"> </ i> </ div>
</ Div>
<Div class = "mc-manager">
<Div class = "credits"> <a href="/es/search.php?stype=director&sn&stext=Yoshiaki%20Kawajiri" title="Yoshiaki Kawajiri"> Yoshiaki Kawajiri </a> </ div> </ div>
<Div class = "mc-cast">
<Div class = "credits"> <a href="/es/search.php?stype=cast&sn&stext=Animation" title="Animation"> Animation </a> </ div> </ div>
</ Div>

<Div class = "clearfix"> </ div>

<Div class = "lists-box"> </ div>
</ Div>

----------------------------------

I am attaching the .pl file in case you could take a look.

Regards.


Laurent_R
Veteran / Moderator

Jul 15, 2016, 4:10 AM

Post #2 of 4 (1845 views)
Re: [cjosemaria] download data from web

Quite a few problems in your code. Just identifying a few here below.

You have strange spacing in many places, and it will not work in the regular expressions of your substitutions.
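To illustrate the spacing point, here is a minimal sketch. It assumes the variable is meant to be `$html` (lowercase, as in the third posted line; Perl variable names are case-sensitive) and that the first two lines were meant to delete newlines and then replace each "table" with a newline. The sample string is made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real script would hold the downloaded page here.
my $html = "<table>\n<tr><td>one</td></tr>\n</table>";

# The binding operator is written "=~" with no space inside it.
# Written as "= ~" it becomes an assignment of a bitwise complement instead.
$html =~ s/\n//g;        # delete every newline
$html =~ s/table/\n/g;   # replace each literal "table" with a newline

print $html, "\n";
```

Without the `/x` modifier, spaces inside a pattern are matched literally, so a pattern written as ` \ n ` looks for spaces and the letter "n" rather than for a newline.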

Then your pattern is looking for "a ratings", which appears nowhere in your HTML string, and is therefore bound to fail.

In a regex, the star ("*") is a quantifier applied to the previous letter and does not do what you presumably think.
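A small sketch of what the star does, using made-up test strings rather than anything from the original script: `b*` means "zero or more b", while skipping arbitrary text needs a quantified dot such as `.*?`.

```perl
use strict;
use warnings;

# "*" means "zero or more of the preceding item", not "any text".
my @matches = map { /^ab*c$/ ? 1 : 0 } ('ac', 'abc', 'abbbc', 'a-anything-c');
print "@matches\n";   # prints "1 1 1 0"

# To skip over arbitrary text, quantify the dot instead (non-greedy here).
my ($between) = 'a-anything-c' =~ /^a(.*?)c$/;
print "$between\n";   # prints "-anything-"
```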

You seem to look for a "gif" extension, but there is no such thing in your HTML source string.

Finally, although it can sometimes be done for very very simple cases, it is usually considered that using regexes to parse HTML is a bad idea. You should probably use a CPAN module to parse your HTML. See some possibilities here: https://www.google.fr/search?q=cpan+html&ie=utf-8&oe=utf-8&client=firefox-b&gfe_rd=cr&ei=KsSIV8OEJKKx8wfO4p3IBQ.
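As a rough sketch of the parser approach, the following uses HTML::TreeBuilder, which is only one of several CPAN options and is not named in this thread; it is assumed to be installed. The sample markup is a cleaned-up, shortened version of the HTML posted above, and the goal (collecting the numeric film IDs from the `/es/film....html` links) is inferred from the capture group in the original regex.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN module; assumed to be installed

# Cleaned-up, shortened sample in the shape of the markup posted above.
my $html = <<'HTML';
<div class="movie-card" data-movie-id="980190">
  <div class="mc-poster">
    <a href="/es/film980190.html"><img src="poster.jpg" alt="X (TV Series)"></a>
  </div>
  <div class="mc-title">
    <a href="/es/film980190.html">X (TV Series)</a> (2001)
  </div>
</div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Find every <a> whose href looks like a film page, then pull out the ID.
my (%seen, @ids);
for my $link ($tree->look_down(_tag => 'a', href => qr{^/es/film\d+\.html$})) {
    my ($id) = $link->attr('href') =~ m{film(\d+)\.html};
    push @ids, $id unless $seen{$id}++;   # each film is linked more than once
}
$tree->delete;   # free the parse tree

print "$_\n" for @ids;
```

The regex here only inspects a single attribute value that the parser has already isolated, so changes to spacing, attribute order, or surrounding tags on the page are less likely to break it.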


cjosemaria
New User

Jul 15, 2016, 7:15 AM

Post #3 of 4 (1836 views)
Re: [Laurent_R] download data from web

First of all, thanks for answering.

That version of the file worked properly in 2012. What I think has happened is that the filmaffinity website has changed its layout, so the script no longer works.

I will take a look, although I am not very good at this.

Thank you.


FishMonger
Veteran / Moderator

Jul 15, 2016, 9:09 AM

Post #4 of 4 (1831 views)
Re: [cjosemaria] download data from web

That is exactly why it's a bad idea to use regexes to parse HTML. They are too fragile.

You should redesign the script to use one of the html parser modules on cpan.


(This post was edited by FishMonger on Jul 15, 2016, 9:09 AM)

 
 

