download data from web

New User

Jul 14, 2016, 1:02 PM

Views: 2908
download data from web


I have a script that needs some adjustment in these lines:

$ Html = ~ s / \ n // g;
$ Html = ~ s / table / \ n / g;
my @ids = $ html = ~ m {<b> <a ratings href="/es/film(\d+).html"> * / \ d + \ gif..} g;

I cannot find the error; the script no longer writes any data to the file.

I think the problem is that the filmaffinity website has changed its structure, which is now:


</div>
<div class="all-films-movie fa-shadow"> <div class="movie-movie-card card-1" data-movie-id="980190">
<div class="mc-poster">
<a title="X (Serie of TV)" href="/es/film980190.html"> <img width="100" height="" src="http://pics.filmaffinity.com/x_tv_series-902187472-msmall.jpg" alt="X (TV Series)"> </a>
</div>
<div class="mc-info-container">
<div class="mc-actions">
</div>
<div class="mc-title"> <a href="/es/film980190.html" title="X (Serie of TV)"> X (TV Series) </a> (2001) <img src="/imgs/countries/JP.jpg" alt="Japan" title="Japan"> </div>
<div class="mr-rating">
<div class="avgrat-box"> 6.6 </div>
<div class="ratcount-box"> 269 <i class="fa fa fa-simple-user-o-fa"> </i> </div>
</div>
<div class="mc-manager">
<div class="credits"> <a href="/es/search.php?stype=director&sn&stext=Yoshiaki%20Kawajiri" title="Yoshiaki Kawajiri"> Yoshiaki Kawajiri </a> </div> </div>
<div class="mc-cast">
<div class="credits"> <a href="/es/search.php?stype=cast&sn&stext=Animation" title="Animation"> Animation </a> </div> </div>
</div>

<div class="clearfix"> </div>

<div class="lists-box"> </div>
</div>

-------------------------------------------------- -------------------------------

I am attaching the .pl file in case you could take a look.

Regards.

Veteran / Moderator

Jul 15, 2016, 4:10 AM

Views: 2894
Re: [cjosemaria] download data from web

Quite a few problems in your code. Just identifying a few here below.

You have strange spacing in many places, and it will not work in the regular expressions of your substitutions.
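As a point of comparison, here is a minimal sketch of how the two substitutions would normally be written, with no spaces inside the variable name, the `=~` binding operator, or the `s///` operator itself. Written as `= ~`, Perl sees an assignment followed by the bitwise-negation operator `~` rather than a binding. The sample string is made up for illustration and is not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample input, only for illustration.
my $html = "<table>\n<tr><td>a</td></tr>\n</table>";

$html =~ s/\n//g;        # remove every newline
$html =~ s/table/\n/g;   # replace each literal "table" with a newline

print $html, "\n";
```

Note also that variable names are case-sensitive, so `$Html` and `$html` would be two different variables.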

Then your pattern is looking for "a ratings", which appears nowhere in your HTML string, and is therefore bound to fail.

In a regex, the star ("*") is a quantifier that applies to the preceding item (meaning "zero or more times"); it is not a wildcard and does not do what you presumably think.
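A small sketch of the difference, using made-up test strings: `b*` matches zero or more `b` characters, whereas `.*` is what matches "anything".

```perl
use strict;
use warnings;

my (%star, %dotstar);
for my $s ('ac', 'abc', 'abbbc', 'axc') {
    # "*" repeats the item just before it: here, the letter "b".
    $star{$s}    = $s =~ /^ab*c$/ ? 'yes' : 'no';
    # ".*" means "any character, zero or more times".
    $dotstar{$s} = $s =~ /^a.*c$/ ? 'yes' : 'no';
    printf "%-6s ab*c: %-3s  a.*c: %s\n", $s, $star{$s}, $dotstar{$s};
}
```

Only `axc` behaves differently: it fails `ab*c` because `x` is not a `b`, but it matches `a.*c`.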

You seem to be looking for a "gif" extension, but there is no such thing in your HTML source string.

Finally, although it can sometimes be done for very simple cases, using regexes to parse HTML is generally considered a bad idea. You should probably use a CPAN module to parse your HTML. See some possibilities here: https://www.google.fr/search?q=cpan+html&ie=utf-8&oe=utf-8&client=firefox-b&gfe_rd=cr&ei=KsSIV8OEJKKx8wfO4p3IBQ.
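To give an idea of what that could look like, here is a rough sketch using HTML::TokeParser (from the HTML-Parser distribution on CPAN) to collect the film IDs from the links. This is only one possible approach; the HTML is an abridged copy of the fragment quoted in the question, and the `/es/film<ID>.html` link format is assumed from that fragment.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # CPAN module, part of the HTML-Parser distribution

# Abridged fragment of the markup quoted in the question.
my $html = <<'HTML';
<div class="mc-poster">
<a title="X (Serie of TV)" href="/es/film980190.html">poster</a>
</div>
<div class="mc-title"><a href="/es/film980190.html" title="X (Serie of TV)">X (TV Series)</a> (2001)</div>
<div class="credits"><a href="/es/search.php?stype=director&sn&stext=Yoshiaki%20Kawajiri" title="Yoshiaki Kawajiri">Yoshiaki Kawajiri</a></div>
HTML

my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot parse HTML";

my (%seen, @ids);
while (my $tag = $parser->get_tag('a')) {
    # $tag->[1] is a hash reference holding the tag's attributes.
    my $href = $tag->[1]{href} // next;
    # Keep only links that point at a film page, once per film.
    if ($href =~ m{^/es/film(\d+)\.html$} && !$seen{$1}++) {
        push @ids, $1;
    }
}

print "$_\n" for @ids;
```

The regex is still there, but it only has to match a single attribute value, so changes to the surrounding markup no longer break it.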

New User

Jul 15, 2016, 7:15 AM

Views: 2885
Re: [Laurent_R] download data from web

First of all, thanks for answering.

That version of the file worked properly in 2012. What I think has happened is that the filmaffinity website has changed its layout, so the script no longer works.

I will take a look, although I am not very good at this.

Thank you.

Veteran / Moderator

Jul 15, 2016, 9:09 AM

Views: 2880
Re: [cjosemaria] download data from web

That is exactly why it's a bad idea to use regexes to parse HTML. They are too fragile.

You should redesign the script to use one of the HTML parser modules on CPAN.
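As a starting point for such a redesign, here is a minimal sketch using HTML::TreeBuilder (from the HTML-Tree distribution on CPAN). It is not the original script; it assumes that each film sits in a `div` carrying a `data-movie-id` attribute, with the title in `mc-title` and the rating in `avgrat-box`, as in the fragment posted above. The `@films` structure is made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN module from the HTML-Tree distribution

# Abridged fragment of the markup quoted in the question.
my $html = <<'HTML';
<div class="movie-movie-card card-1" data-movie-id="980190">
<div class="mc-title"><a href="/es/film980190.html" title="X (Serie of TV)">X (TV Series)</a> (2001)</div>
<div class="avgrat-box"> 6.6 </div>
<div class="ratcount-box"> 269 </div>
</div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @films;
# Find every div that carries a numeric data-movie-id attribute.
for my $card ($tree->look_down(_tag => 'div', 'data-movie-id' => qr/^\d+$/)) {
    my $title  = $card->look_down(_tag => 'div', class => 'mc-title');
    my $link   = $title ? $title->look_down(_tag => 'a') : undef;
    my $rating = $card->look_down(_tag => 'div', class => 'avgrat-box');
    push @films, {
        id     => scalar $card->attr('data-movie-id'),
        title  => $link   ? $link->as_trimmed_text   : '',
        rating => $rating ? $rating->as_trimmed_text : '',
    };
}
$tree->delete;   # release the parse tree

print "$_->{id}\t$_->{title}\t$_->{rating}\n" for @films;
```

If the site changes its layout again, only the class and attribute names in the `look_down` calls should need updating.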

(This post was edited by FishMonger on Jul 15, 2016, 9:09 AM)