download data from web

Jul 14, 2016, 1:02 PM

I have a script that fails on these lines:

$ Html = ~ s / \ n // g;
$ Html = ~ s / table / \ n / g;
my @ids = $ html = ~ m {<b> <a ratings href="/es/film(\d+).html"> * / \ d + \ gif..} g;

I cannot find the error that is stopping it from writing data to the file.

I think the problem is that the filmaffinity website has changed its structure, which is now:


</div>
<div class="all-films-movie fa-shadow"> <div class="movie-movie-card card-1" data-movie-id="980190">
<div class="mc-poster">
<a title="X (Serie of TV)" href="/es/film980190.html"> <img width="100" height="" src="http://pics.filmaffinity.com/x_tv_series-902187472-msmall.jpg" alt="X (TV Series)"> </a>
</div>
<div class="mc-info-container">
<div class="mc-actions">
</div>
<div class="mc-title"> <a href="/es/film980190.html" title="X (Serie of TV)"> X (TV Series) </a> (2001) <img src="/imgs/countries/JP.jpg" alt="Japan" title="Japan"> </div>
<div class="mr-rating">
<div class="avgrat-box"> 6.6 </div>
<div class="ratcount-box"> 269 <i class="fa fa fa-simple-user-o-fa"> </i> </div>
</div>
<div class="mc-manager">
<div class="credits"> <a href="/es/search.php?stype=director&sn&stext=Yoshiaki%20Kawajiri" title="Yoshiaki Kawajiri"> Yoshiaki Kawajiri </a> </div> </div>
<div class="mc-cast">
<div class="credits"> <a href="/es/search.php?stype=cast&sn&stext=Animation" title="Animation"> Animation </a> </div> </div>
</div>

<div class="clearfix"> </div>

<div class="lists-box"> </div>
</div>

-------------------------------------------------- -------------------------------

I am attaching the .pl file in case you could take a look.

Regards.

Jul 15, 2016, 4:10 AM

Quite a few problems in your code. Just identifying a few here below.

You have strange spacing in many places, and it will not work in the regular expressions of your substitutions.
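To illustrate the spacing point, here is a minimal sketch of how the two substitutions would presumably be written. The sample string is made up, and the intent of the original lines is my guess.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real script presumably fetches a page.
my $html = "<table>\n<tr>a</tr>\n</table><table>\n<tr>b</tr>\n</table>";

# The binding operator is "=~" with no space inside it. Written as "= ~",
# Perl instead sees an assignment followed by a bitwise negation.
# Variable names are case-sensitive, so "$Html" and "$html" are different.
$html =~ s/\n//g;        # remove every newline
$html =~ s/table/\n/g;   # replace each literal "table" with a newline

print "$html\n";
```

Running with "use strict" and "use warnings" should also flag a misspelled variable such as "$Html" at compile time.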

Then your pattern is looking for "a ratings", which appears nowhere in your HTML string, and is therefore bound to fail.

In a regex, the star ("*") is a quantifier applied to the preceding character or group, and does not do what you presumably think.
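A small sketch of the difference, using made-up strings: "b*" means "zero or more b", whereas ".*" is the usual way to say "any run of characters".

```perl
use strict;
use warnings;

# Made-up strings for illustration only.
my @samples = ('ab', 'abbb', 'a', 'a-b');
my %result;

for my $s (@samples) {
    # The star quantifies only the "b" in front of it.
    my $star = $s =~ /^ab*$/  ? 'match' : 'no match';
    # The dot matches any character except a newline, so ".*" acts as a wildcard.
    my $dot  = $s =~ /^a.*b$/ ? 'match' : 'no match';
    $result{$s} = [ $star, $dot ];
    printf "%-5s ab*: %-9s a.*b: %s\n", $s, $star, $dot;
}
```

Note that "a" matches the first pattern (zero "b" characters is allowed) while "a-b" does not, which is the opposite of what a shell-style wildcard would do.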

You seem to look for a "gif" extension, but there is no such thing in your HTML source string.

Finally, although it can sometimes be done for very very simple cases, it is usually considered that using regexes to parse HTML is a bad idea. You should probably use a CPAN module to parse your HTML. See some possibilities here: https://www.google.fr/search?q=cpan+html&ie=utf-8&oe=utf-8&client=firefox-b&gfe_rd=cr&ei=KsSIV8OEJKKx8wfO4p3IBQ.

Jul 15, 2016, 7:15 AM

First of all, thanks for answering.

That version of the file worked properly in 2012. What I think has happened is that the filmaffinity website has changed its layout, so the script no longer works.

I'll take a look, although I'm not very good at this.

Thank you.

Jul 15, 2016, 9:09 AM

That is exactly why it's a bad idea to use regexes to parse HTML. They are too fragile.

You should redesign the script to use one of the HTML parser modules on CPAN.
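As a rough sketch of what that could look like, here is one option using HTML::TreeBuilder. That module is my choice for illustration and has to be installed from CPAN. The sample markup is trimmed from the snippet posted above, and keying on the data-movie-id attribute is an assumption about what the script needs.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN module, not part of core Perl

# Trimmed sample modelled on the markup quoted in the question.
my $html = <<'HTML';
<div class="movie-movie-card card-1" data-movie-id="980190">
  <div class="mc-poster">
    <a title="X (Serie of TV)" href="/es/film980190.html"></a>
  </div>
  <div class="mc-info-container">
    <div class="mc-title"><a href="/es/film980190.html"> X (TV Series) </a> (2001)</div>
    <div class="avgrat-box"> 6.6 </div>
  </div>
</div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);
my @films;

# Find every <div> that carries a numeric data-movie-id attribute.
for my $card ($tree->look_down(_tag => 'div', 'data-movie-id' => qr/^\d+$/)) {
    my $title_div = $card->look_down(_tag => 'div', class => 'mc-title');
    my $link      = $title_div ? $title_div->look_down(_tag => 'a') : undef;
    my $title     = $link ? $link->as_text : '';
    $title =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
    push @films, { id => $card->attr('data-movie-id'), title => $title };
}

$tree->delete;   # free the parse tree

print "$_->{id}\t$_->{title}\n" for @films;
```

Because the lookup keys on an attribute rather than on the exact text between tags, small layout changes on the site are less likely to break it than they would a hand-written regex.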

