


7stud
Enthusiast

Sep 22, 2010, 2:26 AM


Re: [dilbert] data Parsing for Newbie (23 Tsd Html-files)


Quote
The first task is to take all 25 thousand HTML files and parse out the address sets they contain.
This is a Perl-task! Sure thing!


That is an HTML parsing task, sure thing! But since you haven't shown any of the HTML, it is impossible to say exactly how to extract the data. Either way, you will need one of Perl's HTML processing modules, such as HTML::TreeBuilder, to pull the data you want out of the HTML.
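As a rough idea of what that could look like, here is a minimal sketch with HTML::TreeBuilder. The markup is made up (the real pages were never posted), so the `address` class, the `<p>` element, and the example.com link are all assumptions you would have to adapt to your files.

```perl
use strict;
use warnings;
use 5.010;
use HTML::TreeBuilder;

# Hypothetical markup: the real pages were never shown in this thread.
my $html = <<'END_HTML';
<html><body>
<div class="address">
<p>ID-Number: 2210202</p>
<a href="http://www.example.com/school">Homepage</a>
</div>
</body></html>
END_HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Find the first <div class="address"> (an assumed container).
my $div = $tree->look_down(_tag => 'div', class => 'address');

# as_text returns the element's text with the tags stripped.
my $text = $div->look_down(_tag => 'p')->as_text;

# Read the href attribute of the first link inside the div.
my $url = $div->look_down(_tag => 'a')->attr('href');

say $text;
say $url;

$tree->delete;    # free the memory held by the tree
```

For the real job you would run something like this once per file, for example with `HTML::TreeBuilder->new_from_file($path)` inside a loop over the file names.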


Quote
How should I do this second task?


In the data you extract from the HTML page, look for a string that matches a regex beginning with 'ID-Number:', and capture everything after the colon. For example:


Code
use strict;
use warnings;
use 5.010;    # enables say

my $str = 'ID-Number: 2210202';

# Capture everything after "ID-Number: " into $1.
if ($str =~ /ID-Number: (.+)/) {
    my $id = $1;
    say $id;
}

--output:--
2210202


Do the same for the URL, then combine the two captured strings.
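A minimal sketch of the URL step and the combining step, under some assumptions: I am guessing that the URL follows a label like 'Homepage:' in the extracted text, and the semicolon separator is an arbitrary choice. Neither comes from your data, so adjust both.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical extracted text; the "Homepage:" label is an assumption.
my $str = "ID-Number: 2210202\nHomepage: http://www.example.com/school";

# In list context a match returns its captures, or an empty list on failure.
my ($id)  = $str =~ /ID-Number:\s*(\d+)/;
my ($url) = $str =~ /Homepage:\s*(\S+)/;

# Combine only when both pieces were found; ';' is an arbitrary separator.
my $combined = (defined $id && defined $url) ? "$id;$url" : undef;

say $combined if defined $combined;
```

Checking `defined` before combining means a page that lacks either field is skipped instead of producing a half-empty record.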



