

Sep 22, 2010, 2:26 AM

Re: [dilbert] data Parsing for Newbie (23 Tsd Html-files)

The first task is to take all 25 thousand HTML files and strip out (parse) the address sets they contain.
This is a Perl task! Sure thing!

That is an HTML parsing task, sure thing! But since you haven't shown any of the HTML, it is impossible to say how to extract the data. You will need to use one of Perl's HTML processing modules, such as HTML::TreeBuilder, to extract the data you want from the HTML.
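As a rough sketch of what that could look like with HTML::TreeBuilder (a CPAN module, so it has to be installed): the HTML below is entirely made up, since the real files were not posted, and the table layout and the 'Homepage:' label are assumptions. Adjust the look_down() criteria to whatever the real markup uses.

```perl
use strict;
use warnings;
use 5.010;
use HTML::TreeBuilder;

# Hypothetical HTML: the real markup was not shown, so this layout is assumed.
my $html = <<'END_HTML';
<html><body>
<table>
<tr><td>ID-Number: 2210202</td></tr>
<tr><td>Homepage: http://www.example.com</td></tr>
</table>
</body></html>
END_HTML

# Parse the HTML string into a tree of HTML::Element objects
my $tree = HTML::TreeBuilder->new_from_content($html);

# Collect the plain text of every <td> cell
my @cells = map { $_->as_text } $tree->look_down(_tag => 'td');

$tree->delete;    # free the parse tree when done

say for @cells;
```

For the real files you would read each file in a loop and parse it the same way; only the part that picks out the elements should need changing.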

How should I do this second task?

In the data you extract from the HTML page, look for a string that matches a regex beginning with 'ID-Number:', and then capture everything after the colon. For example:

use strict;
use warnings;
use 5.010;    # enables say()

my $str = 'ID-Number: 2210202';

# Capture everything after 'ID-Number: ' into $1
if ($str =~ /ID-Number: (.+)/) {
    my $id = $1;
    say $id;
}

Do the same for the URL, then combine the two strings.
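Something along these lines, as a minimal sketch. The 'Homepage:' label and the sample text are assumptions, because the actual data was not shown; substitute whatever text really precedes the URL in your pages.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical extracted text; the 'Homepage:' label is an assumption.
my $text = "ID-Number: 2210202\nHomepage: http://www.example.com\n";

# A match in list context returns the captures (or an empty list on failure)
my ($id)  = $text =~ /ID-Number:\s*(\S+)/;
my ($url) = $text =~ /Homepage:\s*(\S+)/;

my $combined;
if (defined $id and defined $url) {
    # Join the two captured strings with a space
    $combined = "$id $url";
    say $combined;
}
```

Checking both captures with defined() means a page that is missing either field is skipped instead of producing a warning about an uninitialized value.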

(This post was edited by 7stud on Sep 22, 2010, 2:27 AM)

Edit Log:
Post edited by 7stud (Enthusiast) on Sep 22, 2010, 2:27 AM
