
dilbert
User
Sep 22, 2010, 2:02 AM
Data parsing for a newbie (many HTML files)
Hi,

I have got 25 thousand HTML files, all stored in one folder. Each page contains an address set (see below), and each data set has a unique ID number.

The first task is to take all 25 thousand HTML files and parse out the address sets they contain. This is a Perl task, sure thing! Here is one data set:

Name: Mister Miller
Adresse:
Telefon:
Fax:
ID-Nummer: 2210202
Mail-Adress: Mister_Miller@hotmail.com
Website:
short url: http://www.TheWEBsite.org/[ID-Number - here 2210202]

The second task can be done with Perl as well. The last line of each address set holds a URL in a short form that is built from two pieces:

http://www.TheWEBsite.org/[ID-Number - here 2210202]

To rebuild the original URL I have to put the two pieces together and call it:

short url: http://www.TheWEBsite.org/[ID-Number]

How should I do this second task? I look forward to hearing from you.
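
A rough sketch of how both tasks might look in Perl. It rests on assumptions that the post does not confirm: that each file holds one address set, that the fields show up as plain "Label: value" text once the tags are stripped, and that the files end in .html. The helper names (strip_tags, parse_record, build_url) are made up for illustration, and the regex-based tag stripping is only a stand-in for a proper HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base of the short URL as given in the post.
my $BASE_URL = 'http://www.TheWEBsite.org/';

# Field labels exactly as they appear in the sample address set.
my @LABELS = ('Name', 'Adresse', 'Telefon', 'Fax', 'ID-Nummer',
              'Mail-Adress', 'Website', 'short url');

# Crude tag stripping (assumes simple markup; a real HTML parser
# would be more robust on messy pages).
sub strip_tags {
    my ($html) = @_;
    $html =~ s/<[^>]*>/ /g;
    $html =~ s/&nbsp;/ /g;
    return $html;
}

# Collect "Label: value" pairs into a hash reference. A value runs
# up to the next known label or the end of the text, so empty
# fields such as "Telefon:" come back as empty strings.
sub parse_record {
    my ($text) = @_;
    my $alt = join '|', map { quotemeta($_) } @LABELS;
    my %rec;
    while ($text =~ /($alt):\s*(.*?)\s*(?=(?:$alt):|\z)/gs) {
        $rec{$1} = $2;
    }
    return \%rec;
}

# Second task: base URL plus ID number. Returns undef when the ID
# is missing or not purely numeric.
sub build_url {
    my ($id) = @_;
    return unless defined $id && $id =~ /\A\d+\z/;
    return $BASE_URL . $id;
}

# Process a whole folder when a directory is given on the command line.
if (!caller && @ARGV) {
    my ($dir) = @ARGV;
    for my $file (glob "$dir/*.html") {
        open my $fh, '<', $file or do { warn "skip $file: $!"; next };
        local $/;               # slurp the whole file at once
        my $html = <$fh>;
        close $fh;
        next unless defined $html;
        my $rec = parse_record(strip_tags($html));
        my $url = build_url($rec->{'ID-Nummer'}) or next;
        print join("\t", $rec->{'ID-Nummer'}, $rec->{Name} // '', $url), "\n";
    }
}
```

Saved under a hypothetical name such as parse_addresses.pl, it would be run as "perl parse_addresses.pl /path/to/folder" and print one tab-separated line per file with ID, name and rebuilt URL. Actually calling each URL is left out of the sketch; a module such as LWP::Simple, with its get($url) function, would be one way to do that.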
(This post was edited by dilbert on Sep 22, 2010, 3:28 PM)