
dilbert
User
Oct 16, 2010, 1:13 PM
Post #1 of 1
HTML::TableExtract - some attributes to go
Hello dear Perl experts,

I have a parsing job and I think HTML::TableExtract is the solution. I have read its documentation: the module extracts specific tables from HTML source code, and it does that really well.

I want to apply it to this site: [url=http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=672.8924536341191]SCHULE SUCHEN EINGANG[/url]

Note: tick all the checkboxes at the bottom of the page. You then get a result page with more than 6400 schools. On the right of each result there is a link "Weitere Informationen anzeigen" (show further information). Clicking it gives the detailed information, nine or ten lines of Schuldaten (school data):

Schulnummer:
Amtliche Bezeichnung:
Strasse:
Plz und Ort:
Telefon:
Fax:
E-Mail-Adresse:
[Schuldaten ändern] (is this UTF-8 encoded, or what?)
Schülergesamtzahl (is this UTF-8 encoded, or what?)

Question: can HTML::TableExtract be applied to the result page with more than 6400 schools (see above)?

Here is what I have so far, using HTML::TableExtract:

[PHP]
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# The attribute name/value pairs that identify the target table are
# still missing here; this is the part I need help with.
my $te = HTML::TableExtract->new(
    attribs => {
    },
);
$te->parse_file('myFile.html');

# Take the first matching table and print one line per row.
my ($table) = $te->tables;
die "no matching table found\n" unless $table;

for my $row ( $table->rows ) {
    cleanup(@$row);    # modifies the cells in place via @_ aliasing
    print "@$row\n";
}

sub cleanup {
    for (@_) {
        $_ = '' unless defined $_;    # empty cells come back as undef
        s/\A\s+//;                    # strip leading whitespace
        s/[\xa0 ]+\z//;               # strip trailing spaces and non-breaking spaces
        s/\s+/ /g;                    # collapse runs of whitespace
    }
}
[/PHP]

I need some help with the attributes! Any and all help will be greatly appreciated.

Regards!
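To make the attributes question concrete, here is a minimal sketch of how the attribs option selects a table by the attributes on its <table> tag. The sample markup and the class name schuldaten are assumptions made up for illustration; the real result page may mark its tables differently, so the actual name/value pairs have to be read from the page source.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical markup standing in for one detail page. The class names
# are invented for this sketch and are not taken from the real site.
my $html = <<'HTML';
<html><body>
<table class="layout"><tr><td>navigation</td></tr></table>
<table class="schuldaten">
  <tr><td>Schulnummer:</td><td>123456</td></tr>
  <tr><td>Telefon:</td><td> 0123 / 4567 </td></tr>
</table>
</body></html>
HTML

# Only tables whose <table> tag carries class="schuldaten" are kept.
my $te = HTML::TableExtract->new( attribs => { class => 'schuldaten' } );
$te->parse($html);
$te->eof;

my @lines;
for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {
        # Copy the cells, replacing undefined (empty) cells with ''.
        my @cells = map { defined $_ ? $_ : '' } @$row;
        for (@cells) {
            s/\A\s+//;     # strip leading whitespace
            s/\s+\z//;     # strip trailing whitespace
            s/\s+/ /g;     # collapse runs of whitespace
        }
        push @lines, join ' ', @cells;
    }
}
print "$_\n" for @lines;
```

If the tables on the real page carry no distinguishing attributes, the module's headers, depth and count options are alternative ways to pick a table; which one fits depends on the actual markup.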