
dilbert
User
Oct 27, 2010, 9:14 AM
Post #1 of 1
(169 views)
www scraper - saving data in an array
First of all, this is a great place for learning. I am new to programming, and I am sure this is a superb place for novices. I learn the most from practical, real-life situations, so here is one.

I like Web::Scraper because it is a web scraper toolkit that provides a DSL-ish interface for traversing HTML documents and returning a neatly arranged Perl data structure. That is great! I want to do some investigations and Perl lessons with it, and I am sure I can learn a lot about Perl.

I tried to apply the following example, which is available on CPAN:

    use URI;
    use Web::Scraper;

    # First, create your scraper block
    my $tweets = scraper {
        # Parse all LIs with the class "status", store them into a resulting
        # array 'tweets'. We embed another scraper for each tweet.
        process "li.status", "tweets[]" => scraper {
            # And, in that array, pull in the elements with the class
            # "entry-content", "entry-date" and the link
            process ".entry-content", body => 'TEXT';
            process ".entry-date", when => 'TEXT';
            process 'a[rel="bookmark"]', link => '@href';
        };
    };

    my $res = $tweets->scrape( URI->new("URL") );

    # The result has the populated tweets array
    for my $tweet (@{$res->{tweets}}) {
        print "$tweet->{body} $tweet->{when} (link: $tweet->{link})\n";
    }

That is the original code. I want to apply it, but with some new definitions, which obviously means parts of it have to be changed:

    use URI;
    use Web::Scraper;

    # First, create your scraper block
    my $tweets = scraper {
        # Parse all LIs with the class "status", store them into a resulting
        # array 'tweets'. We embed another scraper for each tweet.
        process "li.status", "tweets[]" => scraper {
            # And, in that array, pull in the elements with the class
            # "entry-content", "entry-date" and the link
            process ".entry-content", body => 'TEXT';
            process ".entry-date", when => 'TEXT';
            process 'a[rel="bookmark"]', link => '@href';
        };
    };

    my $res = $tweets->scrape( URI->new("add an url") );

    # The result has the populated tweets array
    for my $tweet (@{$res->{tweets}}) {
        print "$tweet->{body} $tweet->{when} (link: $tweet->{link})\n";
    }

As mentioned above, I want **to apply it to [this site here][1]**. How can I apply this code to that site? How is this doable?

Note: I can change the values and attributes, and I can get the data from the parsed site into the array. That is pretty nice!

[1]: http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=303.8726269876093&SchulAdresseMapDO=116270
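To make the question more concrete, here is a minimal sketch of what the adapted scraper might look like. It assumes the target page presents its data as a two-column HTML table (a label cell and a value cell per row). The sample markup, the selectors `table.school`, `td.label` and `td.value`, and the field names `rows`, `label` and `value` are all hypothetical; the real selectors have to be read from the page source of the site.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical sample markup standing in for the real page. The actual
# structure of the site is not known here and must be checked first.
my $html = <<'HTML';
<html><body>
<table class="school">
  <tr><td class="label">Name</td><td class="value">Example School</td></tr>
  <tr><td class="label">City</td><td class="value">Example Town</td></tr>
</table>
</body></html>
HTML

# Same pattern as the tweets example: the outer scraper collects every
# matching table row into the array 'rows', and the nested scraper picks
# the individual fields out of each row.
my $school = scraper {
    process 'table.school tr', 'rows[]' => scraper {
        process 'td.label', label => 'TEXT';
        process 'td.value', value => 'TEXT';
    };
};

# scrape() also accepts an HTML string, so this sketch needs no network
# access. For the live page, pass URI->new($url) instead (with "use URI;").
my $res = $school->scrape($html);

# Save the scraped data in a plain Perl array of hash references.
my @rows = @{ $res->{rows} || [] };

for my $row (@rows) {
    print "$row->{label}: $row->{value}\n";
}
```

Only the selectors and the field names differ from the CPAN example. Once the data is in `@rows`, it is an ordinary array that can be looped over, sorted, or written to a file.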
|