
Jasmine
Administrator
Feb 24, 2002, 10:53 AM
Post #5 of 6
Re: [yapp] URL matching in text
What about [url=http://search.cpan.org/search?dist=HTML-Parser]HTML::LinkExtor[/url]? From the [url=http://search.cpan.org/doc/GAAS/HTML-Parser-3.25/lib/HTML/LinkExtor.pm]docs[/url] (changed here to collect <a> links rather than images):

[perl]
#!/usr/bin/perl -w
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

$url = 'http://perlguru.com/gforum.cgi?post=13765';  # for instance
$ua = LWP::UserAgent->new;

# Set up a callback that collects anchor links
my @links = ();
sub callback {
    my($tag, %attr) = @_;
    return if $tag ne 'a';  # we only look closer at <a ...>
    push(@links, values %attr);
}

# Make the parser. Unfortunately, we don't know the base yet
# (it might be different from $url)
$p = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
$res = $ua->request(HTTP::Request->new(GET => $url),
                    sub {$p->parse($_[0])});

# Expand all link URLs to absolute ones
my $base = $res->base;
@links = map { $_ = url($_, $base)->abs; } @links;

# Print them out
print join("\n", @links), "\n";
[/perl]

Running the above on this page gave these results:

http://perlarchive.com/
http://perlarchive.com/guide/
http://perlguru.com/
http://tlc.perlarchive.com/
http://perlarchive.com/advertising.shtml
http://perlarchive.com/mailing_list.shtml
http://perlarchive.com/
http://perlarchive.com/
http://perlguru.com/gforum.cgi?guest=4383
http://perlguru.com/gforum.cgi?do=search;guest=4383
http://perlguru.com/gforum.cgi?do=whos_online;guest=4383
http://perlguru.com/gforum.cgi?do=login;guest=4383
http://perlguru.com/gforum.cgi?guest=4383
http://perlguru.com/gforum.cgi?guest=4383;category=4
http://perlguru.com/gforum.cgi?forum=13;guest=4383
http://www.gossamer-threads.com/scripts/gforum/
http://perlguru.com/gforum.cgi?do=post_view_printable;post=13765;guest=4383
http://perlguru.com/gforum.cgi?username=yapp;guest=4383
http://perlguru.com/gforum.cgi?url=http%3A%2F%2Fwww.cool-programming.f2s.com
http://perlguru.com/gforum.cgi?username=Coderifous;guest=4383
http://perlguru.com/gforum.cgi?post=13765#13765
http://perlguru.com/gforum.cgi?username=yapp;guest=4383
http://perlguru.com/gforum.cgi?post=13765#13826
http://perlguru.com/gforum.cgi?url=http%3A%2F%2Fwww.cool-programming.f2s.com
http://perlguru.com/gforum.cgi?username=gregarios;guest=4383
http://perlguru.com/gforum.cgi?post=13765#13826
http://perlguru.com/gforum.cgi?url=http%3A%2F%2Fwww.macpicks.com
http://perlguru.com/gforum.cgi?username=Jasmine;guest=4383
http://perlguru.com/gforum.cgi?post=13765#13765
http://perlguru.com/gforum.cgi?url=http%3A%2F%2Fsearch.cpan.org%2Fsearch%3Fdist%3DHTML-Parser
http://perlguru.com/gforum.cgi?url=http%3A%2F%2Fsearch.cpan.org%2Fdoc%2FGAAS%2FHTML-Parser-3.25%2Flib%2FHTML%2FLinkExtor.pm
http://perlguru.com/gforum.cgi?url=http%3A%2F%2F123.456.789.0
http://perlguru.com/gforum.cgi?url=invalid.whee
mailto:test@test.com
http://perlguru.com/gforum.cgi?do=post_editlog;post=14502;guest=4383
http://perlguru.com/gforum.cgi?do=search;guest=4383
http://www.gossamer-threads.com/
http://creativefundamentals.com/
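The list above contains repeats and a mailto: entry. If that gets in the way, something like the following might help. It is only a rough sketch, not taken from the docs: the sample markup and the base URL are made up for illustration, and it parses a string instead of fetching a page. It relies on the parser's documented behaviour of storing links internally when no callback is given and returning absolute URLs when a base is passed to new().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample markup and base URL, for illustration only.
my $base = 'http://example.com/forum/';
my $html = <<'HTML';
<a href="/index.html">Home</a>
<a href="post.cgi?id=1">Post</a>
<a href="/index.html">Home again</a>
<a href="mailto:test@test.com">Mail</a>
<img src="logo.gif">
HTML

# With no callback the parser keeps the links itself; passing a base
# URL makes it hand back absolute URLs.
my $p = HTML::LinkExtor->new(undef, $base);
$p->parse($html);
$p->eof;

my (%seen, @links);
for my $link ($p->links) {
    my ($tag, %attr) = @$link;            # each entry is [tag, attr => url, ...]
    next unless $tag eq 'a';              # ignore <img>, <link>, and so on
    for my $u (values %attr) {
        next unless $u =~ m{^https?://}i; # skip mailto: and other schemes
        push @links, "$u" unless $seen{$u}++;  # keep first occurrence only
    }
}

print "$_\n" for @links;
```

The %seen hash keeps the first occurrence of each URL and preserves the order in which links appear on the page. Whether to drop mailto: links depends on what you want to match, so treat that regex as a placeholder.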
(This post was edited by Jasmine on Feb 24, 2002, 10:57 AM)