Extract <img> from html



Jun 21, 2001, 7:14 AM

Post #1 of 3 (2575 views)
Extract <img> from html

Does anyone know of (or use) a script that will parse an HTML file and extract all of the links to images? I need to be able to run the script against files on the web as well, not just local ones.


Jun 21, 2001, 2:01 PM

Post #2 of 3 (2570 views)
Re: Extract <img> from html

I would point you to the "HTML::LinkExtor" module, part of the HTML-Parser distribution, because that is what I use. It does the job just fine for me, and writing the supporting code around the module was pretty straightforward. I have not tried it against web files, but the module's SYNOPSIS clearly uses an "http://" example.



Jun 21, 2001, 2:23 PM

Post #3 of 3 (2570 views)
Re: Extract <img> from html

From the docs for HTML::LinkExtor (part of the HTML-Parser distribution):

use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

$url = "";  # for instance
$ua = LWP::UserAgent->new;

# Set up a callback that collects image links
my @imgs = ();
sub callback {
    my($tag, %attr) = @_;
    return if $tag ne 'img';  # we only look closer at <img ...>
    push(@imgs, values %attr);
}

# Make the parser. Unfortunately, we don't know the base yet
# (it might be different from $url)
$p = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
$res = $ua->request(HTTP::Request->new(GET => $url),
                    sub {$p->parse($_[0])});

# Expand all image URLs to absolute ones
my $base = $res->base;
@imgs = map { $_ = url($_, $base)->abs; } @imgs;

# Print them out
print join("\n", @imgs), "\n";
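
Since the original question also mentioned local files, here is a minimal sketch of the same callback idea without LWP, feeding markup straight to the parser. The sample HTML, the collect_img name, and the choice to keep only the src attribute are assumptions for illustration; they are not taken from the module's documentation example.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample markup; a real script would read this from a file.
my $html = <<'HTML';
<html><body>
<a href="page2.html">next</a>
<img src="images/logo.gif" alt="logo">
<img src="/pics/photo.jpg">
</body></html>
HTML

my @imgs;

# Callback invoked once per link-bearing tag.
# Assumption: only the src attribute of <img> tags is of interest.
sub collect_img {
    my ($tag, %attr) = @_;
    return if $tag ne 'img';
    push @imgs, $attr{src} if defined $attr{src};
}

# No base URL is given, so the links come back exactly as written in the markup.
my $p = HTML::LinkExtor->new(\&collect_img);
$p->parse($html);
$p->eof;    # signal end of document so any buffered text is flushed

print "$_\n" for @imgs;
```

For a file on disk, $p->parse_file("page.html") (the filename is hypothetical) should be able to replace the parse/eof pair, since HTML::LinkExtor inherits that method from HTML::Parser. Relative links from a local file stay relative unless a base URL is supplied.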

