
japhy
Enthusiast
/ Moderator
Dec 24, 2000, 8:10 AM
Post #2 of 5
(8643 views)
Re: adding the domain to href's
The general approach is to use an HTML parser: you set up a "handler" for <A> tags and, inside that handler, check whether the HREF attribute is a relative URL. The other method is to insert a <BASE HREF="..."> tag inside the <HEAD> element. I have an HTML parser in the works, undergoing some testing, called YAPE::HTML. In your case, it would be used like so:

use YAPE::HTML;
use strict;

# assume $content is the variable holding the HTML
my $content;
my $parser = YAPE::HTML->new($content);

open OUT, ">new_file.html" or die "can't create new_file.html: $!";

while (my $chunk = $parser->next) {
  if (
    $chunk->type('tag') and
    $chunk->tag('a') and
    $chunk->attr('href')
  ) {
    my $url = $chunk->attr('href');
    $url = "http://www.website.com" . $url if $url =~ m!^/!;
    $chunk->setattr(href => $url);
  }
  print OUT $chunk->string;
}

close OUT;
die "problem: ", $parser->error unless $parser->done;

Jeff "japhy" Pinyan -- accomplished hacker, teacher, lecturer, and author
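
For the <BASE HREF="..."> alternative mentioned above, here is a minimal sketch in plain Perl. It does not use YAPE::HTML; it swaps in a simple regex substitution instead, which is only reasonable when the document has an ordinary <HEAD> tag. The helper name add_base_href and the sample HTML are made up for illustration, and the base URL is the same placeholder domain used above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): insert a <BASE HREF> tag
# right after the opening <HEAD> tag so the browser resolves relative
# links against $base. The links themselves are left untouched.
sub add_base_href {
    my ($html, $base) = @_;

    # leave the document alone if it already declares a <BASE>
    return $html if $html =~ /<base\b/i;

    # no /g flag: only the first <HEAD ...> tag is affected
    # (\b keeps this from matching <HEADER>)
    $html =~ s{(<head\b[^>]*>)}{$1\n<base href="$base">}i;

    return $html;
}

# sample input, made up for this example
my $content = '<html><head><title>t</title></head>'
            . '<body><a href="/x.html">x</a></body></html>';

print add_base_href($content, "http://www.website.com/"), "\n";
```

If the document has no <HEAD> tag at all, this sketch returns it unchanged, so the parser approach is still the more robust of the two.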