adding the domain to href's

 



Aquilo
journeyman

Dec 24, 2000, 2:47 AM

Post #1 of 5 (3986 views)
adding the domain to href's

How do I rewrite <a href=page.html> so the domain is put in front of it? I fetch the page with LWP::Simple, and the links need to point to the right domain.

E.g.: <a href="http://www.domain.com/page.html">



japhy
Enthusiast / Moderator

Dec 24, 2000, 8:10 AM

Post #2 of 5 (3985 views)
Re: adding the domain to href's

The general approach is to use an HTML parser: you set up a "handler" for <A> tags, and inside that handler you check whether the HREF attribute is a relative URL and, if so, prepend the domain.

The other method is to insert a <BASE HREF="..."> tag inside the <HEAD> tag.
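A minimal sketch of the <BASE HREF> idea, using only core Perl. The helper name add_base_href and the sample page are made up for illustration, and the regex assumes the page has an ordinary <HEAD> tag; a page without one is returned unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper (not from this thread): insert a <BASE HREF> tag
# right after the opening <HEAD> tag so relative links resolve against $base.
sub add_base_href {
    my ($html, $base) = @_;

    # Leave the page alone if it already declares a base URL.
    return $html if $html =~ /<base\b/i;

    # Insert after the first <head ...> tag, matched case-insensitively.
    $html =~ s{(<head\b[^>]*>)}{$1\n<base href="$base">}i;
    return $html;
}

# Sample page, standing in for what LWP::Simple's get() would return.
my $page = '<html><head><title>t</title></head>'
         . '<body><a href=page.html>x</a></body></html>';
print add_base_href($page, 'http://www.domain.com/'), "\n";
```

Keep in mind that a base URL affects every relative reference on the page (images, stylesheets, forms), not just the links.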

I have an HTML parser in the works, undergoing some testing, called YAPE::HTML. It would be used, in your case, like so:


Code
use YAPE::HTML;
use strict;

# assume $content is the variable holding the HTML
my $content;
my $parser = YAPE::HTML->new($content);

open OUT, ">new_file.html" or die "can't create new_file.html: $!";
while (my $chunk = $parser->next) {
    # only look at <A> tags that carry an HREF attribute
    if (
        $chunk->type('tag') and
        $chunk->tag('a') and
        $chunk->attr('href')
    ) {
        my $url = $chunk->attr('href');
        # only URLs starting with "/" get the domain prepended
        $url = "http://www.website.com" . $url if $url =~ m!^/!;
        $chunk->setattr(href => $url);
    }
    # write every chunk back out, changed or not
    print OUT $chunk->string;
}
close OUT;

die "problem: ", $parser->error unless $parser->done;


Jeff "japhy" Pinyan -- accomplished hacker, teacher, lecturer, and author


Aquilo
journeyman

Dec 24, 2000, 2:55 PM

Post #3 of 5 (3982 views)
Re: adding the domain to href's

$evdata =~ s!\<a href=(.*?)\>+!<a href\=http:\/\/www.ev90.com\/.*?\>!g;

This is what I'm trying to do. Setting a base URL would mess with other stuff on the output page, and the parser looks like a lot of code just to replace one thing with another, but thanks!!

How do I carry the text matched by (.*?) in the original tag over to the replacement tag??




sleuth
Enthusiast

Dec 24, 2000, 4:53 PM

Post #4 of 5 (3981 views)
Re: adding the domain to href's

 
It's captured in $1. The next parenthesized group, e.g. a second (.*), is captured in $2, and so on.
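A small sketch of how the numbered captures line up with the parentheses. The sample tag and the www.domain.com prefix are invented for illustration:

```perl
use strict;
use warnings;

my $tag = '<a href=page.html title=Home>';

# Two parenthesized groups: the first fills $1, the second fills $2.
if ($tag =~ /<a href=(\S+) title=(.*?)>/) {
    print "href:  $1\n";    # prints "href:  page.html"
    print "title: $2\n";    # prints "title: Home"
}

# Inside s///, $1 is also available on the replacement side.
(my $fixed = $tag) =~ s{href=(\S+)}{href=http://www.domain.com/$1};
print "$fixed\n";
```

The captures are only meaningful after a successful match, which is why the prints sit inside the if.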

Sleuth



Aquilo
journeyman

Dec 25, 2000, 4:58 AM

Post #5 of 5 (3978 views)
Re: adding the domain to href's

THANK YOU!!! :)
works great!!!!!

There are just 42 URLs to change, and hard-coding them by hand would have been crazy. Besides, they would then be static; now they are dynamic (each link gets a number after it, for the number of sites for that link).

$data =~ s!\<a href=(.*?)\>+!<a href\=http:\/\/www.ev90.com\/$1\>!g;
Neat, that fixed it all! :)
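One caveat for anyone reusing that substitution: it prefixes every href it matches, so a link that is already absolute, or one whose value is in quotes, would come out mangled. Below is a rough sketch of a more defensive version. The helper names absolutize_url and absolutize_hrefs are made up here, and it assumes every relative link should resolve against the site root (it does not handle "../" or the directory of the current page). A real HTML parser is still the safer route.

```perl
use strict;
use warnings;

# Hypothetical helper: return $url unchanged if it is already absolute,
# otherwise put "$base/" in front of it.
sub absolutize_url {
    my ($url, $base) = @_;
    # Leave scheme URLs (http:, mailto:), //host URLs and #fragments alone.
    return $url if $url =~ m{^(?:[a-z][a-z0-9+.-]*:|//|\#)}i;
    $url =~ s{^/+}{};               # avoid a double slash after the domain
    return "$base/$url";
}

# Hypothetical helper: rewrite the href of every <a> tag in $html.
sub absolutize_hrefs {
    my ($html, $base) = @_;
    $base =~ s{/+\z}{};             # drop any trailing slash on the base

    # Group 1 is the tag up to and including an optional opening quote,
    # group 2 is the URL itself.
    $html =~ s{(<a\b[^>]*?\bhref\s*=\s*["']?)([^"'\s>]+)}
              {my ($pre, $url) = ($1, $2); $pre . absolutize_url($url, $base)}gei;
    return $html;
}

my $data = '<a href=page.html> <a href="/x/y.html"> <a href="http://other.com/z">';
print absolutize_hrefs($data, 'http://www.ev90.com/'), "\n";
```

The captures are copied into lexicals before the helper is called because absolutize_url runs its own regexes.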


 
 

