Need some help with regex.



Ennio
Novice

Oct 28, 2008, 6:13 PM



I have an array containing the following list of URLs:

http://www.mysite.com/
http://www.mysite.com/index.html
http://www.mysite.com/contact.html
http://www.myblog.com/
http://www.myblog.com/default.aspx


I'm creating a script that will go through this array and remove all the self-referencing links and all the non-local links for the base URL I'm looking for.

So if I'm looking for http://www.mysite.com/ (the base URL), I need to remove http://www.mysite.com/, http://www.mysite.com/index.html, http://www.myblog.com/, and http://www.myblog.com/default.aspx from the array, leaving only http://www.mysite.com/contact.html.
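For reference, the rule described above can be written out without any regex, which makes it easy to check a regex against. This is a minimal sketch, not the poster's code; it assumes "self-referencing" means the base URL itself or a page named index.* or default.* directly under it, and keep_link is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the link should stay in the array.
# Assumption: a self-referencing link is the base URL itself or an
# index.* / default.* page directly under it.
sub keep_link {
    my ($url, $base) = @_;
    # Non-local: the link does not start with the base URL.
    return 0 unless index($url, $base) == 0;
    # The part of the link after the base URL (may be empty).
    my $rest = substr($url, length $base);
    return 0 if $rest eq '';                      # the base URL itself
    for my $page ('index.', 'default.') {
        return 0 if index($rest, $page) == 0;     # its default page
    }
    return 1;
}

my $base  = 'http://www.mysite.com/';
my @links = qw(http://www.mysite.com/ http://www.mysite.com/index.html
               http://www.mysite.com/contact.html http://www.myblog.com/
               http://www.myblog.com/default.aspx);
my @kept = grep { keep_link($_, $base) } @links;
print "$_\n" for @kept;    # prints http://www.mysite.com/contact.html
```

Plain index/substr checks are used here on purpose, so the rule itself does not depend on getting the regex right.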

My problem is the regular expression. I got it to remove all the non-local links, but I'm not sure how to remove the self-referencing links. Can someone help me complete the regular expression?

Thank you

Here is what I have.


Code
  

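use strict;
use warnings;

my @links = qw(http://www.mysite.com/ http://www.mysite.com/index.html
               http://www.mysite.com/contact.html http://www.myblog.com/
               http://www.myblog.com/default.aspx);
my $base = 'http://www.mysite.com/';    # the URL must be quoted
for (my $counter = 0; $counter <= $#links; $counter++) {
    # ^ anchors the match at the start; \Q...\E treats '.' literally.
    if ($links[$counter] =~ m/^(\Q$base\E)/) {
        # local link: do something
    } else {
        # non-local link: do something
    }
}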


1arryb
User

Feb 26, 2009, 1:25 PM


Re: [Ennio] Need some help with regex.

Hi Ennio,

Maybe something like this?

Code
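use strict;
use warnings;

my @links = qw(http://www.mysite.com/ http://www.mysite.com/index.html
               http://www.mysite.com/contact.html http://www.myblog.com/
               http://www.myblog.com/default.aspx);
my $base = 'http://www.mysite.com/';
my @keep;
for (my $counter = 0; $counter <= $#links; $counter++) {
    # Shift parentheses to remember the relative URL (if any), not the base.
    # \Q...\E stops the '.' characters in $base from acting as wildcards.
    if ($links[$counter] =~ m|^\Q$base\E(.*)|) {
        # Local link. $1 is the "remembered" relative part (may be empty).
        my $rUrl = $1;
        if ($rUrl eq '' or $rUrl =~ m/^(?:index|default)\./) {
            # Self-referential link (the base itself or its index page). Throw away.
        } else {
            push @keep, $links[$counter];    # keep
        }
    } else {
        # Non-local link. Throw away.
    }
}
print "$_\n" for @keep;    # prints http://www.mysite.com/contact.html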


Cheers,

Larry
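Since the original question asked how to complete the regular expression itself, both checks from the reply above can also be folded into a single pattern with a negative lookahead. This is an alternative sketch, not the reply's code, under the same assumption that self-referential pages are the base URL itself or index.* / default.* pages; $keep_re is a hypothetical name.

```perl
use strict;
use warnings;

my $base  = 'http://www.mysite.com/';
my @links = qw(http://www.mysite.com/ http://www.mysite.com/index.html
               http://www.mysite.com/contact.html http://www.myblog.com/
               http://www.myblog.com/default.aspx);

# One pattern for both rules:
#   ^\Q$base\E   the link must start with the base URL (local link);
#   (?! ... )    and what follows must NOT be either
#     \z                   the end of the string (the base URL itself), or
#     (?:index|default)\.  an index.* / default.* page (self-reference).
my $keep_re = qr{^\Q$base\E(?!\z|(?:index|default)\.)};

my @kept = grep { $_ =~ $keep_re } @links;
print "$_\n" for @kept;    # prints http://www.mysite.com/contact.html
```

The lookahead only inspects what follows the base URL without consuming it, so a single grep replaces the nested if/else. The trade-off is readability: the two-step version above is easier to extend with further rules.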