Simple RegEx Help! Please!

 



martygilbert
New User

Jun 30, 2008, 12:36 PM

Post #1 of 5 (3932 views)
Simple RegEx Help! Please!

Hi there!

I'm going through a bunch of web content and need to make all of the links relative rather than hard-coded from the site root, because I'm generating an archive to put on disk. So, instead of the link being to "/images/pic1.jpg", I want it to be "images/pic1.jpg".

I have the code ready to add the right prefix before the "images" directory to make it relative. It stores the needed number of "../" segments in a variable called $path. For some reason, though, my regex doesn't work!

Basically, I want to replace anything like:
= "/anything"
with
= "$path/anything"

Currently, I have the following:


Code
if(/\=\s*\"\/.*/i){ 
s/\=\s*\"\/(.*)/\=\"$path$1/ig;
}


I've never claimed to be a RegEx pro, but I usually hack one up when I need it, with great success... This one, I'm afraid, never works like I think it should!

Any help would be greatly appreciated!

--MartyGilbert


(This post was edited by martygilbert on Jun 30, 2008, 12:44 PM)


KevinR
Veteran


Jun 30, 2008, 1:09 PM

Post #2 of 5 (3926 views)
Re: [martygilbert] Simple RegEx Help! Please!

seems to work:


Code
$path = '../'; 
$_ = '<img src="/anything/frog.gif">';

if(m|=\s*"/.*|i){
s|=\s*"/(.*)|="$path$1|ig;
print;
}


I used | instead of / as the regexp delimiter, and I also removed the unnecessary backslashes you had. Both changes make the code more readable.

If the regexp does not work, it could be that the pattern you are using does not match your data.

Side note: the separate match test is actually not necessary. s/// returns true only if it made a substitution, so you can put the substitution itself in the "if" condition like this:



Code
$path = '../'; 
$_ = '<img src="/anything/frog.gif">';

if(s|=\s*"/(.*)|="$path$1|ig) {
print;
}
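
The reason the separate match test can go is that s/// returns the number of substitutions it made, and a false value when it made none. Here is a minimal sketch of that behaviour. The helper name rewrite and the second sample string are made up for illustration and are not from the original post:

```perl
use strict;
use warnings;

my $path = '../';

# Hypothetical helper: returns the rewritten text and the substitution count.
sub rewrite {
    my ($text) = @_;
    # s/// yields the number of substitutions, or a false value if none.
    my $count = ($text =~ s|=\s*"/(.*)|="$path$1|ig);
    return ($text, $count || 0);
}

my ($hit_text,  $hit_count)  = rewrite('<img src="/anything/frog.gif">');
my ($miss_text, $miss_count) = rewrite('<img src="frog.gif">');

print "$hit_count: $hit_text\n";    # one substitution made
print "$miss_count: $miss_text\n";  # no leading slash, text left alone
```

So the "if" around the s/// only guards the print; the substitution does its own matching.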

-------------------------------------------------


martygilbert
New User

Jun 30, 2008, 1:21 PM

Post #3 of 5 (3924 views)
Re: [KevinR] Simple RegEx Help! Please!

Thanks a lot for the help. Using the vertical bars instead of the forward slashes makes it much more readable.

I think I found my problem: it is only replacing the first occurrence per line! I thought that's what the g in |ig was for in the search/replace line.

My line goes from this:

Code
<a href="/opinions0203/opinions.html"><IMG SRC="/OpinionsButton/images/opinions.gif" WIDTH=150 HEIGHT=200 ALT="Opinions"></a

To this:

Code
<a href="opinions0203/opinions.html"><IMG SRC="/OpinionsButton/images/opinions.gif" WIDTH=150 HEIGHT=200 ALT="Opinions"></a>

Notice only the first occurrence is switched from '/opinions0203...' to 'opinions0203...' (this is a first level link, so no "../" is needed).

Am I wrong in my understanding of the |g ?

Thanks again,

Marty


KevinR
Veteran


Jun 30, 2008, 8:47 PM

Post #4 of 5 (3906 views)
Re: [martygilbert] Simple RegEx Help! Please!

The problem is the pattern in the regexp. This should work better:


Code
$path = '../';  
$_ = '<a href="/opinions0203/opinions.html"><IMG SRC="/OpinionsButton/images/opinions.gif" WIDTH=150 HEIGHT=200 ALT="Opinions"></a';

s|=\s*"/([^ >]+)|="$path$1|ig;
print;


Your regexp uses (.*), which is greedy and matches all the way to the end of the line. The first match therefore swallows everything after the first link, so the g modifier has nothing left to match. In my regexp the match stops at a space or a closing HTML bracket. The pattern [^ >] is a negated character class: it matches any single character except the ones listed inside it.

Trying to use regexps on HTML code can be frustrating. If it does not work as well as you hope, you may need to use an HTML-aware parser.
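
To see the difference between the greedy pattern and the negated character class side by side, here is a small sketch. It assumes an empty $path (a first-level page) and uses the sample line from the thread with its closing bracket added:

```perl
use strict;
use warnings;

my $path = '';   # assumed: first-level page, so no "../" prefix
my $line = '<a href="/opinions0203/opinions.html"><IMG SRC="/OpinionsButton/images/opinions.gif" WIDTH=150 HEIGHT=200 ALT="Opinions"></a>';

# Greedy version: (.*) swallows the rest of the line on the first match,
# so /g finds nothing left to match.
(my $greedy = $line) =~ s|=\s*"/(.*)|="$path$1|ig;

# Negated class: each match stops at the first space or ">",
# so /g goes on to find the second link.
(my $bounded = $line) =~ s|=\s*"/([^ >]+)|="$path$1|ig;

print "greedy:  $greedy\n";
print "bounded: $bounded\n";
```

With the greedy pattern only the href loses its leading slash; with the negated class both the href and the SRC do.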
-------------------------------------------------


martygilbert
New User

Jun 30, 2008, 9:55 PM

Post #5 of 5 (3904 views)
Re: [KevinR] Simple RegEx Help! Please!

Great!

That makes perfect sense. DUH! Can't believe I didn't catch that one.

Because these people used many spaces in their link names, I think I'll use [^"] instead of your [^ >]. That way, it'll grab the link to the ending quote (") instead of stopping at a space along the way. I assume that will work like I think it will.
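
A quick way to check that assumption is to look at what each character class actually captures. This is only a sketch; the sample line with spaces in the link names is made up for illustration:

```perl
use strict;
use warnings;

my $path = '../';
# Made-up sample line with spaces inside the link names.
my $line = '<a href="/my docs/page one.html"><img src="/images/pic 1.jpg" alt="x"></a>';

# In list context, m//g returns every captured group.
my @to_space = $line =~ m|=\s*"/([^ >]+)|ig;   # stops at a space or ">"
my @to_quote = $line =~ m|=\s*"/([^"]+)|ig;    # stops at the closing quote

# The same substitution with each character class.
(my $fixed_space = $line) =~ s|=\s*"/([^ >]+)|="$path$1|ig;
(my $fixed_quote = $line) =~ s|=\s*"/([^"]+)|="$path$1|ig;

print "to space: $_\n" for @to_space;
print "to quote: $_\n" for @to_quote;
print "$fixed_quote\n";
```

On this sample [^"] captures each whole link, while [^ >] stops at the first space. The substituted line comes out the same either way, because the replacement puts $1 back unchanged; the choice of class matters mostly when the captured link itself is used for something else.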

Thanks again for your help!

--Marty

 
 

