stop after first match

 



monocle
User

Jul 20, 2000, 7:17 AM

Post #1 of 5 (2936 views)
stop after first match

I'm having a bit of trouble. I am reading an HTML file line by line, looking for any line that has an href, and printing out that href marked with its line number. I am using this code to look for the href:

code:

    if ($file_line =~ /<a href="(.*)">/i) {
        $linecode = "<a href=\"$1\">";
    }

$1 should contain the actual link destination. My problem is that if the link wraps an image, $1 also picks up the image tag, because that tag also ends with ">.

So how can I tell this to grab only what is between the <a href=" and the first ">?

Or if anyone has a better way to do this, please let me know.

Thanks


------------------
Monocle
Hear great techno music by Monocle at http://www.mp3.com/monocle. CD now on sale!



monocle
User

Jul 19, 2000, 10:02 PM

Post #2 of 5 (2936 views)
Re: stop after first match

Thanks, that seemed to do the trick. I don't need this to be too robust. It's just a little script to index our entire site and check for orphaned files, broken links, and so on. It's kind of a hack at the moment; maybe I can improve it later. I don't really have time right now to figure out how to get HTML::Parser set up. I've never added modules before.

But another question: how can I accommodate multiple <a href tags on the same line?





Kanji
User / Moderator

Jul 19, 2000, 10:45 PM

Post #3 of 5 (2936 views)
Re: stop after first match

See Randal Schwartz's Web Techniques columns on this very subject:

- http://www.stonehenge.com/merlyn/WebTechniques/col35.html
- http://www.stonehenge.com/merlyn/WebTechniques/col27.html
- http://www.stonehenge.com/merlyn/WebTechniques/col14.html
- http://www.stonehenge.com/merlyn/WebTechniques/col07.html

You should also check out the ultra-groovy HTML::LinkExtor (included with HTML::Parser, and has a great example of usage).

[This message has been edited by Kanji (edited 07-20-2000).]


TheGame+
Deleted

Jul 20, 2000, 8:29 AM

Post #4 of 5 (2936 views)
Re: stop after first match

Basically, you want to grab everything between <a href=" and the first ">.
So you could use (at least) two basic methods here to replace your (.*) :

1) ([^"]*) : matches any run of characters that are not a "
2) (.*?) : makes the quantifier "non-greedy", so it stops at the first "> instead of the last

You'll find more details by typing 'perldoc perlre'.
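
To see the difference between the original pattern and the two alternatives, here is a minimal sketch. The sample line is made up for illustration (a link wrapped around an image, as in the original question); the variable names other than $file_line are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample line: a link wrapped around an image.
my $file_line = '<a href="page.html"><img src="pic.gif" alt="pic"></a>';

# Greedy (.*) runs to the LAST "> on the line, so it swallows the image tag.
my ($greedy) = $file_line =~ /<a href="(.*)">/i;

# Negated character class: the capture cannot cross a double quote.
my ($negated) = $file_line =~ /<a href="([^"]*)">/i;

# Non-greedy quantifier: the capture stops at the FIRST ">.
my ($lazy) = $file_line =~ /<a href="(.*?)">/i;

print "greedy:  $greedy\n";
print "negated: $negated\n";
print "lazy:    $lazy\n";
```

On this sample, the greedy capture comes back as page.html"><img src="pic.gif" alt="pic, while both alternatives return just page.html. The negated class is usually the safer of the two, since it can never run past the closing quote of the attribute.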

Of course, if you intend to do some serious parsing of more complex HTML documents, you would be better off using an existing module like HTML::LinkExtor. For instance, this code won't match links that are spread over different lines...
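
As for the earlier follow-up about several links on one line, one possible approach is the /g modifier: in list context a global match returns every capture on the line. This is only a sketch under some assumptions: hrefs_in_line is a hypothetical helper name, the sample lines are invented, and the pattern is loosened slightly (it no longer requires > right after the closing quote, so extra attributes after href are tolerated). It still assumes double-quoted href values and a whole tag on one line.

```perl
use strict;
use warnings;

# Return every href value found on one line.
# In list context, a /g match returns all captures, one per link.
sub hrefs_in_line {
    my ($line) = @_;
    return $line =~ /<a\s+href="([^"]*)"/gi;
}

# Hypothetical sample input standing in for the lines of an HTML file.
my @lines = (
    '<p><a href="one.html">One</a> and <a href="two.html">Two</a></p>',
    '<p>No links here.</p>',
    '<a href="three.html"><img src="pic.gif"></a>',
);

my $line_number = 0;
for my $line (@lines) {
    $line_number++;
    # Print each link prefixed with the number of the line it came from.
    print "$line_number: $_\n" for hrefs_in_line($line);
}
```

With the sample lines above this prints one.html and two.html for line 1, nothing for line 2, and three.html for line 3. In the real script the loop would read from the file handle instead of @lines.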


monocle
User

Jul 20, 2000, 12:15 PM

Post #5 of 5 (2936 views)
Re: stop after first match

My problem with modules is this: I don't know how to install them. I have downloaded some that I would like to use, but I can't figure out what to do.

Part of the problem is that up until last week, all I've ever done is write scripts for use on my host's Apache/Unix setup. Now I am trying to develop some stuff on my own NT machine with Sambar Server, and I have to admit I am very confused. Sambar installs perl5.something, but I can't make heads or tails of the documentation on how to add modules. I am very behind on my Perl. Until now I have known how to do the things I needed to do, and for modules I have always just sent an email and the sysadmin added them. :)

This is surely not the correct forum to get help on that. Where would I ask such questions?

