Jul 20, 2000, 8:29 AM
Post #4 of 5
Basically, you want to grab everything between <a href=" and the first ">.
Your (.*) is greedy, so it keeps going until the last "> on the line instead of the first. You could use (at least) two basic methods here to replace it:
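1) ([^"]*) : matches any run of characters that are not a ", so it stops at the closing quote
2) (.*?) : the ? makes the quantifier "non-greedy", so it matches as little as possible before the first ">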
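To make the difference concrete, here is a small sketch comparing the original greedy pattern with the two replacements. The sample line in $html is made up for illustration, and the pattern assumes the tag is written exactly as <a href="...">.

```perl
use strict;
use warnings;

# Hypothetical sample line with two links, to show where each pattern stops.
my $html = '<a href="one.html">One</a> <a href="two.html">Two</a>';

# Greedy: (.*) runs on to the LAST "> on the line, swallowing both links.
my ($greedy) = $html =~ /<a href="(.*)">/;

# Negated character class: stops at the first double quote.
my ($negated) = $html =~ /<a href="([^"]*)">/;

# Non-greedy: (.*?) expands only as far as needed to reach the first ">.
my ($lazy) = $html =~ /<a href="(.*?)">/;

# With /g in list context, the same pattern collects every link on the line.
my @all = $html =~ /<a href="([^"]*)">/g;

print "greedy:  $greedy\n";
print "negated: $negated\n";
print "lazy:    $lazy\n";
print "all:     @all\n";
```

Both replacements capture one.html here, while the greedy version captures everything from one.html up to the end of two.html.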
You'll find more details by typing 'perldoc perlre'.
Of course, if you intend to do some serious parsing of more complex HTML documents, you would be better off using an existing module like HTML::LinkExtor. For instance, this code won't match links that are spread over several lines.
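For what it's worth, a minimal HTML::LinkExtor sketch might look like the following. It assumes the module is installed from CPAN (it ships with the HTML-Parser distribution), and the sample HTML and the @hrefs array are invented for the example. The second link is deliberately split over two lines.

```perl
use strict;
use warnings;
use HTML::LinkExtor;    # CPAN module, part of the HTML-Parser distribution

# Hypothetical sample: the second <a> tag is split across two lines.
my $html = <<'END_HTML';
<a href="one.html">One</a>
<a
   href="two.html">Two</a>
END_HTML

my @hrefs;

# The callback receives the lowercase tag name, then attribute/URL pairs.
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    push @hrefs, $attr{href} if $tag eq 'a' && defined $attr{href};
});

$parser->parse($html);
$parser->eof;

print "$_\n" for @hrefs;
```

Because the module parses the HTML instead of pattern-matching one line at a time, it should not matter where the line breaks fall inside the tag.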