
dante
Jul 4, 2009, 4:10 PM
Having trouble matching from HTML
I'm new to Perl. I was able to use regular expressions to find some of the information I wanted, but for other parts the match just doesn't seem to work, and for the life of me I cannot find the problem.
Here is an example of the HTML:

<h2 class="title"><a target="_self" class="usg-AFQjCNGt6xjO2z3eqMAvpRbEgFn6NFqeKA sig2-of_mxHbBLzr0HLDwOeNcuA" href="http://www.google.com">title of the article</a></h2>

The article titles are marked by the h2 tags, and I also want the URL of each article (I replaced the real URL with google for this example). The code I'm trying to use right now is:
$content =~ m/<h2 class="title"><a target="_self" class=".*" href="http:\/\/www\.(.*)">(.*)<\/a>/;

I want this to capture both the URL and the title of the article. I have a feeling it's probably just a silly mistake I can't spot, because I have an equally complex match elsewhere that works just fine.
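One likely culprit is the greedy `.*`. This assumes the real page puts more than one headline on the same line or in the same string, which the post doesn't say. Greedy `.*` runs as far right as it can and then backs off only as much as needed, so the pattern can start at the first headline and end at the last one. Without `/g` you also only ever get one match. The sketch below is a minimal illustration: the sample HTML (class values, URLs, titles) is made up, and the rewritten pattern assumes `href` is the only attribute you need and that the URL starts with `http://www.`. It swaps `.*` for `[^"]*`, `[^>]*` and `.*?` so that no part can run past the end of an attribute, tag or link, and it loops with `/g` to collect every headline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: two headlines on a single line. The class values,
# URLs and titles are made up for this illustration.
my $content = '<h2 class="title"><a target="_self" class="usg-A sig2-B" '
            . 'href="http://www.google.com">First article</a></h2>'
            . '<h2 class="title"><a target="_self" class="usg-C sig2-D" '
            . 'href="http://www.example.com/news">Second article</a></h2>';

# Original pattern, matched in list context so the captures are returned
# directly. The greedy class=".*" stretches to the LAST href on the line,
# so the first headline is swallowed and only the second is reported.
my ($greedy_url, $greedy_title) =
    $content =~ m/<h2 class="title"><a target="_self" class=".*" href="http:\/\/www\.(.*)">(.*)<\/a>/;
print "greedy: $greedy_url | $greedy_title\n";

# Non-greedy version: [^"]* and [^>]* cannot run past the end of an
# attribute or tag, .*? stops at the first </a>, /g finds every headline,
# /s lets . match newlines and /x allows the comments below.
my @articles;
while ($content =~ m{
        <h2\s+class="title">\s*      # headline wrapper
        <a\s[^>]*?                   # start of the link, skip other attributes
        href="http://www\.([^"]*)"   # $1: URL after "http://www."
        [^>]*>                       # rest of the opening a tag
        (.*?)                        # $2: title, shortest possible
        </a>
    }gsx)
{
    push @articles, [ $1, $2 ];
}
printf "%s | %s\n", @$_ for @articles;
```

Run on this sample, the original pattern reports only `example.com/news | Second article`, while the loop reports both headlines. Note that `/s` only helps if the whole page is in `$content`. If the file is read line by line, a headline split across lines will never match. For real-world pages, an HTML parser from CPAN such as HTML::TokeParser or HTML::TreeBuilder is usually more robust than any regex.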