Perl String Match Problem

 



langqinren
New User

Jun 27, 2009, 1:16 PM

Post #1 of 2 (2202 views)

I am writing a parser to extract information from web pages, for example http://forums.sun.com/thread.jspa?messageID=10247372#10247372 (or see the attachment).

I am trying to extract the post content from this page. I read the page source into $page_src and use a regular expression match to extract the corresponding portion of the source.

Here is my code to do the match. I identified the start tag of the post content:
<tr class="white">
and the end tag:
<td><div class="pad5x10">&nbsp;</div></td>
</tr>
The group (.|\n)*? matches any characters, including newlines. My code works for other pages but fails on the page linked above (or attached).

Can anyone point out the problem in my code? I would really appreciate it!


Code
while ($page_src =~ /<tr class=\"white\">((.|\n)*?)<td><div class=\"pad5x10\">&nbsp;<\/div><\/td>\s+<\/tr>/g) {
    my $match_str = $1;
    print $match_str . "\n";
}



(This post was edited by langqinren on Jun 27, 2009, 1:17 PM)


ichi
User

Jun 28, 2009, 2:26 AM

Post #2 of 2 (2193 views)
Re: [langqinren] Perl String Match Problem

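There is no need for a regex here. Read the input line by line and keep a flag that is set at the start tag and cleared at the end marker:

Code
my $flag = 0;
while (my $line = <>) {
    # End marker: stop printing (checked first so this line is not printed).
    $flag = 0 if index($line, '<td><div class="pad5x10">') >= 0;
    # Start tag: start printing from this line on.
    $flag = 1 if index($line, '<tr class="white">') >= 0;
    print $line if $flag;
}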
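
If a regex is still preferred, here is a minimal sketch of a variant of the original pattern. It is not confirmed to fix the failure on the linked page. It makes two assumptions: that the (.|\n)*? group is the weak point (on long inputs, a quantified alternation group like this can hit Perl's "Complex regular subexpression recursion limit"), and that the whitespace between </td> and </tr> may be absent. The /s modifier lets . match newlines, so the alternation is not needed, and \s* replaces \s+. The helper name extract_posts and the sample HTML are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns every chunk
# found between the start tag and the end marker described in the question.
sub extract_posts {
    my ($page_src) = @_;
    my @posts;
    # /s lets "." match newlines, so (.|\n) is not needed;
    # /g in a while loop walks through all matches in turn.
    # \s* (instead of \s+) also accepts </td></tr> with nothing in between.
    while ($page_src =~ m{<tr class="white">(.*?)<td><div class="pad5x10">&nbsp;</div></td>\s*</tr>}sg) {
        push @posts, $1;
    }
    return @posts;
}

# Tiny made-up sample, only for illustration.
my $sample = <<'HTML';
<table>
<tr class="white">
<td>first post
spans two lines</td>
<td><div class="pad5x10">&nbsp;</div></td>
</tr>
</table>
HTML

print "$_\n" for extract_posts($sample);
```

Using m{...} as the delimiter avoids having to escape the forward slashes in the closing tags. For anything beyond a quick extraction, an HTML parser module is usually more robust than either approach.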

