CGI/Perl Guide

Perl Programming Help: Regular Expressions
Perl String Match Problem



Jun 27, 2009, 1:16 PM


I am writing a parser to extract information from web pages. For an example, see the linked page (or the attachment).

I am trying to extract the post content of that page. I read the page source into $page_src and then use a regex match to extract the corresponding portion of the source.
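A minimal sketch of the reading step, assuming the page has already been saved to a local file. The sub name `read_page` and the file name `page.html` are hypothetical; the thread does not say how the page was fetched. Setting `$/` to `undef` with `local` makes `<$fh>` return the whole file as one string, so the regex sees every line at once.

```perl
use strict;
use warnings;

# Hypothetical helper: read an entire file into a single string ("slurp").
sub read_page {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;               # undef record separator: read to end of file
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Usage (assumes page.html exists next to the script):
# my $page_src = read_page('page.html');
```

Because `$/` is localized inside the sub, line-by-line reads elsewhere in the program are not affected.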

Here is my code to do the match. I identified the start tag
<tr class="white">
and the end tag
<td><div class="pad5x10">&nbsp;</div></td>
of the post content (in the regex, the end tag must be followed by whitespace and </tr>). The (.|\n)*? part matches any characters, including newlines. My code works for other pages but fails when parsing the linked (or attached) page.
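The usual way to say "any character, including newline" in Perl is `.*?` with the `/s` modifier, which lets `.` match `\n`. The small comparison below uses an invented HTML snippet. One possible cause worth checking, not something this thread confirms: perldiag documents a "Complex regular subexpression recursion limit (%d) exceeded" warning for quantified groups such as `(.|\n)*?`, and when the text between the tags is very long the match can fail even though the markers are present. With `use warnings` enabled, Perl prints that warning if the limit is hit. A plain `.*?` under `/s` is a simple quantifier and is not subject to that limit.

```perl
use strict;
use warnings;

# Invented sample with a newline inside the text we want to capture.
my $html = qq{<tr class="white">first line\nsecond line</tr>};

# Form used in the question: (.|\n) matches any character or a newline.
my ($alt) = $html =~ m{<tr class="white">((.|\n)*?)</tr>};

# Usual idiom: /s lets "." match "\n" too, so a plain .*? is enough.
my ($dot) = $html =~ m{<tr class="white">(.*?)</tr>}s;

# Without /s, "." stops at "\n", so this match fails and $no_s is undef.
my ($no_s) = $html =~ m{<tr class="white">(.*?)</tr>};

print "same result\n" if defined $alt && defined $dot && $alt eq $dot;
```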

Can anyone point out the problem in my code? I'd really appreciate it!

while ($page_src =~ /<tr class="white">((.|\n)*?)<td><div class="pad5x10">&nbsp;<\/div><\/td>\s+<\/tr>/g) {
    my $match_str = $1;
    print $match_str . "\n";
}
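A sketch of the same loop rewritten as a reusable sub, assuming the same start and end markers. It makes two changes, both guesses at what might differ on the failing page rather than a confirmed diagnosis: `.*?` with `/s` replaces `(.|\n)*?`, and `\s*` replaces `\s+`, so the match still succeeds when `</td>` and `</tr>` sit next to each other with no whitespace. The sub name `extract_posts` and the sample HTML are hypothetical. Using `m{...}` as the delimiter avoids escaping every `/`.

```perl
use strict;
use warnings;

# Hypothetical helper: return every block between the start row marker and
# the empty "pad5x10" spacer cell (plus </tr>) that closes it.
sub extract_posts {
    my ($page_src) = @_;
    my @posts;
    while ($page_src =~ m{<tr class="white">(.*?)<td><div class="pad5x10">&nbsp;</div></td>\s*</tr>}gs) {
        push @posts, $1;
    }
    return @posts;
}

# Invented sample: the first row has no whitespace before </tr>, the second does.
my $sample = qq{<tr class="white"><td>Post one</td>\n<td><div class="pad5x10">&nbsp;</div></td></tr>\n}
           . qq{<tr class="white"><td>Post two</td>\n<td><div class="pad5x10">&nbsp;</div></td>\n</tr>\n};

print "$_\n" for extract_posts($sample);
```

With the original `\s+`, the first sample row would not match, which is one concrete way a page can break this pattern while others work.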

(This post was edited by langqinren on Jun 27, 2009, 1:17 PM)


Jun 28, 2009, 2:26 AM

Re: [langqinren] Perl String Match Problem

There is no need to use a regex. Read the input line by line and toggle a flag:


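use strict;
use warnings;
my $flag = 0;
while (my $line = <>) {
    $flag = 0 if index($line, '<td><div class="pad5x10">') >= 0;  # end marker: stop printing
    $flag = 1 if index($line, '<tr class="white">') >= 0;         # start marker: start printing
    print $line if $flag;
}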
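The same flag idea can be wrapped in a sub that works on a string already in memory, such as `$page_src` from the question, which also makes it easy to test. A minimal sketch: the sub name `lines_between` and the sample page are hypothetical, and like the approach above it assumes each marker sits on its own line, keeps the start line, and drops the end line.

```perl
use strict;
use warnings;

# Hypothetical helper: collect lines from a start marker up to (but not
# including) the next end marker, using index() substring tests for the markers.
sub lines_between {
    my ($text, $start, $end) = @_;
    my $flag = 0;
    my @out;
    for my $line (split /\n/, $text) {
        $flag = 0 if index($line, $end) >= 0;     # end marker: stop collecting
        $flag = 1 if index($line, $start) >= 0;   # start marker: start collecting
        push @out, $line if $flag;
    }
    return @out;
}

my $page = join "\n",
    '<table>',
    '<tr class="white">',
    '<td>Hello</td>',
    '<td><div class="pad5x10">&nbsp;</div></td>',
    '</tr>';

print "$_\n" for lines_between($page, '<tr class="white">', '<td><div class="pad5x10">');
```

This avoids backtracking entirely, at the cost of depending on the page's line layout.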

