Matching problems

 



Gregorio
User

Jul 10, 2001, 3:42 PM

Post #1 of 3 (467 views)
Matching problems

Hello all,
I am trying to match items in the following line of text (it appears many times in a file).

Code
<a href="javascript:showCarDetails('MVAR','MCO','N')"><font face="Arial,Helvetica,Sans-Serif" color="#0066CC" size="2">Economy, 2-4 DR, Automatic, Air Conditioning</font></a>    </td>    <td valign="top"><font face="Arial,Helvetica,Sans-Serif" color="#000000" size="2">1 week @ 120.76/wk w/Unlimited Miles</font></td>

I am trying to extract the "Economy, 2-4 DR, Automatic, Air Conditioning" and the "1 week @ 120.76/wk w/Unlimited Miles". So this is what I tried:

Code
if ($content =~ m#<a href="javascript:showCarDetails('LCAR','MCO','N')"><font face="Arial,Helvetica,Sans-Serif" color=".0066CC" size="2">([^<]+)</font></a>([^<]+)</td>([^<]+)<td valign="top"><font face="Arial,Helvetica,Sans-Serif" color=".000000" size="2">([^<]+)</font></td>#) { 
($a, $b, $c, $d) = ($1, $2, $3, $4);
}

However, nothing is returned. I think the problem is all the spaces between "</a> </td>" and "</td> <td valign="top">". How would I match a line like that, even with about 10 spaces between those tags?



abstracts
Novice

Jul 10, 2001, 4:46 PM

Post #2 of 3 (464 views)
Re: Matching problems

Hello

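The problem with your regexp is that it contains many unescaped metacharacters (parentheses, dots, ...) that change how the regexp matches. For example, the parentheses in showCarDetails(...) are treated as a capture group rather than as literal characters.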
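To illustrate the point about metacharacters, here is a minimal sketch. The shortened sample string and the variable names ($text, $literal, $unescaped, $escaped) are my own, chosen only for illustration; \Q...\E is the in-pattern equivalent of quotemeta.

```perl
use strict;
use warnings;

# Shortened sample containing literal parentheses (taken from the href in the first post).
my $text = q{showCarDetails('MVAR','MCO','N')};

# Unescaped: "(" and ")" open and close a capture group, so this pattern
# expects "showCarDetails" to be followed directly by 'MVAR', with no "(".
my $unescaped = $text =~ /showCarDetails('MVAR','MCO','N')/ ? 1 : 0;

# Escaped: \Q...\E quotes every non-word character, so the parentheses
# are matched as ordinary characters.
my $literal = q{showCarDetails('MVAR','MCO','N')};
my $escaped = $text =~ /\Q$literal\E/ ? 1 : 0;

print "unescaped pattern matched: $unescaped\n";
print "escaped pattern matched:   $escaped\n";
```

The first match fails and the second succeeds, which is why escaping the fixed parts of the pattern matters here.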

Also, notice your regexp looks like: something([^<]*)something([^<]*)something([^<]*)something

So, first let's get the somethings:

Code
my @sep = ( 
qq{<a href="javascript:showCarDetails('MVAR','MCO','N')"><font face="Arial,Helvetica,Sans-Serif" color="#0066CC" size="2">},
'</font></a>',
'</td>',
'<td valign="top"><font face="Arial,Helvetica,Sans-Serif" color="#000000" size="2">',
'</font></td>');

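for(@sep){
$_ = quotemeta; # escape every non-word character (\W) so each separator matches literally
}

my $regexp = join('([^<]*)', @sep); # build the regexp, capturing whatever lies between the separators

($a, $b, $c, $d) = $str =~ /$regexp/; # $str holds the text to search ($content in your code)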

Or you can lump the whole thing into one statement (not recommended for most people, since it is harder to read):


Code
my $regexp = join '([^<]*)', map{ quotemeta }( 
qq{<a href="javascript:showCarDetails('MVAR','MCO','N')"><font face="Arial,Helvetica,Sans-Serif" color="#0066CC" size="2">},
'</font></a>',
'</td>',
'<td valign="top"><font face="Arial,Helvetica,Sans-Serif" color="#000000" size="2">',
'</font></td>');
my @ar = $str =~ /$regexp/;

print "@ar\n";
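As a quick check, the same idea can be run against the sample line from the first post. This is only a sketch under a few assumptions of mine: the names $car and $rate are made up, and I use \s* between the closing and opening tags instead of capturing the whitespace, so that any number of spaces (including none) is accepted and only the two text fields are captured.

```perl
use strict;
use warnings;

# Sample line from the first post, with the runs of spaces between the tags kept.
my $str = q{<a href="javascript:showCarDetails('MVAR','MCO','N')"><font face="Arial,Helvetica,Sans-Serif" color="#0066CC" size="2">Economy, 2-4 DR, Automatic, Air Conditioning</font></a>    </td>    <td valign="top"><font face="Arial,Helvetica,Sans-Serif" color="#000000" size="2">1 week @ 120.76/wk w/Unlimited Miles</font></td>};

# The fixed parts of the line, escaped so they match literally.
my @sep = map { quotemeta($_) } (
    q{<a href="javascript:showCarDetails('MVAR','MCO','N')"><font face="Arial,Helvetica,Sans-Serif" color="#0066CC" size="2">},
    '</font></a>',
    '</td>',
    '<td valign="top"><font face="Arial,Helvetica,Sans-Serif" color="#000000" size="2">',
    '</font></td>',
);

# Capture the two text fields; \s* absorbs the whitespace between tags.
my $regexp = join '',
    $sep[0], '([^<]+)', $sep[1], '\s*', $sep[2], '\s*',
    $sep[3], '([^<]+)', $sep[4];

my ($car, $rate) = $str =~ /$regexp/;

print "car:  $car\n";
print "rate: $rate\n";
```

Note that the pattern in the first post looks for 'LCAR' while the sample line contains 'MVAR', so that particular line would not match even with the escaping fixed.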

Hope this helps,

Aziz



Mortimer
journeyman

Jul 11, 2001, 3:47 PM

Post #3 of 3 (447 views)
Re: Matching problems

Have a look at the docs for HTML::Parser and HTML::TokeParser, and then have a play around. Here's something I've just pulled out of a script I wrote for a client a while ago. It's *very* basic, but works in this case...


Code
use HTML::TokeParser; 

my $html_file = '/path/to/file.html';
my $h = HTML::TokeParser->new( $html_file ) or die "Can't open $html_file: $!"; # new() returns undef if the file can't be opened

my @fields;
while( my $tok = $h->get_token ){
push @fields, $tok->[1] if $tok->[0] eq 'T' && $tok->[1] =~ /\w/;
}

print "$_\n" for @fields;
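If you only want the two fields from the original question rather than every text token, something along these lines might work. It is a sketch, not a drop-in: it assumes HTML::TokeParser (from the HTML-Parser distribution on CPAN) is installed, parses a shortened copy of the sample line from a string instead of a file, and the names $html, @rows, $car and $rate are made up for the example.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Shortened version of the sample line (font attributes trimmed for brevity).
my $html = q{<td><a href="javascript:showCarDetails('MVAR','MCO','N')"><font size="2">Economy, 2-4 DR, Automatic, Air Conditioning</font></a>    </td>    <td valign="top"><font size="2">1 week @ 120.76/wk w/Unlimited Miles</font></td>};

# Passing a reference to a scalar makes the parser read the string itself.
my $p = HTML::TokeParser->new( \$html ) or die "Can't parse HTML string";

my @rows;
while ( my $tag = $p->get_tag('a') ) {
    # Only look at links that call showCarDetails(...).
    my $href = $tag->[1]{href} // '';
    next unless $href =~ /showCarDetails/;

    my $car = $p->get_trimmed_text('/a');    # text up to the closing </a>
    $p->get_tag('td') or last;               # skip ahead to the next cell
    my $rate = $p->get_trimmed_text('/td');  # text up to the closing </td>

    push @rows, [ $car, $rate ];
}

print "$_->[0] | $_->[1]\n" for @rows;
```

Because the parser works on tokens, the amount of whitespace between the tags does not matter here.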

Cheers,
Dave.
www.dmscripts.com
davemortimer@bigpond.com



 
 

