Parsing values and names in HTML Dropdown

 



twistedphrame
Novice

Mar 2, 2010, 11:02 AM

Post #1 of 7
Parsing values and names in HTML Dropdown

Hello everyone,
I'm having trouble getting the values and names out of an HTML dropdown.
The HTML is as follows:

Code
 <tr><td></td></tr> 
<tr>
<td>
<table>
<tr>
<td class="label">Term</td>
<td class="right"><select name="ddlTerm" id="ddlTerm">
<option value=""></option>
<option value="10/SP">10/SP - Spring 2010</option>
<option value="10/SU">10/SU - All Summer 2010</option>
<option value="10/SA">10/SA - Summer A 2010</option>
<option value="10/SB">10/SB - Summer B 2010</option>
<option value="10/SC">10/SC - Summer C 2010</option>
<option value="10/SD">10/SD - Summer D 2010</option>
<option value="10/SE">10/SE - Summer E 2010</option>
<option value="10/FA">10/FA - Fall 2010</option>
<option value="11/SP">11/SP - Spring 2011</option>
</select></td>
</tr>
</table>
<hr />
</td>
</tr>


My regular expression is below; $content holds the HTML for the whole page:

Code
$content =~ /id="ddlTerm">\s*<option value=""><\/option>\s*<option value="(.*)">(.*)<\/option>/

This gives me $1 = 10/SP and $2 = 10/SP - Spring 2010, but I also need the rest of the choices. I'd rather not hard-code each line into the regexp, since I also need data from another dropdown with some 40-50 values in it. Is there a good way to do this?
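For comparison, a minimal core-Perl sketch of the usual regex answer: in a while loop, m//g resumes each match where the previous one ended, so one pattern walks every option however many there are. It assumes each option is written exactly as in the sample (double-quoted value, plain-text label); the markup is a trimmed copy of the dropdown above. An HTML parser, as suggested later in the thread, is sturdier against markup changes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trimmed copy of the Term dropdown from the question.
my $content = <<'HTML';
<select name="ddlTerm" id="ddlTerm">
<option value=""></option>
<option value="10/SP">10/SP - Spring 2010</option>
<option value="10/SU">10/SU - All Summer 2010</option>
<option value="11/SP">11/SP - Spring 2011</option>
</select>
HTML

# Restrict the search to the ddlTerm <select> so options belonging to
# other dropdowns on the page are not picked up.
my ($select) = $content =~ m{<select[^>]*id="ddlTerm"[^>]*>(.*?)</select>}s
    or die "ddlTerm dropdown not found\n";

# With /g in a while loop, each iteration captures the next option's
# value ($1) and label ($2); no per-option pattern is needed.
my @options;
while ( $select =~ m{<option value="([^"]*)">([^<]*)</option>}g ) {
    push @options, [ $1, $2 ];
}

for my $opt (@options) {
    print "value = $opt->[0]\ttext = $opt->[1]\n";
}
```

The same loop works unchanged for a dropdown with 40-50 options; only the id in the first pattern changes.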


(This post was edited by twistedphrame on Mar 2, 2010, 11:03 AM)


twistedphrame
Novice

Mar 2, 2010, 11:05 AM

Post #2 of 7
Re: [twistedphrame] Parsing values and names in HTML Dropdown

Would it be better to get the HTML, save it to a file, and then read it line by line, instead of holding all of the HTML in a single variable?
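Line-by-line reading does not need a temporary file: Perl can open a filehandle directly on a string (an in-memory file). A minimal sketch, assuming each <option> sits on its own line as in the sample; if the markup were ever reflowed onto fewer lines, a line-based match would miss options.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $content = <<'HTML';
<select name="ddlTerm" id="ddlTerm">
<option value=""></option>
<option value="10/SP">10/SP - Spring 2010</option>
<option value="10/FA">10/FA - Fall 2010</option>
</select>
HTML

# Open a read-only filehandle on the string itself, so no file on disk
# is involved.
open my $fh, '<', \$content or die "Cannot open in-memory handle: $!";

my %label_for;    # option value => visible label
while ( my $line = <$fh> ) {
    # Assumes one <option> per line, exactly as in the sample markup.
    next unless $line =~ m{<option value="([^"]*)">([^<]*)</option>};
    next if $1 eq '';    # skip the blank placeholder option
    $label_for{$1} = $2;
}
close $fh;

print "$_ => $label_for{$_}\n" for sort keys %label_for;
```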


shawnhcorey
Enthusiast


Mar 2, 2010, 1:58 PM

Post #3 of 7
Re: [twistedphrame] Parsing values and names in HTML Dropdown

You would be better off using an HTML parser, such as HTML::TreeBuilder, than trying to do it yourself.

__END__

I love Perl; it's the only language where you can bless your thingy.

Perl documentation is available at perldoc.perl.org. The list of standard modules and pragmatics is available in perlmodlib.

Get Markup Help. Please note the markup tag of "code".


twistedphrame
Novice

Mar 2, 2010, 4:37 PM

Post #4 of 7
Re: [shawnhcorey] Parsing values and names in HTML Dropdown

Thanks for the suggestion. I've figured out how to get the text of each element in the Term dropdown, but I'm not sure how to get the value of each. When I try ->content_list, I just get memory locations that I'm not sure how to use.


Code
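my $tree = HTML::TreeBuilder->new;
$tree->parse($content);
$tree->eof;    # signal end of input so the tree is finalized

# first <td> whose text contains "Term"; look_down searches depth-first,
# so an enclosing cell matches before the "label" cell, pulling in the
# option text as well
my $head = $tree->look_down( '_tag', 'td',
    sub { $_[0]->as_text =~ m{Term} } );
my $term1 = $head->as_text;
$term1 =~ s/(Term)(.*)/$2/;    # drop the leading "Term" label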


This gives me a string that looks like:

Code
10/SP - Spring 201010/SU - All Summer 201010/SA - Summer A 201010/SB - Summer B 201010/SC - Summer C 201010/SD - Summer D 201010/SE - Summer E 201010/FA - Fall 201011/SP - Spring 2011

I can easily write something to extract each name, but I want to build a hash mapping these names to their actual values, and I'm not sure how to do that.
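Staying with HTML::TreeBuilder, a minimal sketch that builds the name-to-value hash directly. content_list and look_down hand back HTML::Element objects, which is why printing one shows something like HTML::Element=HASH(0x...) rather than text; calling attr('value') and as_text on each <option> gives the two pieces. The markup is a trimmed copy of the dropdown from post #1, and the hash name %value_for is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Trimmed copy of the Term dropdown from post #1.
my $content = <<'HTML';
<select name="ddlTerm" id="ddlTerm">
<option value=""></option>
<option value="10/SP">10/SP - Spring 2010</option>
<option value="10/FA">10/FA - Fall 2010</option>
<option value="11/SP">11/SP - Spring 2011</option>
</select>
HTML

my $tree = HTML::TreeBuilder->new;
$tree->parse($content);
$tree->eof;    # no more input; let the parser finish the tree

my $select = $tree->look_down( _tag => 'select', id => 'ddlTerm' )
    or die "ddlTerm dropdown not found\n";

# Each <option> is an HTML::Element object: attr() reads an attribute,
# as_text() returns the visible text between the tags.
my %value_for;    # label => value
for my $option ( $select->look_down( _tag => 'option' ) ) {
    my $value = $option->attr('value');
    next if !defined $value || $value eq '';    # skip the blank placeholder
    $value_for{ $option->as_text } = $value;
}
$tree->delete;    # free the parse tree; the hash holds plain strings

print "$_ => $value_for{$_}\n" for sort keys %value_for;
```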


(This post was edited by twistedphrame on Mar 2, 2010, 5:35 PM)


twistedphrame
Novice

Mar 2, 2010, 5:34 PM

Post #5 of 7
Re: [twistedphrame] Parsing values and names in HTML Dropdown

I may be doing this a bit stupidly, but it's the only real way I can think of to break up the string I get above:

Code
@term1 = split /\d\d\/\w\w\s-\s/, $terms1;
@term2 = split /\w*\s*\w+\s+\d\d\d\d/, $terms1;
for(my $i = 0; $i < $#term2; $i++){
    print "$i $term2[$i]\n";
}
for(my $i = 0; $i < $#term1; $i++){
    print "$i $term1[$i]\n";
}

This gives me
0 10/SP -
1 10/SU -
2 10/SA -
3 10/SB -
4 10/SC -
5 10/SD -
6 10/SE -
7 10/FA -
8 11/SP -
0
1 Spring 2010
2 All Summer 2010
3 Summer A 2010
4 Summer B 2010
5 Summer C 2010
6 Summer D 2010
7 Summer E 2010
8 Fall 2010

But as you can see, it's missing the last "Spring 2011", and there is an empty element at index 0 of @term1. I'm using split because the number of options in the dropdown may change later, so I can't just hard-code a simple regexp. Is there a good way to do this properly, given what I want to do? Should I be doing all of this with HTML::TreeBuilder?
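Two details of the split loops explain the symptoms. $#term1 is the index of the last element, not the element count, so a condition of $i < $#term1 stops one short and never reaches "Spring 2011"; for my $i (0 .. $#term1) visits every index. And split keeps a leading empty field when the string begins with a match of the separator, which is where the empty index 0 comes from. A minimal sketch that sidesteps both by capturing each code/label pair with a single regex, assuming (as in this sample) that every label ends in a four-digit year:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The flattened text of the Term cell, as shown in post #4.
my $terms = '10/SP - Spring 201010/SU - All Summer 201010/SA - Summer A 2010'
          . '10/SB - Summer B 201010/SC - Summer C 201010/SD - Summer D 2010'
          . '10/SE - Summer E 201010/FA - Fall 201011/SP - Spring 2011';

# Each entry is a code such as "10/SP", then " - ", then a label that ends
# in a four-digit year and is followed by the next code or the end of the
# string. The lazy .*? stops at the first such year.
my %label_for;
while ( $terms =~ m{(\d\d/\w\w) - (.*?\d{4})(?=\d\d/|\z)}g ) {
    $label_for{$1} = $2;
}

print "$_ => $label_for{$_}\n" for sort keys %label_for;
```

This still depends on the year suffix to find where one label ends, so reading the <option> elements themselves remains the more reliable route.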


shawnhcorey
Enthusiast


Mar 2, 2010, 6:12 PM

Post #6 of 7
Re: [twistedphrame] Parsing values and names in HTML Dropdown

Why don't you just extract the data you want directly and not bother with strings and regular expressions?


Code
#!/usr/bin/perl 

use strict;
use warnings;

use HTML::TreeBuilder;

my $content = qq{<tr><td></td></tr>
<tr>
<td>
<table>
<tr>
<td class="label">Term</td>
<td class="right"><select name="ddlTerm" id="ddlTerm">
<option value=""></option>
<option value="10/SP">10/SP - Spring 2010</option>
<option value="10/SU">10/SU - All Summer 2010</option>
<option value="10/SA">10/SA - Summer A 2010</option>
<option value="10/SB">10/SB - Summer B 2010</option>
<option value="10/SC">10/SC - Summer C 2010</option>
<option value="10/SD">10/SD - Summer D 2010</option>
<option value="10/SE">10/SE - Summer E 2010</option>
<option value="10/FA">10/FA - Fall 2010</option>
<option value="11/SP">11/SP - Spring 2011</option>
</select></td>
</tr>
</table>
<hr />
</td>
</tr>
};

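my $tree = HTML::TreeBuilder->new();
$tree->parse($content);
$tree->eof();    # signal end of input so the tree is finalized

# find the <select> whose name attribute is "ddlTerm"
my $select = $tree->look_down( '_tag', 'select', name => 'ddlTerm' );

for my $option ( $select->content_list() ){
    # content_list can include plain text nodes; only element nodes
    # (references) have attributes and text to extract
    if( ref( $option ) ){
        my $value = $option->attr( 'value' );
        my $text = $option->as_text();
        print "value = $value\ttext = $text\n";
    }
}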
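Post #1 mentions a second dropdown with 40-50 values; the same idea extends to every <select> on a page. A minimal sketch, assuming each dropdown has a name attribute. The second dropdown (ddlCampus) and its options are invented for illustration, as is the %options structure.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical page with two dropdowns; ddlCampus is made up.
my $content = <<'HTML';
<form>
<select name="ddlTerm" id="ddlTerm">
<option value=""></option>
<option value="10/SP">10/SP - Spring 2010</option>
<option value="10/FA">10/FA - Fall 2010</option>
</select>
<select name="ddlCampus" id="ddlCampus">
<option value="M">Main</option>
<option value="N">North</option>
</select>
</form>
HTML

my $tree = HTML::TreeBuilder->new;
$tree->parse($content);
$tree->eof;

# %options maps each <select> name to an array of [value, text] pairs,
# kept in page order so each dropdown's ordering is preserved.
my %options;
for my $select ( $tree->look_down( _tag => 'select' ) ) {
    my $name = $select->attr('name');
    next unless defined $name;
    for my $option ( $select->look_down( _tag => 'option' ) ) {
        push @{ $options{$name} }, [ $option->attr('value'), $option->as_text ];
    }
}
$tree->delete;    # the collected data are plain strings

for my $name ( sort keys %options ) {
    print "$name\n";
    printf "  %-6s %s\n", @$_ for @{ $options{$name} };
}
```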




twistedphrame
Novice

Mar 2, 2010, 6:39 PM

Post #7 of 7
Re: [shawnhcorey] Parsing values and names in HTML Dropdown

That's actually exactly what I was trying to figure out.

Thanks for the help
~twistedphrame

 
 

