
Help with HTML::Element

 



islanderman
Novice

Jan 30, 2014, 2:33 PM

Post #1 of 2 (858 views)
Help with HTML::Element

I'm trying to capture embedded links by first using the look_down method and then the right method. Throughout the page the tags look like the following:

<span ..... authors name </span>
<ul>
<li><b>Periodical: <a href=.....>NY Times Article</a></b></li>
</ul>

<span ..... authors name </span>
<ul>
<li><b>Book: <a href=.....>The Firm</a></b></li>
<li><b>Book: <a href=.....>The Client</a></b></li>
</ul>

<span ..... authors name </span>
<ul>
<li><b>Book: <a href=.....>Case Closed</a></b></li>
</ul>


So I'm trying to capture the info in the 'b' tags to associate the author's name with all of his/her books, periodicals, editorials, etc. I get the author's name by doing

my @elements = $tree->look_down( _tag => "span" );

but from here I can't get the right method working. Can someone tell me what the code would be to capture those titles?

my @rights = $element->right;
print scalar @rights, qq{ siblings\n\n};

That print statement gives me too many siblings.


Zhris
Enthusiast

Feb 4, 2014, 11:30 AM

Post #2 of 2 (763 views)
Re: [islanderman] Help with HTML::Element

Hi,

Once you have the list of author nodes, you can iterate over it and fetch the node immediately to the right of each author node, which is the <ul> holding that author's publications. Calling ->right in list context, as you have done, returns all of the right-hand siblings, which is most likely why you are seeing too many. Call ->right in scalar context instead to get only the immediate right sibling.
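For anyone unfamiliar with context sensitivity, here is a minimal sketch in plain Perl of how one routine can return different things in list and scalar context. The function right_of and the sample sibling list are hypothetical stand-ins made up for illustration; this is not HTML::Element's actual implementation, only the general wantarray idiom.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a list of sibling nodes; not HTML::Element itself.
my @siblings = qw(span1 ul1 span2 ul2);

# List context: return everything to the right of $item.
# Scalar context: return only the immediate right neighbour.
sub right_of {
    my ($item) = @_;
    for my $i ( 0 .. $#siblings ) {
        next unless $siblings[$i] eq $item;
        my @rest = @siblings[ $i + 1 .. $#siblings ];
        return wantarray ? @rest : $rest[0];
    }
    return;    # item not found: empty list or undef
}

my @all  = right_of('span1');    # list context
my $next = right_of('span1');    # scalar context

print scalar(@all), " siblings in list context\n";
print "next sibling in scalar context: $next\n";
```

Assigning to an array asks for a list, so @all receives all three later siblings; assigning to a scalar yields only ul1.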

An example using HTML::TreeBuilder:


Code
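#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Data::Dumper;

# Output separators used by print: newline between items, blank line after.
$, = "\n";
$\ = "\n\n";

# Parse the sample HTML stored after __DATA__.
my $tree = HTML::TreeBuilder->new_from_file( \*DATA );

# Structure: author => { type => [ titles ] }
my %publications;

for my $author ( $tree->look_down( _tag => 'span' ) )
{
    my $author_txt = $author->as_text;

    # ->right is called in scalar context here, so it returns only the
    # immediate right sibling (the <ul>), not every sibling to the right.
    for my $publication ( $author->right->look_down( _tag => 'b' ) )
    {
        # "Book: The Firm" becomes ( "Book", "The Firm" ).
        my ( $type_txt, $title_txt ) = split /:\s/, $publication->as_text;

        push @{$publications{$author_txt}{$type_txt}}, $title_txt;
    }
}

print Dumper \%publications;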

__DATA__
<span>Peter</span>
<ul>
<li><b>Periodical: <a href=.....>NY Times Article</a></b></li>
</ul>

<span>John</span>
<ul>
<li><b>Book: <a href=.....>The Firm</a></b></li>
<li><b>Book: <a href=.....>The Client</a></b></li>
</ul>

<span>James</span>
<ul>
<li><b>Book: <a href=.....>Case Closed</a></b></li>
<li><b>Periodical: <a href=.....>My Periodical</a></b></li>
<li><b>Periodical: <a href=.....>My Periodical b</a></b></li>
<li><b>Magazine: <a href=.....>My Magazine</a></b></li>
</ul>


Output:


Code
$VAR1 = {
  'James' => {
    'Periodical' => [
      'My Periodical',
      'My Periodical b'
    ],
    'Book' => [
      'Case Closed'
    ],
    'Magazine' => [
      'My Magazine'
    ]
  },
  'John' => {
    'Book' => [
      'The Firm',
      'The Client'
    ]
  },
  'Peter' => {
    'Periodical' => [
      'NY Times Article'
    ]
  }
};
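One caveat about the split /:\s/ line in the script: without a limit, a title that itself contains a colon followed by a space is split into more than two pieces, and everything after the second piece is dropped. A minimal sketch, using a made-up title that is not from the original page, of how a limit of 2 might keep the title whole:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample text for illustration only.
my $text = 'Book: Perl: A Short Guide';

# No limit: splits at every ": ", so the title is cut short.
my ( $type_a, $title_a ) = split /:\s/, $text;

# Limit of 2: splits only at the first ": ", keeping the rest intact.
my ( $type_b, $title_b ) = split /:\s/, $text, 2;

print "without limit: $title_a\n";
print "with limit 2:  $title_b\n";
```

Whether this matters depends on the real page; if no title contains ": ", both forms give the same result.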


Hope this helps,

Chris


(This post was edited by Zhris on Feb 13, 2014, 2:12 PM)

 
 

