


Kenosis
User

Feb 23, 2013, 8:08 PM


Re: [IsabelleFr] Paragraph extraction

You can't reliably parse [X]HTML with a regex. Or, at least, you shouldn't try, especially when you can use a module such as Mojo::DOM that's well designed for the task:


Code
use strict; 
use warnings;
use Mojo::DOM;

my $html = <<END;
<p>foo</p> <p>
bar</p>
<p>
foo bar
</p><p> bar
foo
bar

</p>
END

my $dom = Mojo::DOM->new($html);

for my $paragraph ( $dom->find('p')->each ) {
    print $paragraph->text, "\n";
}


Output:

Quote
foo
bar
foo bar
bar foo bar


If you don't want smart whitespace trimming (notice that the text of the paragraphs has been reformatted), you can do the following:


Code
print $paragraph->text(0);
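
If you'd rather see exactly what the trimming amounts to (or the behaviour of text differs in your Mojolicious version, which I believe it may in later releases), here's a rough sketch of the same normalization in plain Perl. Note that squish is a made-up helper name, not part of Mojo::DOM, and the strings in @raw are my assumption of what the untrimmed text of each paragraph in the sample above looks like:

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Mojo::DOM): collapse every run of
# whitespace into a single space, then strip a leading/trailing space.
sub squish {
    my ($text) = @_;
    $text =~ s/\s+/ /g;     # newlines, tabs and repeated spaces become one space
    $text =~ s/^ | $//g;    # drop the single space left at either end, if any
    return $text;
}

# Assumed raw (untrimmed) text of the four <p> elements in the sample HTML.
my @raw = ( "foo", "\nbar", "\nfoo bar\n", " bar\nfoo\nbar\n\n" );

print squish($_), "\n" for @raw;
```

Run on those four strings, this prints the same four lines as the Output quote above. It only touches whitespace, so it's safe to apply to text you've already pulled out of the DOM; it is not a substitute for the parser itself.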



(This post was edited by Kenosis on Feb 23, 2013, 9:49 PM)



