XML::Twig doctype and entity handling

 



Zed
New User

Sep 6, 2008, 11:08 AM

Post #1 of 1 (1006 views)

I'm writing a program that needs to extract a clump of XML metadata stored inside a noncompliant HTML file and then perform a number of operations on that metadata. (Specifically, for those curious, this is part of a Mobipocket .prc to IDPF .epub ebook converter.)

The HTML file in question has no doctype declaration, and XHTML entities may be found in the metadata portion. In particular, &copy; is the first entity that XML::Parser will choke on in my current test data.

Could someone please provide me with an example of how to get XML::Twig to recognize XHTML entities? (Or even just &copy; to get me started?) I came up with a workaround that slurps the input file, uses a regular expression to split the metadata out into a temporary file, and then runs tidy on it, but it's something of an evil hack, given that I have to read the results back into XML::Twig anyway.
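For reference, the slurp-and-regex part of that workaround could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the file contains a single, non-nested <metadata> element, and the helper names extract_metadata and numeric_entities, the small entity table, and the sample input are all hypothetical, made up for illustration. Instead of running tidy, it rewrites the named entities as numeric character references, which an XML parser accepts without any DTD.

```perl
use strict;
use warnings;

# Hypothetical table covering only a few XHTML entities, for illustration.
my %entity_codepoint = (
    copy => 169,
    nbsp => 160,
    reg  => 174,
);

# Return the first <metadata>...</metadata> block in $html, or undef.
# Assumes the element is not nested inside another <metadata>.
sub extract_metadata {
    my ($html) = @_;
    return $html =~ m{(<metadata\b.*?</metadata>)}s ? $1 : undef;
}

# Rewrite known named entities as numeric character references.
# Names not in the table (including XML's predefined ones such as
# &amp;) are left untouched.
sub numeric_entities {
    my ($xml) = @_;
    $xml =~ s{&(\w+);}{
        exists $entity_codepoint{$1} ? "&#$entity_codepoint{$1};" : "&$1;"
    }ge;
    return $xml;
}

# Made-up sample input standing in for the slurped HTML file.
my $html = '<html><head><metadata><dc:Rights>&copy; 2008 &amp; later'
         . '</dc:Rights></metadata></head><body><p>text</body></html>';

my $metadata = numeric_entities( extract_metadata($html) );
print "$metadata\n";
```

The resulting string could then be handed to XML::Twig's parse method in place of calling parsefile on the original file, which would avoid the temporary file, though it still treats the HTML as text rather than parsing it.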

My last attempt at getting XML::Twig to read this looks like this:


Code
$mobihtmltwig = XML::Twig->new(
    load_DTD                 => 1,
    twig_roots               => { 'metadata' => 1 },
    twig_handlers            => { 'metadata' => \&twig_cut_metadata },
    output_encoding          => 'utf8',
    pretty_print             => 'indented',
    twig_print_outside_roots => 'HTML'
);

# Set the OEB 1.2 package doctype on the twig
$mobihtmltwig->set_doctype(
    'package',
    "http://openebook.org/dtds/oeb-1.2/oebpkg12.dtd",
    "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Package//EN");

# Add a 'copy' entity to the twig's entity list
$mobihtmltwig->entity_list->add_new_ent(copy => "©");

# Print the entity names to confirm 'copy' is listed
print $mobihtmltwig->entity_names,"\n";

# Dies here with "undefined entity"
$mobihtmltwig->parsefile($mobihtmlfile);


It dies at the parsefile call with:


Code
undefined entity at line 1, column 413, byte 413 at /usr/lib/perl5/XML/Parser.pm line 187


Byte 413 is the first &copy;. This is despite 'copy' being present in the entity list.

Thanks for any help.

 
 

