HTML charsets

 



m_ayesha
New User

Aug 18, 2002, 8:24 PM

Post #1 of 7 (1072 views)
HTML charsets

I'm retrieving an HTML page using wget. The page contains text in other languages, encoded in windows-x charsets.

Is there any way I can convert these characters to Unicode in a Perl script after downloading the HTML page?

Thanks


thebitch
User

Aug 26, 2002, 11:41 PM

Post #2 of 7 (1063 views)
Re: [m_ayesha] HTML charsets

I doubt it (it could be done, but I wouldn't bother).
You're better off looking on freshmeat.net
to find something to do it for you.

A good editor, like jEdit (jEdit.org), will let you do
the conversion seamlessly.


RussianSpy
Novice

Sep 18, 2002, 12:50 PM

Post #3 of 7 (1021 views)
Re: [m_ayesha] HTML charsets

I don't know either, and would very much like to know myself.

But if you are going to write something like this, maybe the JScript here will help (you know, to see how it's done):

http://www.macchiato.com/unicode/convert.html

I'm so new to Perl that I can't do it myself.

UniPad (www.unipad.org) is another popular and pretty good Unicode-compatible text editor.
_________________________________________________
[ noobie alert ] WinXP :: Xitami Webserver :: Active Perl 5.8.6


Jasmine
Administrator

Sep 18, 2002, 5:30 PM

Post #4 of 7 (1019 views)
Re: [m_ayesha] HTML charsets

You might want to check out Unicode::String or Unicode::Lite:

Unicode::String: http://search.cpan.org/author/GAAS/Unicode-String-2.06/
  (docs: http://search.cpan.org/author/GAAS/Unicode-String-2.06/String.pm)
Unicode::Lite: http://search.cpan.org/author/AMICHAUER/Unicode-Lite-0.12/
  (docs: http://search.cpan.org/author/AMICHAUER/Unicode-Lite-0.12/5.005/Lite.pm)
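
For what it's worth, here is a minimal sketch of the conversion itself. It does not use either module above; it assumes the core Encode module that ships with Perl 5.8. It also assumes the page is windows-1251 (the thread only says "windows-x", so substitute the real charset). The helper name to_utf8 is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert raw bytes in a known legacy charset
# into UTF-8 encoded bytes.
sub to_utf8 {
    my ($bytes, $charset) = @_;
    my $text = decode($charset, $bytes);   # legacy bytes -> Perl character string
    return encode('UTF-8', $text);         # character string -> UTF-8 bytes
}

# Assumption: the input is windows-1251 (Encode calls it "cp1251").
# These two bytes are the Cyrillic letters U+0414 and U+0430.
my $cp1251_bytes = "\xC4\xE0";
my $utf8_bytes   = to_utf8($cp1251_bytes, 'cp1251');

# Show the resulting bytes in hex so the output is safe on any terminal.
print join(' ', map { sprintf '%02X', ord } split //, $utf8_bytes), "\n";
```

To convert a file saved by wget, one option is to read it in binary mode, pass its contents through to_utf8, and write the result out in binary mode as well.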


thebitch
User

Sep 18, 2002, 9:15 PM

Post #5 of 7 (1016 views)
Re: [Jasmine] HTML charsets

Nice looking out ;)
You wouldn't happen to know of a module
to detect charsets? :P


Jasmine
Administrator

Sep 19, 2002, 9:43 AM

Post #6 of 7 (1010 views)
Re: [thebitch] HTML charsets

Maybe Unicode::UCD, which comes standard in Perl 5.8.0, Unicode::CharName, or Lingua::Ident:

Unicode::UCD: http://search.cpan.org/author/JHI/perl-5.8.0/
  (docs: http://search.cpan.org/author/JHI/perl-5.8.0/lib/Unicode/UCD.pm)
Unicode::CharName: http://search.cpan.org/author/GAAS/Unicode-String-2.06/
  (docs: http://search.cpan.org/author/GAAS/Unicode-String-2.06/lib/Unicode/CharName.pm)
Lingua::Ident: http://search.cpan.org/author/MPIOTR/Lingua-Ident-1.4/
  (docs: http://search.cpan.org/author/MPIOTR/Lingua-Ident-1.4/Ident.pm)


RussianSpy
Novice

Sep 19, 2002, 10:10 AM

Post #7 of 7 (1007 views)
Re: [thebitch] HTML charsets

Jasmine, wow! That's really cool stuff. Thanks!

I thought that if there were an easy (or sure) way to detect the charset when it's not declared in a META tag, browser makers would use it. In that case, I guess, "charset" in the META tag would become obsolete, or would never have been there in the first place. But right now both IE and NS just display the page with the last encoding you selected manually if no charset is specified on the page. The default charset (according to HTTP 1.1) is ISO-8859-1, but browsers do their own thing, which in fact is very useful most of the time for a user...

If a page was meant to be in ISO-8859-1 (or just English), any needed international characters would be entered in the HTML file as escape sequences.
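
As a rough illustration of the fallback described above, this sketch looks for a charset declared in a META tag and falls back to ISO-8859-1 when none is found. The helper name declared_charset and the regular expression are assumptions for illustration. A regex is not a real HTML parser, and this does no guessing from the bytes themselves.

```perl
use strict;
use warnings;

# Hypothetical helper: return the charset named in a META tag,
# or the HTTP/1.1 default (ISO-8859-1) if the page declares none.
sub declared_charset {
    my ($html) = @_;
    # Look for charset=... inside a <meta ...> tag, with or without quotes.
    if ($html =~ /<meta\b[^>]*\bcharset\s*=\s*["']?([\w.:-]+)/i) {
        return lc $1;
    }
    return 'iso-8859-1';
}

my $with_meta = '<html><head><meta http-equiv="Content-Type" '
              . 'content="text/html; charset=windows-1251"></head></html>';
my $without   = '<html><head><title>No charset here</title></head></html>';

print declared_charset($with_meta), "\n";
print declared_charset($without), "\n";
```

Anything smarter than this (detecting the charset from the text itself) is guesswork, which is the point made above about browsers.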
