Why Encode doesn't work

 



gj
New User

Nov 13, 2012, 2:33 AM

Post #1 of 5
Why Encode doesn't work

Hello, maybe someone has an idea what is wrong here: why doesn't Encode work?



Code
use Encode;
my $dddd = decode('cp1251', '&#1092;&#1099;&#1074;&#1072;');

When written to a file, $dddd contains the same string:
&#1092;&#1099;&#1074;&#1072;

also

Code
use Encode;
my $str = '&#1092;&#1099;&#1074;&#1072;';
Encode::from_to($str, 'cp1251', 'utf8');

doesn't transcode the string.

A popular web-based transcoder successfully recognizes this string and converts it to Cyrillic.
Any recommendations on how to convert this cp-1251 string to UTF-8?
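For contrast, here is a minimal sketch of what `Encode::from_to` is designed for: converting bytes that really are cp1251 into UTF-8 bytes. The byte string is an assumption chosen for illustration (it is the cp1251 encoding of фыва). Note that `from_to` rewrites its first argument in place, so it needs a variable rather than a string literal.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Assumed input: the word фыва stored as cp1251 bytes (one byte per letter).
my $bytes = "\xF4\xFB\xE2\xE0";

# from_to converts the octets in place and returns the new length in octets.
my $len = from_to($bytes, 'cp1251', 'UTF-8');

# Each Cyrillic letter now takes two UTF-8 octets.
printf "%d octets: %s\n", $len, join ' ', map { sprintf '%02X', ord } split //, $bytes;
```

This prints `8 octets: D1 84 D1 8B D0 B2 D0 B0`.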


rovf
Veteran

Nov 14, 2012, 2:05 AM

Post #2 of 5
Re: [gj] Why Encode doesn't work

The documentation of decode says:


Quote
$string = decode(ENCODING, OCTETS[, CHECK])

This function returns the string that results from decoding the scalar value OCTETS, assumed to be a sequence of octets in ENCODING, into Perl's internal form. It returns the resulting string.


First, I don't see from your example that the string you are passing to decode is a sequence of octets which you got from some earlier call to encode.
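To illustrate the octets-versus-characters distinction, a minimal sketch: `encode` turns a character string into cp1251 octets, and `decode` recovers the original characters from those octets. The sample word is only an example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string: the four Cyrillic letters of фыва, given as code points.
my $chars = "\x{444}\x{44B}\x{432}\x{430}";

# encode() turns characters into octets in the named encoding...
my $octets = encode('cp1251', $chars);

# ...and decode() turns those octets back into Perl characters.
my $round_trip = decode('cp1251', $octets);

printf "octets: %s, round trip ok: %s\n",
    join(' ', map { sprintf '%02X', ord } split //, $octets),
    ($round_trip eq $chars ? 'yes' : 'no');
```

This prints `octets: F4 FB E2 E0, round trip ok: yes`.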

Secondly, the string contains only digits and a few special characters, and these have the same encoding in most character encoding schemes (for instance, they are encoded identically in cp-1251 and ASCII). What output did you expect instead?
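A quick check of this point, assuming the posted string is a run of decimal numeric character references such as `&#1092;` (which fits the description of digits plus a few special characters): decoding it as cp1251 returns it unchanged, because cp1251 agrees with ASCII for bytes 0x00 to 0x7F.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed form data: plain ASCII numeric character references.
my $input = '&#1092;&#1099;&#1074;&#1072;';

# Every byte here is ASCII, and cp1251 maps ASCII bytes to the same
# characters, so decoding changes nothing.
my $decoded = decode('cp1251', $input);

print $decoded eq $input ? "unchanged\n" : "changed\n";
```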


gj
New User

Nov 14, 2012, 8:48 AM

Post #3 of 5
Re: [rovf] Why Encode doesn't work

Thanks a lot for the reply!

Well, I get this string from a web form, which POSTs it to a Perl script when the user types in Cyrillic. When I use this page http://2cyr.com/decode/ (the last option in the autodetect drop-down), then from this string of characters

&#1092;&#1099;&#1074;&#1072;

I can successfully decode it to readable Cyrillic characters

фыва

So this is the result I expected.
If the problem is that this input string needs to be a sequence of octets, then my next question is: how do I convert this string into octets in Perl, if that is possible?


rovf
Veteran

Nov 15, 2012, 12:47 AM

Post #4 of 5
Re: [gj] Why Encode doesn't work

Well, but then this string denotes (HTML) entities, in this case so-called "numeric character references". But you are using encode/decode, which are for translating between character sets, so you are applying these functions to data they were not designed for.

1072, for instance, is the decimal value of the Unicode code point for the Cyrillic 'a'. I'm not a specialist in dealing with Unicode, but I feel that you first need to convert the entity string, which denotes Unicode code points, into UTF-8, and then use a second conversion step to translate it into your target character set.
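A minimal sketch of this two-step approach using only core Perl: a regex with `chr` turns each decimal `&#NNNN;` reference into the corresponding character, and `Encode::encode` then produces octets in the target encoding. The helper name `decode_decimal_ncr` is hypothetical, and it deliberately ignores hexadecimal references (`&#x430;`) and named entities; the CPAN module HTML::Entities provides `decode_entities`, which handles those cases as well.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Step 1 (hypothetical helper): replace each decimal numeric character
# reference &#NNNN; with the character whose Unicode code point is NNNN.
# Hex references and named entities are left untouched.
sub decode_decimal_ncr {
    my ($text) = @_;
    $text =~ s/&#(\d+);/chr($1)/ge;
    return $text;
}

my $chars = decode_decimal_ncr('&#1092;&#1099;&#1074;&#1072;');

# Step 2: encode the characters into whatever target encoding is needed.
my $utf8   = encode('UTF-8',  $chars);   # two octets per Cyrillic letter
my $cp1251 = encode('cp1251', $chars);   # one octet per Cyrillic letter

printf "%d characters, %d UTF-8 octets, %d cp1251 octets\n",
    length $chars, length $utf8, length $cp1251;
```

This prints `4 characters, 8 UTF-8 octets, 4 cp1251 octets`.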

Have a look at, say, http://en.wikipedia.org/wiki/A_%28Cyrillic%29. You can see that the code point 1072 (U+0430) corresponds to the bytes 0xD0 0xB0 in UTF-8, which in turn corresponds to 224 (0xE0) in Windows-1251.
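These numbers for the Cyrillic 'a' can be checked directly with a short sketch using `chr` and `Encode::encode`:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $a_cyr = chr(1072);    # U+0430 CYRILLIC SMALL LETTER A

my $utf8   = encode('UTF-8',  $a_cyr);
my $cp1251 = encode('cp1251', $a_cyr);

# Show the code point, the UTF-8 bytes, and the single cp1251 byte value.
printf "U+%04X -> UTF-8 %s, cp1251 %d\n",
    ord $a_cyr,
    join(' ', map { sprintf '0x%02X', ord } split //, $utf8),
    ord $cp1251;
```

This prints `U+0430 -> UTF-8 0xD0 0xB0, cp1251 224`.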

Hence, I would google for a module which creates a UTF-8 character out of a Unicode code point...

Ronald


gj
New User

Nov 15, 2012, 3:05 AM

Post #5 of 5
Re: [rovf] Why Encode doesn't work

Ok, thanks for clarification!

 
 

