
rovf
Nov 15, 2012, 12:47 AM
Post #4 of 5
Re: [gj] Why Encode doesn't work
Well, but then this string contains (HTML) entities, in this case so-called "numeric character references". You are using Encode's encode/decode, which translate between character encodings, so you are applying these functions to data they are not designed for. 1072, for instance, is the decimal value of the Unicode code point for the Cyrillic 'a'.

I'm not a specialist in dealing with Unicode, but I feel that you first need to convert the entity string, which denotes Unicode code points, into UTF-8, and then use a second conversion step to translate this into your target character set. Have a look at, say, http://en.wikipedia.org/wiki/A_%28Cyrillic%29. You can see that code point 1072 (U+0430) corresponds to the bytes 0xD0 0xB0 in UTF-8, which in turn correspond to 224 (0xE0) in Windows-1251.

Hence, I would google for a module which creates a UTF-8 character out of a Unicode code point...

Ronald
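The two-step conversion described above can be sketched roughly as follows. This is only an illustration under some assumptions: the input contains decimal references of the form `&#NNNN;` only, the target is Windows-1251 (`cp1251` in Encode), and the helper name `ncr_to_chars` and the sample string are made up for this example. Perl's built-in `chr` is used here to turn a code point into a character.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: replace each decimal numeric character reference
# (e.g. "&#1072;") with the Perl character that has that code point.
sub ncr_to_chars {
    my ($text) = @_;
    $text =~ s/&#(\d+);/chr($1)/ge;
    return $text;
}

# Sample input (assumed): references for the Cyrillic letters U+0430 and U+0431
my $entities = '&#1072;&#1073;';

# Step 1: entity string -> Perl character string (Unicode code points)
my $chars = ncr_to_chars($entities);

# Step 2: character string -> bytes in the desired encoding
my $utf8   = encode('UTF-8',  $chars);
my $cp1251 = encode('cp1251', $chars);

print 'UTF-8 bytes (hex):  ', unpack('H*', $utf8), "\n";               # d0b0d0b1
print 'cp1251 bytes (dec): ', join(',', unpack('C*', $cp1251)), "\n";  # 224,225
```

Hexadecimal references (`&#x430;`) and named entities are not handled by this sketch; the CPAN module HTML::Entities provides `decode_entities`, which may be the more robust choice for step 1.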