Nov 15, 2012, 12:47 AM
Post #4 of 5
Well, but then, this string consists of HTML entities (in this case so-called "numeric character references"). But you are using Encode/Decode, which translate between character encodings, so you are applying these functions to data they were not designed for.
Re: [gj] Why Encode doesn't work
1072, for instance, is the decimal value of the Unicode code point for the Cyrillic 'a'. I'm not a specialist in dealing with Unicode, but I think you first need to convert the entity string, which denotes Unicode code points, into UTF-8, and then use a second conversion step to translate that into your target character set.
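To make the two steps concrete, here is a minimal sketch using only the core Encode module. The helper name decode_numeric_refs and the sample input are made up for illustration, and it only handles numeric references, not named entities like &amp;. Note that in Perl the intermediate form does not have to be UTF-8 bytes; a decoded character string is enough as input to encode().

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of any module): turn decimal (&#1072;) and
# hex (&#x430;) numeric character references into Perl characters.
# Named entities such as &amp; are left untouched.
sub decode_numeric_refs {
    my ($text) = @_;
    $text =~ s/&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));/chr(defined $1 ? hex($1) : $2)/ge;
    return $text;
}

my $input = '&#1072;&#1073;&#1074;';       # sample input: Cyrillic a, be, ve
my $chars = decode_numeric_refs($input);   # step 1: entities -> characters
my $bytes = encode('cp1251', $chars);      # step 2: characters -> Windows-1251 bytes

printf "%vd\n", $bytes;                    # prints 224.225.226
```

If installing a CPAN module is an option, HTML::Entities provides decode_entities(), which also covers named entities and could replace the hand-written regex in step 1.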
Have a look at, say, http://en.wikipedia.org/wiki/A_%28Cyrillic%29. You can see that the code point 1072 (U+0430) corresponds to the bytes 0xD0 0xB0 in UTF-8, which in turn correspond to 224 (0xE0) in Windows-1251.
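These numbers can be checked with a few lines of Perl. This is just a quick sanity check with the core Encode module, assuming 'cp1251' as the encoding name for Windows-1251; the variable names are arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $char = chr(1072);                     # U+0430, CYRILLIC SMALL LETTER A

my $utf8   = encode('UTF-8',  $char);     # two bytes in UTF-8
my $cp1251 = encode('cp1251', $char);     # one byte in Windows-1251

printf "code point: %d (U+%04X)\n", ord($char), ord($char);   # 1072 (U+0430)
printf "UTF-8:      %s\n", uc(unpack('H*', $utf8));           # D0B0
printf "cp1251:     %d\n", ord($cp1251);                      # 224
```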
Hence, I would google for a module which creates a UTF-8 character out of a Unicode code point...