
rovf
Sep 6, 2011, 6:20 AM
Post #4 of 8
Re: [ulo] encoding trouble: cannot concatenate unicode characters with codepoint between #128 and #256
Indeed, you are right! Silly that I couldn't see it in the first place. However, I found a hint as to why these characters are treated differently. From perldoc -f chr: "Note that characters from 128 to 255 (inclusive) are by default internally not encoded as UTF-8 for backward compatibility reasons." Of course, this does not yet explain why the problem occurs only with concatenation. At first I thought it might be related to the fact that concatenation puts chr() into scalar context, but this is not the reason. Even if I call it in list context, take the first element of the list and concatenate that, the bug appears:

    print(([chr(0x00DC)]->[0]) . "\n");

BTW, it is not only concatenation. Interpolation doesn't work either:

    print "@{ [chr(0x00DC)] }\n";

Since works, I suspect it is not just a bug in chr, but something deeper in how Perl handles Unicode strings. If you can't find a good explanation in this forum, I suggest you describe the issue at http://perlmonks.org/, and if they can't explain it either, I would file a bug report.
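One way to narrow this down, offered as a sketch and not as a confirmed diagnosis of the original problem, is to check whether the concatenated and interpolated strings differ at all, and then look at the bytes print actually writes. The assumption here is that the string is always the same character U+00DC, and that what changes the result is the output encoding: without an encoding layer, a character below 256 is written as a single Latin-1 byte, which a UTF-8 terminal cannot display. The helpers bytes_written and hex_bytes are hypothetical, and an in-memory filehandle stands in for STDOUT so the bytes can be inspected.

```perl
use strict;
use warnings;

# Hypothetical helper: print a string through an in-memory filehandle
# opened with the given PerlIO layer and return the bytes written.
sub bytes_written {
    my ($string, $layer) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer) or die "cannot open in-memory handle: $!";
    print {$fh} $string;
    close($fh) or die "cannot close in-memory handle: $!";
    return $buffer;
}

# Hypothetical helper: format a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my $concatenated = ([chr(0x00DC)]->[0]) . "\n";
my $interpolated = "@{ [chr(0x00DC)] }\n";

# Both constructions should yield the same string: U+00DC plus newline.
printf "same string: %s, first code point: U+%04X\n",
    ($concatenated eq $interpolated ? 'yes' : 'no'), ord $concatenated;

# chr() below 256 keeps the non-UTF-8 internal storage that perldoc -f chr
# describes; from 256 upward Perl switches to UTF-8 storage.
printf "UTF8 flag for chr(0xDC): %s, for chr(0x100): %s\n",
    (utf8::is_utf8(chr(0x00DC)) ? 'on' : 'off'),
    (utf8::is_utf8(chr(0x0100)) ? 'on' : 'off');

# Without a layer the character goes out as one Latin-1 byte (DC);
# with a UTF-8 layer it goes out as the two bytes C3 9C.
my $raw     = bytes_written($concatenated, '');
my $encoded = bytes_written($concatenated, ':encoding(UTF-8)');
print 'no layer:         ', hex_bytes($raw), "\n";
print ':encoding(UTF-8): ', hex_bytes($encoded), "\n";
```

If the bytes confirm this, declaring the output encoding once, for example with binmode(STDOUT, ':encoding(UTF-8)') near the top of the script, should make the concatenated and interpolated forms print the same character on a UTF-8 terminal.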