Mar 28, 2010, 4:29 PM
I can assure you it's not. I copied and pasted the UTF-8 character C3 84 into my string. That character's Unicode code point is U+00C4, and its official name is "LATIN CAPITAL LETTER A WITH DIAERESIS". So, I'm not sure what you are talking about.
Re: [JonathanPool] hex metacharacters for characters below x100
If Perl on your system converts characters into UTF-8, then I understand why it finds no match. But what makes Perl do that? I don't believe I have seen that behavior.
The regular expression compiler produces polymorphic opcodes. That
is, the pattern adapts to the data and automatically switches to
the Unicode character scheme when presented with data that is
internally encoded in UTF-8 -- or instead uses a traditional byte
scheme when presented with byte data.
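Here is a small sketch of what that means for the character discussed above. It assumes the source file has no "use utf8", so the pasted C3 84 stays as two raw bytes. The variable names and the use of Encode::decode are my own choices for illustration, not something from the earlier posts.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte string: the two raw UTF-8 bytes C3 84, which is what a pasted
# "Ä" looks like to Perl when the source does not say "use utf8".
my $bytes = "\xC3\x84";

# Character string: the same data decoded into the single code point U+00C4.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d\n", length $bytes;   # 2
printf "chars: length %d\n", length $chars;   # 1

# The same pattern is applied to both strings; it compares code points.
my $bytes_match = $bytes =~ /\xC4/ ? 1 : 0;   # no byte has the value C4
my $chars_match = $chars =~ /\xC4/ ? 1 : 0;   # the one character is U+00C4

# The undecoded data only matches when spelled out byte by byte.
my $raw_match = $bytes =~ /\xC3\x84/ ? 1 : 0;

print "bytes =~ /\\xC4/     : $bytes_match\n";
print "chars =~ /\\xC4/     : $chars_match\n";
print "bytes =~ /\\xC3\\x84/ : $raw_match\n";
```

So whether /\xC4/ matches depends on whether the string holds decoded characters or undecoded bytes, which would explain why the two of us see different results.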
(This post was edited by 7stud on Mar 28, 2010, 4:35 PM)