
johann_p
Novice
Apr 12, 2008, 12:47 AM
Unicode in regular expressions?
What is the correct or recommended way to match Unicode characters with a regular expression? For example, the long dash has the code 0xE2 0x80 0x95. I used the expression /\xe2\x80\x95/ for this, but I am unsure whether it is correct and portable. Does the endianness of the architecture have any influence here -- are Unicode codes always the same regardless of endianness? Also, I have seen that it is possible to use \x{e280} -- is this always the same as \xe2\x80? Finally, is it possible to match this character using \N{somename} instead? How can I find out the correct somename, and how can I find out which somenames are supported at all?
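To make the question concrete, here is a minimal sketch of the forms I am comparing. It is not meant as an authoritative answer. It assumes that the three bytes above are the UTF-8 encoding of U+2015 HORIZONTAL BAR and that the input arrives as undecoded bytes; the variable names are my own.

```perl
use strict;
use warnings;
use charnames ':full';   # needed for \N{NAME} on older perls; also loads charnames::viacode
use Encode qw(decode);

# The three bytes from the question, as a raw (undecoded) byte string.
my $bytes = "\xe2\x80\x95";

# Byte-level match: only meaningful while the data is still undecoded octets.
my $byte_match = $bytes =~ /\xe2\x80\x95/ ? 1 : 0;

# Assumption: the input is UTF-8. After decoding, the three bytes are one character.
my $chars     = decode('UTF-8', $bytes);
my $length    = length $chars;
my $codepoint = sprintf 'U+%04X', ord $chars;   # U+2015

# Character-level matches against the decoded string.
my $hex_match  = $chars =~ /\x{2015}/          ? 1 : 0;
my $name_match = $chars =~ /\N{HORIZONTAL BAR}/ ? 1 : 0;

# Look up the official character name from its code point.
my $name = charnames::viacode(0x2015);

# \x{e280} is the single code point U+E280, not the two bytes E2 80.
my $same = "\x{e280}" eq "\xe2\x80" ? 1 : 0;

print "byte match:  $byte_match\n";
print "characters:  $length ($codepoint)\n";
print "\\x{2015}:    $hex_match\n";
print "\\N{...}:     $name_match\n";
print "name:        $name\n";
print "\\x{e280} eq \\xe2\\x80: $same\n";
```

In this sketch the byte pattern works only on the undecoded string, while \x{2015} and \N{HORIZONTAL BAR} work only after decoding. The decoded code point is a plain number, so byte order would matter only for how encodings such as UTF-16 are stored, not for UTF-8 or for the pattern itself.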