7stud
Enthusiast

Mar 27, 2010, 2:36 AM


Re: [JonathanPool] hex metacharacters for characters below x100


Quote
the hex metacharacter syntax

I'm not sure where you got that term from. I would call the syntax \x{ff} a "Unicode escape sequence" to distinguish it from a regular "hexadecimal escape sequence".


Quote
For example, I can match the "Ā" in a regular expression with \x{0100} or with [\x{0100}]

Not me:

Code
use strict; 
use warnings;
use 5.010;

use utf8;

my @strings = (
'abc',
'Ā',
);

for (@strings) {

    # \x{62} is 'b' (\x{61} would be 'a')
    if (/\x{62}/) {
        say "matched 'b'";
    }

    # U+0100 is LATIN CAPITAL LETTER A WITH MACRON
    if (/\x{0100}/) {
        say q{matched 'cap A with macron'};
    }
}

--output:--
matched 'b'



Quote
I'm finding that the hex metacharacter syntax works in regular expressions as I expect with characters from x100 up, but not with characters from xff down.


Maybe the following will alter your expectations:


Quote
Note that "\x.." (no "{}" and only two hexadecimal digits), "\x{...}",
and "chr(...)" for arguments less than 0x100 (decimal 256) generate an
eight-bit character for backward compatibility with older Perls. For
arguments of 0x100 or more, Unicode characters are always produced. If
you want to force the production of Unicode characters regardless of
the numeric value, use "pack("U", ...)" instead of "\x..", "\x{...}",
or "chr()".

See perluniintro.
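
To see the boundary described in that passage, here is a minimal sketch that builds a one-character string three ways and inspects Perl's internal UTF-8 flag with utf8::is_utf8. The variable names are mine, and the flag is an internal storage detail, so treat this as an illustration rather than something to rely on in real code:

```perl
use strict;
use warnings;

# Build one-character strings three ways.
my $low_chr  = chr(0xff);          # argument below 0x100: eight-bit character
my $high_chr = chr(0x100);         # argument of 0x100 or more: Unicode character
my $low_pack = pack("U", 0xff);    # forces Unicode production for a small value

for my $pair (
    [ 'chr(0xff)'       => $low_chr  ],
    [ 'chr(0x100)'      => $high_chr ],
    [ 'pack("U", 0xff)' => $low_pack ],
) {
    my ($label, $str) = @$pair;

    # utf8::is_utf8 reports how Perl stores the string internally.
    printf "%-16s length=%d ord=0x%x utf8-flag=%s\n",
        $label, length($str), ord($str),
        utf8::is_utf8($str) ? 'on' : 'off';
}
```

All three strings have length 1. Only the storage differs, which is why chr(0xff) and pack("U", 0xff) still compare equal with eq.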

ff is 1111 1111 in binary, which is 255 in decimal. The range 0-255 covers the 256 numbers that can be represented by one byte (= 8 bits). Problems along a boundary like that can usually be traced to ASCII, which represents each character using one byte. 7 bits of that byte produce the numbers 0-127, which are the 128 ASCII characters.
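
If you want to check that arithmetic yourself, a quick sketch (the variable names are just for illustration):

```perl
use strict;
use warnings;

my $max_byte  = 0xff;    # largest value that fits in 8 bits
my $max_ascii = 0x7f;    # largest value that fits in 7 bits

# %b prints a number in binary, %d in decimal.
printf "0xff = %d decimal = %b binary\n", $max_byte,  $max_byte;
printf "0x7f = %d decimal = %b binary\n", $max_ascii, $max_ascii;

# Number of distinct values: 2 to the power of the bit count.
my $byte_values  = 2**8;
my $ascii_values = 2**7;
printf "values in 8 bits: %d\n", $byte_values;
printf "values in 7 bits: %d\n", $ascii_values;
```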

Also, nothing in your post mentions an encoding, e.g. UTF-8. You can compare Unicode strings to other Unicode strings, but you can't meaningfully compare a Unicode string to a string of encoded bytes, such as a UTF-8 encoded string. The exception is characters in the ASCII range (0x00-0x7f): their UTF-8 encoding is the same single byte, so they compare equal either way. Characters from 0x80 to 0xff are kept by Perl as eight-bit characters, not converted to ASCII, and their UTF-8 encoding is two bytes. Otherwise you have to encode the Unicode string (or decode the byte string) before comparing. Unfortunately, Perl may silently convert between the two forms in certain situations, which can be very confusing.
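
Here is a rough sketch of what comparing a Unicode string with a UTF-8 encoded string looks like in practice, using the core Encode module. The variable names are mine, and I'm assuming the encoded bytes come from somewhere like a file or socket read without a decoding layer:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars = "\x{0100}";                 # one Unicode character, U+0100
my $bytes = encode('UTF-8', $chars);    # its UTF-8 encoding: bytes 0xC4 0x80

printf "length of character string: %d\n", length($chars);
printf "length of encoded string:   %d\n", length($bytes);

# Comparing the two forms directly: one character versus two bytes.
my $naive_equal = ($chars eq $bytes) ? 1 : 0;

# Bring both sides to the same form before comparing.
my $encoded_equal = (encode('UTF-8', $chars) eq $bytes) ? 1 : 0;
my $decoded_equal = ($chars eq decode('UTF-8', $bytes)) ? 1 : 0;

print "direct comparison:  ", ($naive_equal   ? "equal" : "not equal"), "\n";
print "after encoding:     ", ($encoded_equal ? "equal" : "not equal"), "\n";
print "after decoding:     ", ($decoded_equal ? "equal" : "not equal"), "\n";
```

Decoding input once at the boundary and encoding again on output is usually easier to keep straight than encoding strings at every comparison.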


(This post was edited by 7stud on Mar 27, 2010, 3:27 AM)