Finding unicode.



ferulebezel
New User

Dec 17, 2011, 8:13 PM



My Googling has failed me. I can't find out how to search for Unicode characters, especially those with code points above 255.

From what I've been able to find


Code
s/\x{e9}/&eacute;/g;


or


Code
s/\x{2014}/&mdash;/g;


should work

but when I print $_ the substitutions haven't happened.

Clearly, I'm doing something wrong. What is it?


rovf
Veteran

Dec 21, 2011, 3:13 AM


Re: [ferulebezel] Finding unicode.

How did you verify that your string really contains the Unicode character you were looking for?

BTW, your goal seems to be to convert Unicode characters to the corresponding HTML entities. It may be easier to use HTML::Entities, a widely used CPAN module (part of the HTML-Parser distribution).


ferulebezel
New User

Dec 21, 2011, 10:40 AM


Re: [rovf] Finding unicode.

I used :ascii in Vim to get the value.

I tried using HTML::Entities and had some problems with it. It doesn't distinguish between characters in markup and characters in the text.
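One way around the markup problem is to encode only the non-ASCII characters, so that <, > and & in tags are never touched. The following is a minimal sketch using core Perl only, not a replacement for HTML::Entities. The %named table and the entify_non_ascii name are hypothetical and cover just the two characters from this thread; anything else falls back to a numeric character reference. It assumes the input has already been decoded into characters.

```perl
use strict;
use warnings;

# Hypothetical lookup table: only the two characters discussed in this thread.
my %named = (
    "\x{e9}"   => '&eacute;',
    "\x{2014}" => '&mdash;',
);

# Replace each non-ASCII character with a named entity when one is known,
# otherwise with a hexadecimal numeric character reference.
# ASCII characters, including markup such as < and >, are left alone.
sub entify_non_ascii {
    my ($text) = @_;
    $text =~ s{([^\x00-\x7F])}{
        exists $named{$1} ? $named{$1} : sprintf('&#x%X;', ord($1))
    }ge;
    return $text;
}

print entify_non_ascii("<p>caf\x{e9} \x{2014} ok</p>"), "\n";
```

HTML::Entities' encode_entities also accepts an optional second argument listing the characters to encode, which can be used to the same effect if the module behaves well for you otherwise.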


rovf
Veteran

Dec 21, 2011, 11:46 AM


Re: [ferulebezel] Finding unicode.

I would dump the string you have in Perl as hexadecimal values, just to make sure you have the right data. Maybe the problem already occurs when reading the data into your program...
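A minimal sketch of such a dump, assuming the file is UTF-8 and was read without an encoding layer. The dump_codepoints helper is a made-up name for illustration. If the data was never decoded, the string holds one element per byte, and a pattern like \x{2014} has nothing to match against.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Return a string's elements as space-separated hex, e.g. "U+00E9 U+2014".
sub dump_codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split(//, $str);
}

# Raw UTF-8 bytes for U+00E9 and U+2014, as an undecoded filehandle would deliver them.
my $bytes = "\xC3\xA9\xE2\x80\x94";

# The same data decoded into characters.
my $chars = decode('UTF-8', $bytes);

# Count matches of \x{2014} against each form.
my $hits_in_bytes = () = $bytes =~ /\x{2014}/g;
my $hits_in_chars = () = $chars =~ /\x{2014}/g;

print dump_codepoints($bytes), " (matches: $hits_in_bytes)\n";
print dump_codepoints($chars), " (matches: $hits_in_chars)\n";
```

If the dump shows values like U+00C3 U+00A9 where you expected U+00E9, the input needs decoding first, for example by opening the file with the '<:encoding(UTF-8)' layer.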


BillKSmith
Veteran

Dec 21, 2011, 1:04 PM


Re: [ferulebezel] Finding unicode.

I think you mean \x{} instead of \X{}. Refer to perldoc perlre.
Good Luck,
Bill
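
To illustrate the difference between the two escapes, here is a small sketch (the variable names are arbitrary). In Perl regexes, \x{...} names one exact code point, while \X matches a whole grapheme cluster, such as a base letter plus its combining accent.

```perl
use strict;
use warnings;

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one visible character.
my $decomposed = "e\x{301}";

# \x{...} names exact code points, so both must be spelled out.
my $by_codepoint = $decomposed =~ /^\x{65}\x{301}$/ ? 1 : 0;

# \X matches one whole grapheme cluster, whatever code points it consists of.
my $by_cluster = $decomposed =~ /^\X$/ ? 1 : 0;

# The precomposed U+00E9 is a different code point and does not match here.
my $precomposed_hit = $decomposed =~ /\x{e9}/ ? 1 : 0;

print "$by_codepoint $by_cluster $precomposed_hit\n";
```

The last check is also a reminder that text which looks like it contains U+00E9 may actually hold the decomposed pair, in which case s/\x{e9}/.../ finds nothing.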


rickb
New User

Dec 22, 2011, 6:22 AM


Re: [ferulebezel] Finding unicode.

I have run into this same issue and am stumped. I use BBEdit with Lion and find that using \x{} in a search/replace simply doesn't work. I also tried with TextMate; same result.

I can do a manual search/replace in BBEdit, or use a TextFactory, and it works perfectly. This leads me to believe that it is a Perl issue, since BBEdit implements PCRE internally.

Any tips?
