
whitejm12
New User
Feb 12, 2007, 7:36 PM
Regex, ShiftJIS and locale
My script parses pathnames, some of which contain Shift-JIS characters. (I'm working in native Windows 2000 Japanese, with Active Perl 5.6.1.) I attach an example of the input file to be parsed.

We parse with:

    if($line=~m/^(\w.+)\b\/\b(\w.+)\b\/\b(\w.+)\b\/\b(\w.+)\b\/\b(\w.+?\.htm)$/)

With ASCII characters, this parses the pathnames with no problems. In fact, it even works fine with Shift-JIS chars in the middle of words in the path (e.g., the first five lines of the attached file). The problem occurs when there are Shift-JIS chars at the beginning or end of the word (e.g., all of the other lines in the attached file).

I believe that using \j with

    use ShiftJIS::Regexp qw(:all);

would fix the problem, but I've tried replacing \w and \b with \j with no luck. Am I using ShiftJIS:: correctly?

It also occurs to me that I could use

    use POSIX qw(locale_h);
    setlocale(LC_CTYPE, "ja_JA.Shift_JIS");

as I see in perllocale.pod, so that \w behaves appropriately for the Japanese locale. However, I've tried that with no success.

I would be grateful for any ideas you might have. Many thanks!
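
For anyone trying to reproduce the symptom, here is a minimal sketch (an illustration, not the script from the post). It assumes plain byte semantics with no locale in effect, where \w matches only ASCII word characters, so a Shift-JIS byte sitting next to a "/" leaves \b with no word boundary to anchor on. The sample path is made up, and the [^/]+ pattern is only one possible byte-oriented workaround; it relies on the fact that 0x2F ("/") is not a valid lead or trail byte in Shift-JIS, so "/" can be treated as a plain separator.

```perl
use strict;
use warnings;

# Hypothetical sample path: the third component is the Shift-JIS
# byte sequence 0x93 0xFA 0x96 0x7B (two double-byte characters).
my $line = "docs/ja/\x93\xFA\x96\x7B/guide/index.htm";

# Pattern from the post: needs \w and \b on both sides of every "/".
my $orig = qr{^(\w.+)\b/\b(\w.+)\b/\b(\w.+)\b/\b(\w.+)\b/\b(\w.+?\.htm)$};

# Assumed alternative: each component is a run of bytes other than "/".
my $alt = qr{^([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+\.htm)$};

# Under byte semantics the original pattern cannot match this line,
# because the bytes around the second and third "/" are not \w.
my $orig_matches = ($line =~ $orig) ? 1 : 0;

# The byte-oriented pattern captures the five components as raw bytes.
my @parts = ($line =~ $alt);

print "original pattern matched: ", ($orig_matches ? "yes" : "no"), "\n";
print "components captured by alternative: ", scalar(@parts), "\n";
```

The captured components are still raw Shift-JIS bytes, so anything that needs character-level handling afterwards (length, case, sorting) would need a Shift-JIS-aware step on top of this.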