Regex, ShiftJIS and locale

Feb 12, 2007, 7:36 PM

My script parses pathnames, some of which contain Shift-JIS characters. (I'm working on Japanese Windows 2000 with ActivePerl 5.6.1.) I have attached an example of the input file to be parsed. We parse with:


With ASCII characters, this parses the pathnames with no problems. In fact, it even works fine with Shift-JIS chars in the middle of words in the path (e.g., the first five lines of the attached file).

The problem occurs when there are Shift-JIS chars at the beginning or end of the word (e.g., all of the other lines in the attached file). I believe that using \j with

use ShiftJIS::Regexp qw(:all);

would fix the problem, but I've tried replacing \w and \b with \j with no luck. Am I using ShiftJIS::Regexp correctly?
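My guess at the underlying cause, as a minimal sketch rather than a confirmed diagnosis: if the regex engine treats the string as plain bytes, a two-byte Shift-JIS character can be split by \w and \b. The example below uses the katakana character "A" (bytes 0x83 0x41 in Shift-JIS), whose second byte is the same as ASCII "A". The helper names count_word_bytes and first_boundary_offset are made up for this illustration and are not taken from my script.

```perl
use strict;
use warnings;

# Count the bytes of $s that a byte-wise \w accepts.
# (Hypothetical helper, for illustration only.)
sub count_word_bytes {
    my ($s) = @_;
    my $n = () = $s =~ /\w/g;
    return $n;
}

# Byte offset of the first \b match in $s, or -1 if there is none.
# (Hypothetical helper, for illustration only.)
sub first_boundary_offset {
    my ($s) = @_;
    return $s =~ /\b/ ? $-[0] : -1;
}

# Katakana "A" in Shift-JIS: lead byte 0x83, trail byte 0x41 (ASCII "A").
my $kana_a = "\x83\x41";

printf "bytes matched by \\w: %d of %d\n",
    count_word_bytes($kana_a), length $kana_a;
printf "first \\b at byte offset: %d\n",
    first_boundary_offset($kana_a);
```

If this assumption holds, only the trail byte counts as a word character and the first word boundary falls between the two bytes of a single character, which would fit the failures I see when a Shift-JIS character sits at the start or end of a word.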

It also occurs to me that I could use

use POSIX qw(locale_h);
setlocale(LC_CTYPE, "ja_JA.Shift_JIS");

as I see in perllocale.pod, so that \w behaves appropriately for the Japanese locale. However, I've tried that with no success.

I would be grateful for any ideas you might have.

Many thanks!
Attachments: BREWAPIReferencetoc.txt (3.53 KB)