Regular expressions on text read from UTF-16LE file


Jun 22, 2011, 9:56 AM



I have recently been working on a script to localize string files for an iPhone application. These files (automatically generated by a localization tool in Xcode) are encoded as UTF-16LE (16-bit little-endian Unicode). Since I am using a Mac, I have updated Perl to version 5.12 to get at least some of the more modern Unicode support. Nonetheless, I am having significant difficulty matching regular expressions against text read from these files. As an example, I have attached a tiny localized strings file.

The regular expression that I am trying (but failing) to match is:

$result =~ /(.*?)(\/\*|\")/;

In this file, the expected result would be $2 = /*

I used non-greedy quantification for the preceding text and wanted the match to terminate on either a quote or a /* (whichever comes first), but $2 always ends up being the quote. I tried the same text saved as ASCII, and sure enough, the pattern matches as expected. Any help would be appreciated... I have written a nice state machine parser to automate localization but just can't get regexes to work on this UTF-16LE text.
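For anyone trying to reproduce this, here is a minimal sketch of one possible cause, assuming the file is being read as raw bytes with no decoding step. The sample text is made up for illustration (it is not the attached file), and the variable names are hypothetical. In UTF-16LE every ASCII character is followed by a NUL byte, so the bytes for / and * are never adjacent, while a lone quote byte can still match.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample content, not the attached file.
my $text = qq{/* Title */\n"Key" = "Value";\n};

# Simulate what a raw (undecoded) read of a UTF-16LE file returns.
my $bytes = encode('UTF-16LE', $text);

# The pattern from the post, unchanged.
my $pattern = qr/(.*?)(\/\*|\")/;

# Match against raw bytes: '/' and '*' are separated by a NUL byte,
# so only the quote alternative can succeed.
my $raw_hit;
$raw_hit = $2 if $bytes =~ $pattern;

# Decode the bytes into Perl characters first, then match.
my $decoded = decode('UTF-16LE', $bytes);
my $decoded_hit;
$decoded_hit = $2 if $decoded =~ $pattern;

print "raw bytes: $raw_hit\n";      # prints: raw bytes: "
print "decoded:   $decoded_hit\n";  # prints: decoded:   /*
```

If this is what is happening, opening the file with an encoding layer, for example open(my $fh, '<:encoding(UTF-16LE)', $path), should have the same effect as the explicit decode above. That is an assumption about how the script reads the file, since the reading code is not shown in the post.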

(This post was edited by mamacken on Jun 22, 2011, 10:21 AM)