Jun 22, 2011, 9:56 AM
Regular expressions on text read from UTF-16LE file
I have recently been working on a script to localize string files for an iPhone application. These files (generated automatically by a localization tool in Xcode) are encoded as UTF-16LE, i.e. 16-bit little-endian Unicode. Since I am using a Mac, I have upgraded my Perl to 5.12 to get its more modern Unicode support. Nonetheless, I am having significant difficulty matching regular expressions against text read from these files. As an example, I have attached a tiny localized strings file.
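A minimal sketch of reading such a file so that Perl works on decoded characters rather than raw bytes, using the standard `:encoding(UTF-16LE)` PerlIO layer. The sample contents and the temporary file are hypothetical stand-ins for the attached strings file, and the sketch assumes the file has no byte-order mark (with `UTF-16LE`, a leading BOM would survive as a U+FEFF character; the endianness-neutral `UTF-16` encoding consumes the BOM instead but requires one to be present).

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical stand-in for the attached strings file (no BOM).
my $sample = qq{/* Comment */\n"key" = "value";\n};

# Write the sample to a temp file as raw UTF-16LE bytes.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} encode('UTF-16LE', $sample);
close $fh or die "close $path: $!";

# Read it back through a decoding layer so Perl sees characters, not bytes.
open my $in, '<:raw:encoding(UTF-16LE)', $path or die "open $path: $!";
my $text = do { local $/; <$in> };    # slurp the whole file
close $in;

printf "%d characters decoded\n", length $text;
```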
The regular expression that I am trying (but failing) to match is:
$result =~ /(.*?)(\/\*|\")/;
For this file, the expected result is $2 = /*
I used a non-greedy quantifier for the preceding text so that the match would stop at either a quote or a /* (whichever comes first), but $2 always ends up being the quote. When I try the same text saved as ASCII, the pattern matches as expected. Any help would be appreciated... I have written a nice state-machine parser to automate localization but just can't get these regexes to work on UTF-16LE text.
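The symptom is consistent with the regex running on undecoded bytes: in UTF-16LE every ASCII character is followed by a NUL byte, so "/*" is stored as "/", "\0", "*", "\0" and \/\* can never match, while a lone quote still can. A minimal sketch comparing both cases, assuming a hypothetical sample line and using Encode's `encode`/`decode` in place of real file I/O:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical snippet in the style of an Xcode .strings file.
my $line = qq{/* Label for the OK button */\n"OK" = "OK";\n};

# Undecoded UTF-16LE bytes: each ASCII character is followed by "\0".
my $bytes = encode('UTF-16LE', $line);

# The pattern from the question, applied to the raw bytes first.
my $raw_delim = $bytes =~ /(.*?)(\/\*|\")/ ? $2 : 'no match';

# Decoded characters, as an :encoding(UTF-16LE) input layer would return.
my $chars = decode('UTF-16LE', $bytes);
my $txt_delim = $chars =~ /(.*?)(\/\*|\")/ ? $2 : 'no match';

print "raw bytes: $raw_delim\n";    # prints: raw bytes: "
print "decoded:   $txt_delim\n";    # prints: decoded:   /*
```

On the raw bytes the only delimiter the pattern can ever find is the quote, which matches what the post describes; once the text is decoded, the same pattern stops at the leading /* as intended.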
(This post was edited by mamacken on Jun 22, 2011, 10:21 AM)