
thomas.hedden
New User
Feb 2, 2010, 8:38 AM
Post #1 of 2
Regular expression / regex substitution on Unicode text
I have a large file encoded in Unicode that I need to convert to CSV. In general, I know how to do this with regular expression substitutions in sed or Perl, but I am stuck on one problem: I need to put a quotation mark at the end of each line to protect the last field.

The usual regex substitution ... s/$/"/ ... works fine for 7-bit ASCII text, but when I run it on my Unicode text file, the double quotation mark appears at the BEGINNING of the FOLLOWING line, not at the end of the line where it is supposed to appear.

The file came from a Windows system, but piping it through dos2unix doesn't seem to make any difference. I've tried "use Encode;" with several different encodings, but I get the same result. Perhaps I'm doing something wrong.

Does anyone know of a library function, a Perl pragma, or something similar that would accomplish this easily? This should be a trivial problem.

Thanks in advance for any suggestions.

Tom
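
For illustration only, here is a minimal sketch of one way the steps described above could be arranged. It rests on assumptions the post does not confirm: that "Unicode" here means UTF-16LE (common for files exported from Windows) and that the lines end in CRLF. The file names sample_utf16.txt and sample_utf8.csv are hypothetical. The idea is to convert the text to UTF-8 before running any byte-oriented tool, strip the carriage returns, and only then append the quotation mark.

```shell
# ASSUMPTION: the input is UTF-16LE with CRLF line endings (not stated in the post).
# Build a small sample file in that encoding; the file names are hypothetical.
printf 'alpha,"beta\r\ngamma,"delta\r\n' | iconv -f UTF-8 -t UTF-16LE > sample_utf16.txt

# Decode to UTF-8 first so that each newline is a single byte,
# delete the carriage returns, then append the closing quote to every line.
iconv -f UTF-16LE -t UTF-8 sample_utf16.txt \
  | tr -d '\r' \
  | sed 's/$/"/' > sample_utf8.csv
```

If the real file uses a different encoding, the -f argument to iconv would need to change accordingly; running "file" on the input, or looking at its first bytes for a byte-order mark, is one way to check.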