regex to strip control chars

 



perlster
stranger

Sep 5, 2001, 12:18 PM

Post #1 of 10
regex to strip control chars

Hi...

I have a recurring DOS file that I need to Unix-ify - i.e. strip the carriage return/newline pair at the end of each line and replace it with a plain newline.

I've tried:

s/\015$/\012/e;

s/\r\n$/\n/e;

and a bunch of others, but NO JOY!

If I load the file in 'Joe', I can see the ^M at the end of each line so I know the suckers are there. The regexes I've tried above are from a Google search. What am I missing here? TIA....

--
duke
Calgary, Alberta, Canada


mhx
Enthusiast / Moderator

Sep 5, 2001, 1:43 PM

Post #2 of 10
Re: regex to strip control chars

If you're running the script on a Windows box, be sure your filehandles are in binary mode. If you just want a pipe-through script to unixify a text file, use something like:

Code
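#!perl -w
# Write raw bytes so Windows does not turn "\n" back into "\r\n".
binmode STDOUT;
while( <> ) {
    # Remove the carriage return that precedes the final newline.
    s/\r(?=\n$)//;
    print;
}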
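
As a minimal sketch of what that substitution does on its own, here it is applied to a few sample strings. The helper name strip_cr and the sample lines are made up for illustration; only the regex comes from the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a carriage return only when it sits
# directly before the final newline of the line.
sub strip_cr {
    my ($line) = @_;
    $line =~ s/\r(?=\n$)//;
    return $line;
}

my @samples = ("dos line\r\n", "unix line\n", "stray \r inside\r\n");
for my $line (@samples) {
    my $clean = strip_cr($line);
    # Show control characters as visible escapes instead of printing them raw.
    (my $shown = $clean) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    print "$shown\n";
}
```

Each sample comes out ending in a single \n, and the \r in the middle of the third sample is left alone because the lookahead only accepts a \r that is followed by the closing newline.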

Hope this helps.

-- Marcus


Code
s$$ab21b8d15c3d97bd6317286d$;$"=547269736;split'i',join$,,map{chr(($*+= 
($">>=1)&1?-hex:hex)+0140)}/./g;$"=chr$";s;.;\u$&;for@_[0,2];print"@_,"



perlster
stranger

Sep 5, 2001, 4:53 PM

Post #3 of 10
Re: regex to strip control chars

Hello Marcus...

The recurring DOS file I spoke of is that POP3 client file that gets cleaned up by the scriptlet:

#!/usr/bin/perl -wpi.old

s/^(?=From e-pop)/\n/ if $.-1;
s/e-pop\@localhost/'mailer-daemon '.localtime/e;

you so kindly suggested in a previous post. It works well! So I could simply add

s/\r(?=\n$)//;

as you suggest, to the end of that scriptlet?

My setup is a bit confusing, as Perl runs in Windows, but is in this case massaging a file in a Cygwin directory. The file will eventually be used by Mutt - a totally Unix MUA. I mention this because, given that the above scriptlet works so well, would I still have to worry about "binmode STDOUT;"? Thanks.


--
duke
Calgary, Alberta, Canada


perlster
stranger

Sep 5, 2001, 9:18 PM

Post #4 of 10
Re: regex to strip control chars

Marcus....

I tried adding 's/\r(?=\n$)//;' as the last line of my (your) scriptlet but it didn't work ;( Actually it worked too well -- the whole works is now on one line.

I also tried 's/\r(?=\n$)/\n/;' but nothing got substituted. I also tried 'tr/\r//d;' -- no joy!

What the hell is wrong? Actually the Windows POP3 client that initially writes this file can be run with an option to write in Unix format (I just clued in to it). So, prior to running your scriptlet, the file is good with respect to the NL thing.

However, *after* piping the file through the scriptlet I have ^M after each line. So either the scriptlet is introducing these ^M, or it's the DOS/Windows version of Perl that is doing it. Is there a way to force Perl to write in Unix format in Windows? I did include 'binmode STDOUT' at the top of the scriptlet, but it didn't make any difference. Any ideas? TIA....

--
duke
Calgary, Alberta, Canada


mhx
Enthusiast / Moderator

Sep 5, 2001, 10:09 PM

Post #5 of 10
Re: regex to strip control chars

Oh well! Life can be so hard if you have to use Windows.

The problem is definitely that the filehandles are not set to binary mode in the script. I don't deal with in-place editing too often, but I think I've found an acceptable way of solving the problem. I'm really not sure if it's the best way, but perhaps it will make you happy.
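
To see what binary mode changes, here is a rough sketch that simulates Windows text mode on any platform. It assumes Perl 5.8 or later, since it relies on the :crlf and :raw I/O layers that older perls lack, and bytes_written is a made-up helper name, not anything from this thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through a handle opened with the given
# layer, then read the file back untranslated and return the raw bytes.
sub bytes_written {
    my ($layer, $text) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    close $fh;

    open my $out, ">$layer", $path or die "open $path: $!";
    print {$out} $text;
    close $out or die "close $path: $!";

    open my $in, '<:raw', $path or die "open $path: $!";
    local $/;                      # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

# :crlf behaves like a Windows text-mode handle, :raw like a binmode'd one.
for my $layer (':crlf', ':raw') {
    my $bytes = bytes_written($layer, "one\ntwo\n");
    printf "%-6s wrote %d bytes\n", $layer, length $bytes;
}
```

The same eight characters come out as ten bytes through :crlf, because each "\n" is written as "\r\n", and as eight bytes through :raw. That translation on output is where the ^M characters come from.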

Code
#!/usr/bin/perl -pi.old 
$.-1 ? s/^(?=From e-pop)/\n/ : binmode ARGVOUT;
s/e-pop\@localhost/'mailer-daemon '.localtime/e;
s/\r(?=\n$)//;

This runs absolutely smoothly on my NT box (ActivePerl 5.6.0) and creates wonderful Unixish files. I had to remove the -w flag because it was complaining about ARGVOUT being used only once. DON'T normally do this unless you are 100% sure your script works. The second line of the script checks whether it is on the first line of the current file or not. If it's the first line, the script sets the output filehandle to binary mode. If it isn't the first line, it inserts a newline before that line if it starts with 'From e-pop'.
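
For anyone who would rather see the -p and -i switches spelled out, the CR-stripping part of the script above could be sketched roughly like this. The sub name unixify_in_place is invented for the example, and the 'From e-pop' handling is left out to keep it short.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite $path in place, keeping a backup copy
# named "$path$suffix", and strip the CR from each CRLF line ending.
sub unixify_in_place {
    my ($path, $suffix) = @_;
    # Setting $^I and @ARGV makes <> edit in place, as -i does on the command line.
    local ($^I, @ARGV) = ($suffix, $path);
    local $_;
    while (<>) {
        # On the first line, switch the in-place output handle to binary mode
        # so "\n" is not written back out as "\r\n" on Windows.
        binmode ARGVOUT if $. == 1;
        s/\r(?=\n$)//;
        print;                 # goes to ARGVOUT while in-place editing is active
        close ARGV if eof;     # reset $. so a later call starts at line 1 again
    }
    return;
}

# Example call, assuming the file name is given on the command line.
unixify_in_place($ARGV[0], '.old') if @ARGV;
```

The explicit close at end of file is there only so that $. starts over if the sub is called more than once. The one-file command-line version does not need it.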

Hope this helps.

-- Marcus





perlster
stranger

Sep 6, 2001, 5:01 AM

Post #6 of 10
Re: regex to strip control chars

hello Marcus...

You're right about Windows! I'm a FreeBSD newbie as well, so I've seen the light!! However various circumstances force me to HAVE to use win9x for Internet connectivity, including email. So I'm using the next-best platform - CYGWIN.

Anyway, I hate being a pain-in-the-butt, but here are the results of the "revised" scriptlet:

$ ./clean-pop duke.mbx
./clean-pop: line 8: syntax error near unexpected token `s/^(?'
./clean-pop: line 8: `$.-1 ? s/^(?=From e-pop)/\n/ : binmode ARGVOUT;'

$ perl -v

This is perl, version 5.003_93

Copyright 1987-1997, Larry Wall

OS/2 port Copyright (c) 1990, 1991, Raymond Chen, Kai Uwe Rommel
Version 5 port Copyright (c) 1994-1997, Andreas Kaiser, Ilya Zakharevich

This version of Perl was the least bloated one that I could find that would fit on my HDD, and still run well. I don't think that it should make a difference though.

Any ideas? Thanks for all your input -- I sure appreciate it!! BTW, where in the Perl docs would I read up on this 'binmode' stuff and filehandles? I want to be learning something from all this, cuz it's NOT my intention to have knowledgeable folks like you doing my work while I sit back and wait. I'm in for the long haul! Thanks again!




--
duke
Calgary, Alberta, Canada


perlster
stranger

Sep 6, 2001, 5:25 AM

Post #7 of 10
Re: regex to strip control chars

YOU'RE THE MAN.....

Scratch my last post, Marcus, cuz when I modified my script with your revised version, it was too early in the AM, and I still had my head up you-know-what! ;^)

The "revised" scriptlet works like a charm. My CYGWIN Mutt is happy; Joe, my default Mutt editor, is happy; and when I FTP my folders to my FreeBSD box, FreeBSD won't choke.

I still want to read up on this 'binmode' stuff though ;) I can't *believe* Perl!! I don't know why I ever wasted my time with PHP. Thanks again, bud.... L8r

--
duke
Calgary, Alberta, Canada


mhx
Enthusiast / Moderator

Sep 6, 2001, 6:16 AM

Post #8 of 10
Re: regex to strip control chars


In Reply To

Code
This is perl, version 5.003_93


Anyway, um, perhaps you should think about upgrading to a more recent version of perl...

-- Marcus





perlster
stranger

Sep 6, 2001, 11:25 AM

Post #9 of 10
Re: regex to strip control chars

Marcus....

You're right of course about upgrading. As well, I'm not too thrilled about the OS/2 thing. However, I do have the most recent version of Perl on my FreeBSD box. ;) So I haven't been too worried about the win9x side as Perl has been working super. As I said in my subsequent post this AM, your modified scriptlet works just super. Thanks again! I'll do a search for this 'binmode blah' thingie! I'm learning.... ;) BTW, are you on 'comp.lang.perl.misc' at all? Was wondering how "Godzilla" was doing? ;) L8r....

--
duke
Calgary, Alberta, Canada


mhx
Enthusiast / Moderator

Sep 6, 2001, 2:26 PM

Post #10 of 10
Re: regex to strip control chars


In Reply To
BTW, are you on 'comp.lang.perl.misc' at all?

No. (I can't be everywhere...)
From time to time, I post to beginners@perl.org and perlunity.de (a German Perl forum), and I listen to many of the other @perl.org lists and (if I have the time...) perlmonks. But this forum is definitely where I spend most of my time.

-- Marcus




 
 

