Help with binary file [\r\n] removal

 



acchao
New User

Aug 4, 2008, 2:52 PM

Post #1 of 5 (648 views)

Hey All,

This is my first time posting on these forums, so please bear with me as I am new to Perl.

I have .Bin files that alternate between 2 bytes of data and 2 bytes of character codes, namely \r\n (hex: 0D 0A).

I've tried two approaches. The first uses s/// to go through and remove every occurrence of \r\n:

Code
while(<INFILE>){ 
$writeFile = $_;
$writeFile =~ s/\r\n//g;
print OUTFILE "$writeFile";
}

* I'm using $writeFile because I don't want to modify my original data file, which uses a variable. Let me know if you want to see the full code.

The problem I run into is with sequences like 58 0a 0d 0a: it removes the first 0a instead of the 0d 0a.

Because there is no guarantee that the data won't contain a 0x0d0a value as well, my second approach was to write every other 2 bytes out to a new file. This approach fails in the same way wherever the bytes are 0a 0d 0a.

The code below also skips the first pair of data bytes, in addition to the 0a 0d 0a error. Here's a pseudo-example of what happened:

Original data file: 5F 32 0D 0A C5 23 0D 0A 38 23 0D 0A
what I got: C5 23 38 23


Code
$count = 0; 
seek(INFILE,0,0);
#strip file of carriage returns and newlines
while(<INFILE>){
read(INFILE,$word,2,0);
if( $count == 1){
$count = 0;
}
else{
$count = 0;
print OUTFILE "$word";
}
}


Any help would be much appreciated! thanks!


agent
Novice

Aug 4, 2008, 7:36 PM

Post #2 of 5 (641 views)
Re: [acchao] Help with binary file [\r\n] removal

Uff.. I literally spent a few hours (don't laugh) figuring this out,
but it was good practice for me :)
Someone more experienced should check whether this solution is
proper, though, as this is my first attempt with regular expressions ;)


Code
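#!/usr/bin/perl -w

open(INPUT, "./input.bin") or die "Cannot open input.bin: $!";
open(OUTPUT, ">./output.bin") or die "Cannot open output.bin: $!";
my $output = '';
# join every "line" back into one long string
$output .= $_ while(<INPUT>);
# keep any two bytes (captured as $1) and drop the \r\n that follows them
$output =~ s/(.{2})\r\n/$1/sg;
print OUTPUT $output;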


my test input was:

Code
5F 32 0D 0A    C5 23 0D 0A    38 23 0D 0A     5A 0A 0D 0A 
45 AB 0D 0A 0D 0A 0D 0A 23 12 0D 0A 7C CC 0D 0A
12 12 0D 0A 0D 0D 0D 0A 0A

result:

Code
5F 32    C5 23    38 23    5A 0A    45 AB    0D 0A    23 12    7C CC 
12 12 0D 0D 0A


The initial problem was that the while loop assigns one line at a time to $_:
a new iteration starts after each newline character, so a regex inside the
loop never sees a whole record when the data itself contains 0A. So we had to
join everything into one long string.
It took me a long time to figure out that ".{2}" doesn't match \n by default,
but the /s modifier solved that problem. Was fun ;)
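
To make the joined-string idea concrete, here is a minimal sketch that runs the same substitution on the test input quoted above, held in memory instead of read from a file. The helper names hex_to_bytes, bytes_to_hex and strip_crlf_pairs are made up for this illustration; only the s/(.{2})\r\n/$1/sg line comes from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn "5F 32 0D 0A" into the raw bytes it describes.
sub hex_to_bytes {
    my ($hex) = @_;
    return join '', map { chr(hex($_)) } split ' ', $hex;
}

# Hypothetical helper: render raw bytes as space-separated uppercase hex.
sub bytes_to_hex {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

# The substitution from the post, applied to one in-memory string.
sub strip_crlf_pairs {
    my ($bytes) = @_;
    $bytes =~ s/(.{2})\r\n/$1/sg;
    return $bytes;
}

my $input = hex_to_bytes(
    '5F 32 0D 0A C5 23 0D 0A 38 23 0D 0A 5A 0A 0D 0A '
  . '45 AB 0D 0A 0D 0A 0D 0A 23 12 0D 0A 7C CC 0D 0A '
  . '12 12 0D 0A 0D 0D 0D 0A 0A'
);
print bytes_to_hex(strip_crlf_pairs($input)), "\n";
```

It should print the same byte sequence as the result listed above. Note that the match always consumes four bytes at a time from the left, so it stays aligned only as long as every record really is 2 data bytes followed by 0D 0A.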

regards,


(This post was edited by agent on Aug 4, 2008, 7:43 PM)


acchao
New User

Aug 4, 2008, 8:10 PM

Post #3 of 5 (636 views)
Re: [agent] Help with binary file [\r\n] removal

I actually got a C program working to do the 0d 0a removal, but for the sake of understanding what went wrong with my code, could you explain exactly what you did? I don't really understand all of it.

What is the ".="? And could you explain "s/(.{2})\r\n/$1/sg;"?
I mainly don't understand the .{2} and the $1.

Well, I transplanted your code and ran it. It didn't work. It made some changes, but for the most part it didn't remove any of the 0d 0a pairs.

Are you sure your .bin files are true binary files? You have to use a hex editor to see the values, because they should just look like gibberish as ASCII. Let me know how it goes.

As for why the 0a 0d 0a case fails, I think I figured out the logic. $_ reads in a line, so generally a line would consist of 2 bytes of data and 2 bytes for the carriage return and newline. But if the data contains a 0a, that is treated as the end of the line.

For 58 0A 0D 0A, $_ would only read in 58 0A. However, this doesn't explain how that line would result in 58 0D 0A being printed. Does anyone have any insights?

Like I said in my last post, when I ignore the byte values and just use a counter to print the first two bytes of each four-byte sequence, I still see the same error:
58 0A 0D 0A -> 58 0D 0A
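
For comparison, the counter idea can be sketched without the line-oriented <INFILE> loop at all, by asking read() for fixed 4-byte records and keeping only the first 2 bytes of each. In the earlier loop, the while(<INFILE>) test itself consumes a line before read() runs, which would account for skipped bytes. This is only a sketch, assuming the file really is a strict sequence of 2 data bytes plus 0D 0A; the helper name copy_data_bytes and the in-memory handles are made up for the example, and it is not the C program mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy the first 2 bytes of every 4-byte record.
sub copy_data_bytes {
    my ($in, $out) = @_;
    binmode $in;     # no newline translation on either handle
    binmode $out;
    my $record;
    # read() returns the number of bytes read, and 0 at end of file
    while (read($in, $record, 4)) {
        # a short final record still contributes its data bytes
        print {$out} substr($record, 0, 2);
    }
    return;
}

# Three records; the data bytes of the last two contain 0A and 0D 0A.
my $data   = "\x5F\x32\x0D\x0A\x58\x0A\x0D\x0A\x0D\x0A\x0D\x0A";
my $result = '';
open(my $in,  '<', \$data)   or die "Cannot open in-memory input: $!";
open(my $out, '>', \$result) or die "Cannot open in-memory output: $!";
copy_data_bytes($in, $out);
close($out);
close($in);

print join(' ', map { sprintf('%02X', ord($_)) } split //, $result), "\n";
# prints: 5F 32 58 0A 0D 0A
```

Because nothing here looks at the byte values, data that happens to be 0A or 0D 0A is copied through untouched. With real files, the two handles would come from open() on the .bin paths instead.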

PS. And I wouldn't laugh. I've spent 3 or 4 days trying to figure out this problem lol.


(This post was edited by acchao on Aug 4, 2008, 8:11 PM)


agent
Novice

Aug 4, 2008, 8:38 PM

Post #4 of 5 (632 views)
Re: [acchao] Help with binary file [\r\n] removal

hi
Hmm, that's weird, but on my system it works fine. I tested it again
with both my input data and yours, and it seems to generate proper
results. I used a hex editor to generate the files ;)

".=" appends the right operand to the left operand. That means
if you have a scalar $a with the string value "ABBA" and you write a
statement like $a .= " ROCKS", you'll get "ABBA ROCKS" in $a ;)

"s/(.{2})\r\n/$1/sg"

(.{2}) matches any two characters (by default "." does not match a newline; see /s below)
$1 represents the text matched by the first pair of parentheses
so (.{2})\r\n matches a sequence like "X X 0x0D 0x0A", where X is any value (including newline, because of /s).
/g means global matching, as you already know
/s treats the string as a single line (it allows "." to match newline)

All this means that we match every "X X 0x0D 0x0A" sequence and put only the "X X" characters in its place, globally.
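
A small sketch of what /s changes, using the 5A 0A 0D 0A record from the test input. The variable names are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $record = "\x5A\x0A\x0D\x0A";   # data bytes 5A 0A, then CR LF

# Without /s, "." refuses to match 0x0A, so no position in this record matches.
my $without_s     = $record;
my $count_without = ($without_s =~ s/(.{2})\r\n/$1/g) || 0;

# With /s, "." matches any byte and the trailing CR LF is removed.
my $with_s     = $record;
my $count_with = ($with_s =~ s/(.{2})\r\n/$1/sg) || 0;

printf "without /s: %d substitution(s), %d bytes left\n",
    $count_without, length($without_s);
printf "with /s:    %d substitution(s), %d bytes left\n",
    $count_with, length($with_s);
```

The first line should report 0 substitutions and 4 bytes left, the second 1 substitution and 2 bytes left.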

To see how the while loop reads lines, try:

while (<INPUT>) {
print OUTPUT $_;
print OUTPUT "0000000000000000000";
}

Then look into the output file: each line that the loop read is followed by the "0000000000000000000" string.
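
The same experiment can be sketched in memory, printing each chunk that <FH> returns as hex so the split points are easy to see. The helper name lines_as_hex is made up for this example, and it assumes the default input record separator ("\n").

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return each chunk that <FH> yields, rendered as hex.
sub lines_as_hex {
    my ($bytes) = @_;
    open(my $fh, '<', \$bytes) or die "Cannot open in-memory handle: $!";
    binmode $fh;
    my @chunks;
    while (my $line = <$fh>) {
        push @chunks, join ' ', map { sprintf('%02X', ord($_)) } split //, $line;
    }
    close($fh);
    return @chunks;
}

print "$_\n" for lines_as_hex("\x5F\x32\x0D\x0A\x58\x0A\x0D\x0A");
# prints:
# 5F 32 0D 0A
# 58 0A
# 0D 0A
```

The record 58 0A 0D 0A comes back as two separate "lines", which is why a per-line regex never sees it as one unit.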

Anyway, I have no idea why this code doesn't work for you. Can you
post the output you get from my input file and from yours? Then maybe
we'll find out what's going on :)

regards


(This post was edited by agent on Aug 4, 2008, 8:41 PM)


agent
Novice

Aug 4, 2008, 10:06 PM

Post #5 of 5 (628 views)
Re: [agent] Help with binary file [\r\n] removal

Hi,
thinking about this problem once again, I've made a simple test:


Code
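#!/usr/bin/perl -w
use strict;

# Build a test input (2 random data bytes + CRLF per record) and,
# alongside it, the output we expect once every CRLF is stripped.
open(INPUT, ">./input.bin") or die "Cannot write input.bin: $!";
open(OUTPUT, ">./output.bin") or die "Cannot write output.bin: $!";
binmode INPUT;    # no newline translation on either handle
binmode OUTPUT;
for my $i (1 .. 255) {
    # rand(256) covers 0..255, so 0x0A, 0x0D and 0xFF can all occur as data
    my $first  = int(rand(256));
    my $second = int(rand(256));
    printf INPUT "%c%c", $first, $second;
    printf OUTPUT "%c%c", $first, $second;
    print INPUT "\r\n";
}
close(INPUT) or die "Cannot close input.bin: $!";
close(OUTPUT) or die "Cannot close output.bin: $!";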


The script above generates an input file and the correct output file.
The script below is the previous parsing script, with binmode added:


Code
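#!/usr/bin/perl -w

open(INPUT, "./input.bin") or die "Cannot open input.bin: $!";
open(OUTPUT, ">./output2.bin") or die "Cannot open output2.bin: $!";
binmode INPUT;    # read raw bytes, no newline translation
binmode OUTPUT;   # write raw bytes, no newline translation
my $output = '';
$output .= $_ while(<INPUT>);
# keep any two bytes ($1) and drop the \r\n that follows them
$output =~ s/(.{2})\r\n/$1/sg;
print OUTPUT $output;
close(OUTPUT) or die "Cannot close output2.bin: $!";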



Code
 ~$ diff output.bin output2.bin


The files are identical on my system. Try adding binmode statements;
maybe that will help. Good night now :)
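
Why binmode might matter: on platforms where Perl puts a CRLF-translating layer on file handles by default (Windows builds do this), every 0D 0A pair is turned into a single 0A while reading, so a regex looking for \r\n finds nothing to remove. The thread doesn't confirm that this is what happened on the other system, so treat the following as a sketch of the effect only. It requests the :crlf layer explicitly so the difference shows up on any platform; the helper name slurp_with_layer and the scratch file are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file through the given PerlIO layer.
sub slurp_with_layer {
    my ($path, $layer) = @_;
    open(my $fh, "<$layer", $path) or die "Cannot open $path: $!";
    local $/;                       # slurp mode: read everything at once
    my $bytes = <$fh>;
    close($fh);
    return $bytes;
}

# Write 58 0A 0D 0A to a scratch file with no translation.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "\x58\x0A\x0D\x0A";
close($tmp) or die "Cannot close $path: $!";

my $raw  = slurp_with_layer($path, ':raw');    # bytes exactly as stored
my $crlf = slurp_with_layer($path, ':crlf');   # 0D 0A collapsed to 0A

printf "raw:  %d bytes\n", length($raw);
printf "crlf: %d bytes\n", length($crlf);
```

The raw read should return all 4 bytes, while the translated read returns 3 (58 0A 0A) with no 0D left for the substitution to match. Calling binmode on a handle right after open is the usual way to get the raw behaviour.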
