Simple Program with multiple issues

 



zerglicious
New User

Feb 3, 2014, 7:14 PM

Post #1 of 13 (931 views)

Hello all,

I'm a beginner to Perl and I've been trying to figure out what is going on with my program. It's simple: I'm just trying to reformat a timestamp in the first column of a CSV file. I thought I'd try a quick and dirty Perl script to get this knocked out, but I'm completely baffled. I want to iterate through each line of the file and update the text in the first column.

The program seems to read the file correctly (I think), although, at least on a Mac, Perl tends to slurp the whole file in as a single "line" rather than splitting at the correct line breaks. My real question is about my index variable: it never updates, which means my code always goes through the second branch. WTF?


Code
#! /usr/bin/perl -w

use strict;
use warnings;

use autodie; # die if problem reading or writing a file

my $input_file = "/path/to/my/file.csv";

my $output_file = "/path/to/my/output_file.csv";

local $/ = "\r"; # CR, use "\r\n" for CRLF or "\n" for LF Running from Mac

open(my $input_file_handle, "<", $input_file);
open(my $output_file_handle, '>', $output_file);

my $index = 0;
# Read in line at a time

while (my $line = <$input_file_handle>) {

    if($index > 0){
        my @line_array = split(/,/, $line);
        my $raw_time_stamp = $line_array[0];
        my $year = substr $raw_time_stamp, 0, 4;
        my $month = substr $raw_time_stamp, 4, 2;
        my $day = substr $raw_time_stamp, 6, 2;
        my $hour = substr $raw_time_stamp, 9, 2;
        my $minute = substr $raw_time_stamp, 11, 2;
        my $second = substr $raw_time_stamp, 13, 2;
        my $millisecond = substr $raw_time_stamp, 15, 3;
        my $formatted_time_stamp = join($year, "-", $month, "-",$day, " ", $hour, ":", $minute, ":", $second, ".", $millisecond);
        # print $formatted_time_stamp . "\n";
        # if it isn't the first line write the updated timestamp
        print $output_file_handle $formatted_time_stamp . "," . $line_array[1], "," . $line_array[2] . "\n";
    } else {
        # if it is the first line write out the headers
        print $output_file_handle $line . "\n";
    }

    $index++;

}

close($input_file_handle);
close($output_file_handle);



FishMonger
Veteran / Moderator

Feb 3, 2014, 9:44 PM

Post #2 of 13 (920 views)
Re: [zerglicious] Simple Program with multiple issues

There's no need to use $index to track the line number of the file. The built-in $. variable already handles that for you. But there's no need to use $. either, because the better method is simply to read in the first line before the while loop.

Your first step is to verify the line terminator. I'd use the od command (or any one of a number of hex editors) to do that verification.

Perl's default line terminator is based on the OS you're running. If that differs from what's used in the file, then adjust $/ as needed.
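To see why a mismatched $/ makes the loop body run only once, here is a minimal sketch. The sample data and the count_records helper are made up for illustration, and it reads from an in-memory filehandle rather than the OP's real file:

```perl
use strict;
use warnings;

# Hypothetical helper: count how many records <$fh> returns for a given separator.
sub count_records {
    my ($data, $separator) = @_;
    local $/ = $separator;    # restored automatically when the sub returns
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    my $count = 0;
    $count++ while <$fh>;
    close $fh;
    return $count;
}

# Made-up sample data that uses LF line endings.
my $data = "time,a,b\n20140203 191400123,1,2\n20140203 191401456,3,4\n";

print count_records($data, "\n"), "\n";    # separator matches the data: prints 3
print count_records($data, "\r"), "\n";    # separator never occurs: prints 1
```

With the wrong separator the whole file comes back as one record, so a loop counter never gets past its first value while the loop body is running.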

Your join statement isn't doing what you think and is the wrong tool to use in this case. A simple double-quoted string is all that is needed in that print statement instead of the concatenation.
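A short sketch of the join point, using made-up date values: join treats its first argument as the separator, so the year ends up between the other items instead of at the front.

```perl
use strict;
use warnings;

# Made-up values for illustration.
my ($year, $month, $day) = ('2014', '02', '03');

# join's first argument is the separator placed between the remaining items,
# so $year is used as the glue here rather than appearing at the start.
my $joined = join($year, "-", $month, "-", $day);

# Interpolating the variables into a double-quoted string gives the intended result.
my $interpolated = "$year-$month-$day";

print "$joined\n";          # prints -2014022014-201403
print "$interpolated\n";    # prints 2014-02-03
```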


(This post was edited by FishMonger on Feb 3, 2014, 9:45 PM)


BillKSmith
Veteran

Feb 4, 2014, 6:28 AM

Post #3 of 13 (907 views)
Re: [zerglicious] Simple Program with multiple issues

Perl's diamond operator (<>) reads everything up to and including the next INPUT_RECORD_SEPARATOR ($/). It then converts the INPUT_RECORD_SEPARATOR to a newline (\n) character. The string that is returned should be exactly the same on any operating system. The default value of INPUT_RECORD_SEPARATOR is almost always correct unless we are reading a file that was created on a different OS.


On output, print converts a newline character into an OUTPUT_RECORD_SEPARATOR ($\). Again, the default is almost always right.
Good Luck,
Bill


zerglicious
New User

Feb 4, 2014, 7:23 AM

Post #4 of 13 (900 views)
Re: [BillKSmith] Simple Program with multiple issues

Thanks all for the suggestions...once I got the line terminator thing sorted out it came together quickly. It turned out the default separator was fine and I didn't need to adjust it. Here is the completed program for other newbies like me.


Code
 
#! /usr/bin/perl -w

use strict;
use warnings;

use autodie; # die if problem reading or writing a file

my $input_file = "/path/to/my/input/file.csv";

my $output_file = "/path/to/my/output/file.csv";

open(my $input_file_handle, "<", $input_file);

open(my $output_file_handle, '>', $output_file);

my $index = 0;

# Read in one line at a time

while (my $line = <$input_file_handle>) {

    if ($index == 0) {
        # if it is the first line write out the headers unchanged
        print $output_file_handle $line;
    } else {
        my @line_array = split(/,/, $line);
        my $raw_time_stamp = $line_array[0];
        my $year = substr($raw_time_stamp, 0, 4);
        my $month = substr($raw_time_stamp, 4, 2);
        my $day = substr($raw_time_stamp, 6, 2);
        my $hour = substr($raw_time_stamp, 9, 2);
        my $minute = substr($raw_time_stamp, 11, 2);
        my $second = substr($raw_time_stamp, 13, 2);
        my $millisecond = substr($raw_time_stamp, 15, 3);
        my $formatted_time_stamp = $year . "-" . $month . "-" . $day . " " . $hour . ":" . $minute . ":" . $second . "." . $millisecond;
        # if it isn't the first line write the updated timestamp
        print $output_file_handle $formatted_time_stamp . "," . $line_array[1] . "," . $line_array[2];
    }

    $index++;

}

close($input_file_handle);
close($output_file_handle);



FishMonger
Veteran / Moderator

Feb 4, 2014, 7:31 AM

Post #5 of 13 (899 views)
Re: [BillKSmith] Simple Program with multiple issues


Quote
It (the <> diamond operator) then converts the INPUT_RECORD_SEPARATOR to a newline (\n) character.


I don't agree.

The Enter/Return key adds the OS line terminator to the input string. If the string is being piped in, then the OS line terminator is not added/appended. It (the <> diamond operator) does not convert the INPUT_RECORD_SEPARATOR.


Code
#!/usr/bin/perl 

use strict;
use warnings;

$/ = "END";

open my $fh, '>', 'test.txt' or die $!;
my $string = <>;

print $fh $string;
close $fh;

system "od -c test.txt";




Code
c:\test>eol.pl 
testing
END
0000000 t e s t i n g \r \n E N D
0000014



FishMonger
Veteran / Moderator

Feb 4, 2014, 7:40 AM

Post #6 of 13 (898 views)
Re: [zerglicious] Simple Program with multiple issues


Code
my $header = <$input_file_handle>;
print $output_file_handle $header;

while (my $line = <$input_file_handle>) {

    my @line_array = split(/,/, $line);
    my $raw_time_stamp = $line_array[0];
    my $year = substr($raw_time_stamp, 0, 4);
    my $month = substr($raw_time_stamp, 4, 2);
    my $day = substr($raw_time_stamp, 6, 2);
    my $hour = substr($raw_time_stamp, 9, 2);
    my $minute = substr($raw_time_stamp, 11, 2);
    my $second = substr($raw_time_stamp, 13, 2);
    my $millisecond = substr($raw_time_stamp, 15, 3);
    my $formatted_time_stamp = $year . "-" . $month . "-" . $day . " " . $hour . ":" . $minute . ":" . $second . "." . $millisecond;

    print $output_file_handle join ',', $formatted_time_stamp,
                                        $line_array[1],
                                        $line_array[2];

}


Additional improvements could be made, such as using the Text::CSV module. I'd also clean up that date parsing. A single regex could extract/parse out each of the timestamp components.


BillKSmith
Veteran

Feb 4, 2014, 9:57 AM

Post #7 of 13 (890 views)
Re: [FishMonger] Simple Program with multiple issues

You appear to be right. I do not know where I picked up this error, but it has been a useful model.
Good Luck,
Bill


BillKSmith
Veteran

Feb 4, 2014, 10:07 AM

Post #8 of 13 (887 views)
Re: [FishMonger] Simple Program with multiple issues

All the edits are in the first field. I prefer a regular expression.


Code
#! /usr/bin/perl
use strict;
use warnings;
use 5.10.0;
use autodie; # die if problem reading or writing a file
my $input_file = "/path/to/my/file.csv";
my $output_file = "/path/to/my/output_file.csv";
open( my $input_file_handle, "<", $input_file );
open( my $output_file_handle, '>', $output_file );

# copy the header line through unchanged
print {$output_file_handle} scalar <$input_file_handle>;
while ( my $line = <$input_file_handle> ) {
    $line =~ s/^(?<year>\d{4})
                (?<month>\d{2})
                (?<day>\d{2})
                (?<hour>\d{2})
                (?<min>\d{2})
                (?<sec>\d{2})
                (?<msec>\d{3})
              /$+{year}-$+{month}-$+{day} $+{hour}:$+{min}:$+{sec}.$+{msec}/x;
    print {$output_file_handle} $line;
}
close($input_file_handle);
close($output_file_handle);

Good Luck,
Bill


FishMonger
Veteran / Moderator

Feb 4, 2014, 11:18 AM

Post #9 of 13 (881 views)
Re: [BillKSmith] Simple Program with multiple issues

Yes, I agree on using a regex and said so in my post.

Based on the OP's substr statements it appears that there is a space between day and hour, so your regex will probably need a slight tweak.


Code
$line =~ s/^(?<year>\d{4})
            (?<month>\d{2})
            (?<day>\d{2})
            \s*
            (?<hour>\d{2})
            (?<min>\d{2})
            (?<sec>\d{2})
            (?<msec>\d{3})
          /$+{year}-$+{month}-$+{day} $+{hour}:$+{min}:$+{sec}.$+{msec}/x;
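For anyone who wants to try the tweaked regex without a real file, here is a small self-contained sketch. The reformat_line wrapper and the sample lines are made up, and the timestamp layout (YYYYMMDD, optional space, HHMMSSmmm) is an assumption based on the OP's substr offsets:

```perl
use strict;
use warnings;
use 5.010;    # named captures need Perl 5.10 or later

# Hypothetical wrapper around the substitution so it can be tried on sample strings.
sub reformat_line {
    my ($line) = @_;
    $line =~ s/^(?<year>\d{4})(?<month>\d{2})(?<day>\d{2})
                \s*
                (?<hour>\d{2})(?<min>\d{2})(?<sec>\d{2})(?<msec>\d{3})
              /$+{year}-$+{month}-$+{day} $+{hour}:$+{min}:$+{sec}.$+{msec}/x;
    return $line;
}

# Because of \s*, the pattern accepts the timestamp with or without the space.
print reformat_line("20140203 191405123,foo,bar\n");    # prints 2014-02-03 19:14:05.123,foo,bar
print reformat_line("20140203191405123,foo,bar\n");     # prints 2014-02-03 19:14:05.123,foo,bar
```

A line that doesn't start with a matching timestamp is returned unchanged, since the substitution simply fails to match.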



Kenosis
User

Feb 4, 2014, 12:06 PM

Post #10 of 13 (877 views)
Re: [FishMonger] Simple Program with multiple issues

Given that the 'raw time stamp' is a fixed length, unpack (accommodating for your sharp-eyed space observation) is a good option for getting the values:

Code
my ($year, $month, $day, undef, $hour, $min, $sec, $msec) = unpack 'A4A2A2A1A2A2A2A3', $line;
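Putting the unpack idea together with the output step might look something like the sketch below. The format_timestamp helper and the sample line are hypothetical, and it assumes a full-length timestamp with exactly one separator character between the date and the time (as the OP's substr offsets suggest); it uses 'x' to skip that character instead of capturing it:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "YYYYMMDD HHMMSSmmm" into "YYYY-MM-DD HH:MM:SS.mmm".
sub format_timestamp {
    my ($raw) = @_;
    # 'x' skips the single separator byte; whitespace in the template is ignored.
    my ($year, $month, $day, $hour, $min, $sec, $msec)
        = unpack 'A4 A2 A2 x A2 A2 A2 A3', $raw;
    return sprintf '%s-%s-%s %s:%s:%s.%s',
        $year, $month, $day, $hour, $min, $sec, $msec;
}

# Made-up sample line; split off only the first field and keep the rest intact.
my $line = "20140203 191405123,foo,bar\n";
my ($stamp, $rest) = split /,/, $line, 2;

print format_timestamp($stamp), ",", $rest;    # prints 2014-02-03 19:14:05.123,foo,bar
```

Splitting with a limit of 2 leaves the remaining columns (and the trailing newline) untouched, so nothing has to be re-joined afterwards.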



(This post was edited by Kenosis on Feb 4, 2014, 12:21 PM)


FishMonger
Veteran / Moderator

Feb 4, 2014, 12:15 PM

Post #11 of 13 (871 views)
Re: [Kenosis] Simple Program with multiple issues

Yes, I like that unpack version better than the regex, but shouldn't that 4th A2 be A1?


Chris Charley
User

Feb 4, 2014, 12:20 PM

Post #12 of 13 (868 views)
Re: [Kenosis] Simple Program with multiple issues

Or use the 'x' format specifier to skip the separator: 'x2' if there are two spaces, or just 'x' if there is only one.


Code
my ($year, $month, $day, $hour, $min, $sec, $msec) = unpack 'A4A2A2x2A2A2A2A3', $line;



Kenosis
User

Feb 4, 2014, 12:21 PM

Post #13 of 13 (865 views)
Re: [FishMonger] Simple Program with multiple issues

Yes, you're quite right. Fixed.

 
 

