
 



mgpg
Novice

Feb 15, 2008, 3:26 PM

Post #1 of 7 (4728 views)
translate csv format

I hope someone can help me convert the following CSV to the desired output.

Input
  • Each row has the following columns: Component, Summary, Data, Priority.
  • Each row may span multiple lines; a row is terminated by ^M.
  • All fields are strings. Priority is P1, P2, etc.
  • Data is kind of weird. It is in quotes and has enumerated steps (please see the sample below). It may contain commas, but not quotes.
  • Columns are delimited by commas (see the exception above for Data).
  • I cannot change the CSV; it is generated by some tool.


Sample Input

Component1,Summary2 ,"1.Step1
2. Step2 details
3. Step3 details

",P2^M

Component2,Summary1,"1. User1 does this
2. User2 does that
3. User1 does something else

",P2^M

Component2,Summary2 ,"1.Step1
2. Step2 details
3. Step3 details

",P1^M
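A minimal sketch of one way to read this layout with core Perl only. It assumes each record ends in CR LF (the ^M shown above followed by a newline), that there are exactly four columns, and that only the Data column is quoted. The names parse_records and $sample are made up for illustration, and the comma inside the second record's Data is added here to exercise that case.

```perl
use strict;
use warnings;

# Hypothetical helper: split raw text into records, then into four fields.
# Assumes only Data (column 3) is quoted and never contains a quote itself.
sub parse_records {
    my ($text) = @_;
    my @rows;
    for my $record (split /\r\n/, $text) {
        $record =~ s/\A\s+//;              # drop blank lines left between records
        next unless length $record;
        # [^"]* safely spans the commas and newlines inside Data.
        if ($record =~ /\A([^,]*),([^,]*),"([^"]*)",([^,]*)\z/) {
            my %row = (component => $1, summary => $2, data => $3, priority => $4);
            s/\A\s+|\s+\z//g for values %row;   # trim surrounding whitespace
            push @rows, \%row;
        }
        else {
            warn "Skipping unparsable record: $record\n";
        }
    }
    return @rows;
}

# Illustrative input modelled on the sample above.
my $sample = qq{Component1,Summary2 ,"1.Step1\n2. Step2 details\n3. Step3 details\n\n",P2\r\n}
           . qq{\nComponent2,Summary1,"1. User1 does this, then waits\n2. User2 does that\n\n",P1\r\n};

my @rows = parse_records($sample);
printf "%s / %s / %s\n", @{$_}{qw(component priority summary)} for @rows;
```

A plain split on commas would cut the second record's Data in two, which is why the quoted column is matched as a unit. For input that is less regular than this, a dedicated CSV module is probably the safer choice.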

Desired Output

foreach component {

print priority

foreach step in Data { print <step-number>## step }

print summary\n

}

as well as

foreach priority {

print component

foreach step in Data { print <step-number>## step }

print summary\n

}
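The two layouts above could be produced along these lines. This is a sketch, not a finished solution: it assumes the rows are already parsed into hashes with the keys component, summary, data and priority (names chosen here for illustration), and it reads "<step-number>## step" literally, so "1. Step1" becomes "1## Step1".

```perl
use strict;
use warnings;

# Illustrative records; the key names are assumptions, not from the post.
my @rows = (
    { component => 'Component1', summary => 'Summary2', priority => 'P2',
      data => "1.Step1\n2. Step2 details\n3. Step3 details" },
    { component => 'Component2', summary => 'Summary1', priority => 'P2',
      data => "1. User1 does this\n2. User2 does that" },
    { component => 'Component2', summary => 'Summary2', priority => 'P1',
      data => "1.Step1\n2. Step2 details" },
);

# Turn "1. Step text" lines into "1## Step text" lines.
sub format_steps {
    my ($data) = @_;
    my @steps;
    for my $line (split /\n/, $data) {
        push @steps, "$1## $2" if $line =~ /^\s*(\d+)\.\s*(.+?)\s*$/;
    }
    return @steps;
}

# Group rows by one field, then print the other field, the steps and the summary.
sub report {
    my ($group_by, $show, @records) = @_;
    my %groups;
    push @{ $groups{ $_->{$group_by} } }, $_ for @records;

    my $out = '';
    for my $key (sort keys %groups) {
        $out .= "$key\n";
        for my $row (@{ $groups{$key} }) {
            $out .= "  $row->{$show}\n";
            $out .= "    $_\n" for format_steps($row->{data});
            $out .= "  $row->{summary}\n\n";
        }
    }
    return $out;
}

print report('component', 'priority', @rows);   # first desired layout
print report('priority', 'component', @rows);   # second desired layout
```

Using one report routine for both layouts keeps the grouping logic in a single place; only the field used as the outer key changes.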



I hope someone can help me here. Thanks a bunch :)

(This post was edited by mgpg on Feb 15, 2008, 3:27 PM)


KevinR
Veteran


Feb 15, 2008, 3:47 PM

Post #2 of 7 (4724 views)
Re: [mgpg] translate csv format

Help, if I can. Do it for you, no. What have you tried so far to solve this problem?
-------------------------------------------------


mgpg
Novice

Feb 15, 2008, 9:01 PM

Post #3 of 7 (4722 views)
Re: [KevinR] translate csv format

This is what I have so far (sorry don't know how to preserve indentation):

# open conf file
open (CONF_FILE, "$confFile") ||
die "Failed to open confFile=$confFile\n";

# read file contents
undef $/;
@chunks = split(/\cM\cJ/, <CONF_FILE>);

foreach $rec (@chunks) {
chomp $rec;
@kk = split (/\cJ/, $rec);
$reci = join '', @kk; #remove embedded newlines within record
($com, $subcat, $cat, $summary, $steps, $expRes, $pri) = split (/,/, $reci);
my $rec = {};
$rec{com} = $com;
$rec{subcat} = $subcat;
$rec{cat} = $cat;
$rec{summary} = $summary;
$rec{steps} = $steps;
$rec{expRes} = $expRes;
$rec{pri} = $pri;

push @{ $components{$com}{$pri} }, $rec; ## NOTE1

#print "$rec\n";
#print "$com ;; $subcat;; $cat;; $summary;;\n$steps, $expRes, $pri\n";
#last;
}


foreach $com ( keys %components ) {
foreach $pri (sort keys %{ $components{$test} } ) {
@listofrecords = @{ $components{$test}{$pri} } ;
for $rec (@listofrecords) {

# do something.... $rec->{com} and other fields are always the same...the last one in input file!
}
}
}



The problem here is that the $rec I'm getting is always the same (the last one that was stored). See NOTE1 above. It seems like the same piece of memory is being overwritten for each record (I thought my $rec = {} should allocate new memory). $components{}{} is a list of records for a component/pri.

How do I tell the editor not to remove indentation?


KevinR
Veteran


Feb 15, 2008, 9:33 PM

Post #4 of 7 (4720 views)
Re: [mgpg] translate csv format

To preserve formatting, wrap your code in code tags. In the example below, replace '{' with '[' and '}' with ']':

{code}
your code here
{/code}

The biggest problem with your code is the use of $rec to define two separate pieces of data in the same scope:

foreach $rec (@chunks) {
chomp $rec;
@kk = split (/\cJ/, $rec);
$reci = join '', @kk; #remove embedded newlines within record
($com, $subcat, $cat, $summary, $steps, $expRes, $pri) = split (/,/, $reci);
my $rec = {};

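The second use of $rec inside the same loop hides whatever value $rec had at the beginning of the "foreach" loop. On top of that, $rec{com} = $com stores into the hash %rec, not into the hash reference held in $rec, so the reference you push is always empty. Replace this section of code: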



Code
my $rec = {}; 
$rec{com} = $com;
$rec{subcat} = $subcat;
$rec{cat} = $cat;
$rec{summary} = $summary;
$rec{steps} = $steps;
$rec{expRes} = $expRes;
$rec{pri} = $pri;

push @{ $components{$com}{$pri} }, $rec; ## NOTE1


replace with this:


Code
my %rec = (); 
$rec{com} = $com;
$rec{subcat} = $subcat;
$rec{cat} = $cat;
$rec{summary} = $summary;
$rec{steps} = $steps;
$rec{expRes} = $expRes;
$rec{pri} = $pri;

push @{ $components{$com}{$pri} }, \%rec; ## NOTE1


Hopefully that will clear up the problem.
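To make the distinction concrete, here is a small sketch (the variable names mirror the code above, the values are made up). $rec{com} and $rec->{com} look alike but touch two unrelated variables: the first is an element of the hash %rec, the second goes through the reference stored in the scalar $rec.

```perl
use strict;
use warnings;

my $rec = {};          # scalar holding a reference to an anonymous hash
my %rec;               # a completely separate variable: the hash %rec

$rec{com}   = 'stored in the hash %rec';
$rec->{com} = 'stored in the hash that $rec points to';

print "$rec{com}\n";
print "$rec->{com}\n";

# Each pass through the loop body gets a fresh 'my %entry',
# so every pushed reference points to its own hash.
my @list;
for my $name (qw(a b c)) {
    my %entry = (com => $name);
    push @list, \%entry;
}
print join(',', map { $_->{com} } @list), "\n";   # a,b,c
```

Declaring the hash with my inside the loop is what gives each record its own storage; a hash declared once outside the loop would be shared by every pushed reference.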

In the future, make sure to use "strict" and "warnings" in all your Perl programs:

use strict;
use warnings;

They will really help you write better Perl code and would have caught this error right away.
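As a rough illustration of that claim, the snippet below compiles the problem lines inside a string eval so the compile-time failure can be inspected instead of aborting the script. The variable names $snippet, $ok and $error are illustrative only.

```perl
use strict;
use warnings;

# The problem pattern: a scalar $rec is declared, but the hash %rec is used.
my $snippet = 'use strict; my $rec = {}; $rec{com} = "x"; 1;';

# A string eval returns undef and sets $@ when the code fails to compile.
my $ok    = eval $snippet;
my $error = $@;

print $ok ? "compiled\n" : "strict rejected it: $error";
```

Under strict the undeclared hash %rec is a compile-time error, so the mistake shows up before any data is read.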
-------------------------------------------------


mgpg
Novice

Feb 16, 2008, 9:02 AM

Post #5 of 7 (4718 views)
Re: [KevinR] translate csv format


Code
  

#!/usr/bin/perl
use warnings;
use strict;
use Getopt::Long;

my ($confFile, $debug, $usage);
GetOptions( "conf=s" => \$confFile,
            "debug"  => \$debug );
$usage = "--conf=<input-conf-file> --debug";
die "Usage: $0 $usage\n" unless defined $confFile;

my %components;

# open conf file
open (CONF_FILE, "$confFile") ||
    die "Failed to open confFile=$confFile\n";

# read file contents
sub readFile {
    my @records;    # list of all records in file
    my $record;     # single record in @records

    undef $/;
    @records = split(/\cM\cJ/, <CONF_FILE>);

    foreach $record (@records) {
        my ($com, $subcat, $cat, $summary, $steps, $expRes, $pri, @temp);
        my %entry;

        chomp $record;
        @temp = split (/\cJ/, $record);    # get rid of embedded newlines
        $record = join '', @temp;          # set up record again after removal of newlines

        print "Read: $record\n" if $debug;

        # parse record into individual fields and put in a hash
        ($com, $subcat, $cat, $summary, $steps, $expRes, $pri) = split (/,/, $record);
        $entry{com} = $com;
        $entry{subcat} = $subcat;
        $entry{cat} = $cat;
        $entry{summary} = $summary;
        $entry{steps} = $steps;
        $entry{expRes} = $expRes;
        $entry{pri} = $pri;

        print "\tParsed fields: $com ;; $subcat;; $cat;; ",
              "$summary;; $steps;; $expRes;; $pri\n" if $debug;
        print "\tParsed fields: $entry{com} ;; $entry{subcat};; $cat;; ",
              "$summary;; $steps;; $expRes;; $pri\n" if $debug;

        # push this hash 'entry' into the list in hash-of-hash
        # this hash 'components' is organized as components{Component1}{Priority}
        # and this entry contains a list of hashes of the type 'entry' (above)
        push @{ $components{$com}{$pri} }, \%entry; ########## QUESTION BELOW
    }
}


sub printOut {
    my ($component, $pri);
    my @records;
    my $rec = {};
    foreach $component ( keys %components ) {
        foreach $pri (sort keys %{ $components{$component} } ) {
            print "Printing $component/$pri\n" if $debug;
            @records = @{ $components{$component}{$pri} } ;
            my $count = 1;
            foreach $rec (@records) {
                my %e = %{ $rec };
                print "$count: $e{com}, $e{pri}, $e{summary}\n";
                $count++;    # number the records within each component/priority
            }
        }
    }
}

sub main {
    &readFile();
    &printOut();
}


&main();



Thanks. I cleaned it up and the result is much better than my previous hasty and kludgy attempt. Now the basic thing works. However, one question I have is: why do I need to pass %entry by reference in the last line of readFile() (see QUESTION BELOW in the code)? If I don't pass it by reference, I think the hash somehow gets converted to a string (key1 value1 key2 value2 ... etc.). Is the way I'm doing it the correct way of adding a hash to a list (and also of extracting it)?



Thanks for the help :)


KevinR
Veteran


Feb 16, 2008, 11:58 AM

Post #6 of 7 (4716 views)
Re: [mgpg] translate csv format

It's a bit hard to answer your question without a better understanding of your data and how it's structured. The answer is if it works like you have it then you are not doing anything technically wrong. But there might be better ways to accomplish whatever it is your script is doing. Have you tried just passing the hash in (not using a reference) and getting back out as a hash?


Code
foo(%hash); 

sub foo {
my %hash = @_;
}


Perl does not send a hash to the function; it sends a flat list of scalars. This is called "flattening". You can reconstruct the hash (or array) by packing the list back into a hash on the receiving end, as I did above. The same thing happens when you push a hash onto an array, which is why your push needs \%entry. When you have complex data like yours, it is almost always easier to use a reference to the data, which is a single scalar (in effect the memory address of the data) that you dereference later to get the data back out.
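A short sketch of flattening versus storing a reference, using a made-up two-key %entry and a hypothetical count_args helper. It shows the push case from the question as well as the subroutine case.

```perl
use strict;
use warnings;

my %entry = (com => 'Component1', pri => 'P2');

# Pushing the hash itself flattens it into a key/value list:
# the array gains four plain scalars and the structure is lost.
my @flat;
push @flat, %entry;
print scalar(@flat), " elements\n";        # 4 elements

# Pushing a reference stores one scalar that still points to the hash.
my @records;
push @records, \%entry;
print scalar(@records), " element\n";      # 1 element

# Extracting: use the arrow on the reference, or copy into a new hash.
my $ref  = $records[0];
my %copy = %{ $records[0] };
print "$ref->{com} $copy{pri}\n";          # Component1 P2

# The same flattening applies to subroutine arguments.
sub count_args { return scalar @_ }
print count_args(%entry), " vs ", count_args(\%entry), "\n";   # 4 vs 1
```

Copying with %{ ... } gives an independent hash, while $ref->{...} reads the original in place; for read-only printing either works, and the arrow form avoids the copy.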

It's hard to look at code like you posted and know whether it is doing something the right way or not (at least for me it is), because it may be part of a larger program. At a glance your code is overly complicated and verbose, but you may just be in the process of learning how to do things, like using references and subroutines, and the code is not meant to be as well written as it could be. Not that there is anything terribly wrong that I can see; overly verbose code is better than too terse, in my opinion.
-------------------------------------------------


(This post was edited by KevinR on Feb 16, 2008, 12:00 PM)


mgpg
Novice

Feb 19, 2008, 3:28 PM

Post #7 of 7 (4693 views)
Re: [KevinR] translate csv format

Thanks for all the help, Kevin. This is not part of a larger program. That's all there is. I'm learning to write Perl programs. I guess as I write more programs (intelligently) I'll learn. It is now working though :) thanks.

 
 

