Ignoring headers from a CSV file



gelid101
Novice

Aug 7, 2008, 5:25 AM

Post #1 of 14
Ignoring headers from a CSV file

Hi, I am completely new to Perl, and new to programming. My Perl experience is 2 days only. This is what I am trying to do: Open a CSV file and parse it. However, I need to ignore the first line which contains the headers. Could someone please help me out? This is my code:


Code
# Make Unix style path 
$dir=~s|\\|/|gi;

# Remove trailing slashes
$sep=$/; $/="/"; chomp($dir); $/=$sep;

# Now try to get the list of files
open(FILELIST,"<wp.csv")
    or die "Could not find file list at $dir/wp.csv.\n$docstring";

while(<FILELIST>) {
    $line=$_;
    chomp($line);
    $line=~tr/"//d; #removes the double quotes
    ($iname,$idescr,$idate,$ipermission,$iothers,$lat,$long,$region,$rtype,$geo_heading,$cat1) = split ',' , $line; #separates the csv values into fields

    #stores wikisyntax in a string
    $metatext= "==Summary==\n{{Information\n|Description=$idescr\n|Source=Own work by uploader\n|Author=[[User:$username|$username]]\n|date=$idate\n|Permission={{self|$ipermission}}\n|other versions=$iothers\n}}\n\n{{location dec|$lat|N|$long|E|region:${region}_type:${rtype}_heading:$geo_heading}}\n\n==[[Commons:Copyright tags|Licensing]]==\n{{self|$ipermission}}\n\n\n[[Category:$cat1]]\n\n\n";

    print $metatext;
}



KevinR
Veteran


Aug 7, 2008, 10:21 AM

Post #2 of 14
Re: [gelid101] Ignoring headers from a CSV file


Code
open(FILELIST,"<wp.csv")   
or die "Could not find file list at $dir/wp.csv.\n$docstring";

<FILELIST>; #<-- skips first line
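If you want to keep the header rather than throw it away, read it into a variable before the loop. A minimal sketch, assuming a lexical filehandle; an in-memory string stands in for wp.csv so the example runs on its own, and the sample rows are made up:

```perl
use strict;
use warnings;

# Made-up sample data standing in for wp.csv.
my $csv_text = "name,description\nFoo.jpg,first\nBar.jpg,second\n";

# Opening a reference to a scalar gives an in-memory filehandle.
open my $fh, '<', \$csv_text or die "cannot open in-memory file: $!";

# Read the header line once, before the loop, and keep it.
my $header = <$fh>;
chomp $header;

# Every line read after this point is a data line.
my @rows;
while ( my $line = <$fh> ) {
    chomp $line;
    push @rows, $line;
}
close $fh;

print "header: $header\n";
print "data lines: ", scalar(@rows), "\n";
```

Inside the original while(<FILELIST>) loop, `next if $. == 1;` skips the header the same way, since `$.` holds the line number of the last line read from the filehandle.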

-------------------------------------------------


gelid101
Novice

Aug 9, 2008, 8:40 AM

Post #3 of 14
Re: [KevinR] Ignoring headers from a CSV file

Thanks Kevin! :) It worked!


shawnhcorey
Enthusiast


Aug 10, 2008, 6:25 AM

Post #4 of 14
Re: [gelid101] Ignoring headers from a CSV file




First, some comments about your code.


Code
# Make Unix style path  
$dir=~s|\\|/|gi;

# Remove trailing slashes
$sep=$/; $/="/"; chomp($dir); $/=$sep;


This is not necessary. Perl built for Windows treats slashes and backslashes the same way in directory paths, and multiple slashes or backslashes in a row are treated as one, so "$dir/wp.csv" works even if $dir ends with a slash.
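If you do want a tidy path, for example for error messages, the core File::Spec module can build it instead of hand-written regexes. A sketch; the directory name is a made-up example:

```perl
use strict;
use warnings;
use File::Spec;    # core module

# Made-up directory with a trailing separator, as a user might type it.
my $dir = 'data/csv/';

# catfile joins the pieces with the platform's separator and removes
# redundant separators, so no s/// or chomp tricks are needed.
my $path = File::Spec->catfile( $dir, 'wp.csv' );

print "$path\n";
```

On Unix this prints data/csv/wp.csv; on Windows the separators come out as backslashes.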


Code
# Now try to get the list of files  
open(FILELIST,"<wp.csv")
or die "Could not find file list at $dir/wp.csv.\n$docstring";


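You should use the three-argument version of open, with a lexical filehandle, and put $! in the error message so you can see why it failed. Also note that your open() reads wp.csv from the current directory, while your error message names $dir/wp.csv; the version below opens the file the message refers to: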


Code
open my $fh, '<', "$dir/wp.csv" 
or die "cannot open $dir/wp.csv: $!";


As for the rest, I'm afraid it will not work in general. Splitting on commas ignores context: a comma or a double quote in the file may be a delimiter or part of the data, depending on whether it sits inside a quoted field. To get the contents correctly, you need a parser.
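To see the problem concretely, here is a minimal sketch of the original approach (strip the double quotes, then split on commas) applied to a made-up record whose first field contains a comma:

```perl
use strict;
use warnings;

# Made-up record: the first field is quoted because it contains a comma.
my $line = qq{"Smith, John",Author,2008\n};
chomp $line;

# The approach from the original post: delete the quotes, then split on commas.
( my $stripped = $line ) =~ tr/"//d;
my @fields = split /,/, $stripped;

# Three fields were intended, but split returns four.
print scalar(@fields), " fields: ", join( ' | ', @fields ), "\n";
```

Once the quotes are gone, nothing distinguishes the comma inside the name from the field separators, which is why the quoting has to be interpreted while splitting.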

Here is a program with a CSV parser called read_csv(). It also includes a print_csv(). It has had only limited testing, so there may be bugs.


Code
#!/usr/bin/perl

use strict;
use warnings;
use utf8;

use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;
$Data::Dumper::Maxdepth = 0;

# --------------------------------------
#       Usage: print_csv( $fh, @fields );
#     Purpose: Print the fields to the file as comma-separated values.
#     Returns: none
#  Parameters: $fh     -- file handle
#              @fields -- list of fields
#
sub print_csv {
    my $fh        = shift @_;
    my @fields    = @_;
    my @sanitized = ();

    for ( @fields ){
        if( /^\s|[",\n]|\s$/ ){
            # quote the field and double any embedded double quotes
            ( my $quoted = $_ ) =~ s/"/""/g;
            push @sanitized, "\"$quoted\"";
        }else{
            push @sanitized, $_;
        }
    }

    print $fh join( ",", @sanitized ), "\n";
    return;
}

# --------------------------------------
#       Usage: @fields = read_csv( $fh );
#     Purpose: Read one tuple from a file.
#     Returns: @fields -- list of fields read
#  Parameters: $fh -- file handle
#
sub read_csv {
    my $fh     = shift @_;
    my @fields = ();
    my $field  = '';
    my $state  = 0;    # 0 == free context
                       # 1 == inside double quotes

    my $line = <$fh>;
    while( defined( $line ) && length( $line ) ){

        if( $state == 0 ){
            # a bare newline ends the tuple
            if( $line eq "\n" ){
                last;
            }
            if( $line =~ s/^"// ){
                # opening quote: start a quoted field
                $state = 1;
                $field = '';
            }else{
                # unquoted field: everything up to the next comma or newline
                $line =~ s/^([^,\n]*)//;
                push @fields, $1;
                # a comma at the end of the tuple means a trailing empty field
                if( $line =~ s/^,// && $line =~ /^\n?\z/ ){
                    push @fields, '';
                }
            }
        }

        if( $state == 1 ){
            for(;;){
                # copy text up to the next double quote; a quoted field
                # may continue on the next line of the file
                while( defined( $line ) && $line =~ s/^([^"]+)// ){
                    $field .= $1;
                    $line = <$fh> unless length( $line );
                }
                unless( defined( $line ) ){
                    # unterminated quote at end of file: keep what we have
                    push @fields, $field;
                    last;
                }
                if( $line =~ s/^""// ){
                    # a doubled quote is a literal double quote
                    $field .= '"';
                    next;
                }
                # closing quote ends the field
                $line =~ s/^"//;
                push @fields, $field;
                $state = 0;
                if( $line =~ s/^,// && $line =~ /^\n?\z/ ){
                    push @fields, '';
                }
                last;
            }
        }

    }

    return @fields;
}

for my $file ( @ARGV ){
    open my $fh, '<', $file or die "cannot open $file: $!";
    my @titles = read_csv( $fh );
    printf "Titles [%d]: ", scalar( @titles );
    print Dumper \@titles;
    while( ! eof( $fh ) ){
        my @fields = read_csv( $fh );
        printf "Fields [%d]: ", scalar( @fields );
        print Dumper \@fields;
    }
    close $fh;
}

__END__


__END__

I love Perl; it's the only language where you can bless your thingy.

Perl documentation is available at perldoc.perl.org. The list of standard modules and pragmatics is available in perlmodlib.

Get Markup Help. Please note the markup tag of "code".


gelid101
Novice

Aug 20, 2008, 1:01 PM

Post #5 of 14
Re: [shawnhcorey] Ignoring headers from a CSV file

I understand that a comma or quote can ruin the fields, but I just cannot figure out how this code works. :( It's a bit above my head. How would I extract data from each column in this case?

$col1, $col2 etc?

Thanks!


shawnhcorey
Enthusiast


Aug 20, 2008, 4:21 PM

Post #6 of 14
Re: [gelid101] Ignoring headers from a CSV file





Code
open my $fh, '<', $csv_file or die "cannot open $csv_file: $!";
for(;;){
    my ( $col1, $col2, $col3, @rest ) = read_csv( $fh );
    last if eof( $fh );
    # process the fields
}




gelid101
Novice

Aug 24, 2008, 8:56 AM

Post #7 of 14
Re: [shawnhcorey] Ignoring headers from a CSV file

Thanks, it seems to work fine so far, but will subject it to additional testing. One small issue, how do I skip the header line of the CSV file?

I'm releasing my code under GPL v3. Can I use the above code under this licence?

Thanks!


shawnhcorey
Enthusiast


Aug 24, 2008, 11:09 AM

Post #8 of 14
Re: [gelid101] Ignoring headers from a CSV file


In Reply To
Thanks, it seems to work fine so far, but will subject it to additional testing. One small issue, how do I skip the header line of the CSV file?



Code
open my $fh, '<', $csv_file or die "cannot open $csv_file: $!";
my @titles = read_csv( $fh );
for(;;){
    my ( $col1, $col2, $col3, @rest ) = read_csv( $fh );
    last if eof( $fh );
    # process the fields
}
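Since read_csv() returns the header row as a list too, the titles can name the columns instead of positional $col1, $col2, and so on. A sketch with hard-coded arrays standing in for what read_csv() would return; the values are made up:

```perl
use strict;
use warnings;

# Made-up values standing in for read_csv() results:
# the header row and one data row.
my @titles = ( 'name', 'description', 'date' );
my @fields = ( 'Foo.jpg', 'A photo, taken at dusk', '2008-08-07' );

# Hash slice: pair each title with the field in the same position.
my %row;
@row{@titles} = @fields;

print "$row{name}: $row{description} ($row{date})\n";
```

This way, adding or reordering columns in the CSV file only requires the header to change, not the variable list.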



In Reply To
I'm releasing my code under GPL v3. Can I use the above code under this licence?


Yes



gelid101
Novice

Aug 24, 2008, 11:39 AM

Post #9 of 14
Re: [shawnhcorey] Ignoring headers from a CSV file

Thank you so much :)


KevinR
Veteran


Aug 24, 2008, 12:48 PM

Post #10 of 14
Re: [gelid101] Ignoring headers from a CSV file

I'd also take a look at Text::CSV_XS for reading and writing CSV files. It's very fast, and it handles quoted fields with embedded commas and quotes for you.
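A minimal sketch of the same header-skipping job with Text::CSV_XS, assuming the module is installed from CPAN (it is not part of core Perl); the sample data is made up and held in memory:

```perl
use strict;
use warnings;
use Text::CSV_XS;    # from CPAN, not core Perl

# Made-up data: the quoted field contains a comma and a doubled quote.
my $data = qq{name,description\nFoo.jpg,"Dusk, with a ""red"" sky"\n};
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

# binary => 1 allows embedded newlines and non-ASCII bytes in quoted fields;
# auto_diag => 1 reports parse errors without explicit checks.
my $csv = Text::CSV_XS->new( { binary => 1, auto_diag => 1 } );

my $titles = $csv->getline($fh);    # header row, as an array reference
my @rows;
while ( my $row = $csv->getline($fh) ) {
    push @rows, $row;               # each row is an array reference
}
close $fh;

print "columns: @$titles\n";
print "$rows[0][1]\n";
```

getline returns undef at the end of the input, so the while loop stops after the last row without a separate eof check.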


gelid101
Novice

Aug 27, 2008, 4:08 AM

Post #11 of 14
Re: [gelid101] Ignoring headers from a CSV file

Hi!

I found a bug with the parser. I notice that the last line in the CSV file does not get parsed. :(


gelid101
Novice

Aug 27, 2008, 9:11 AM

Post #12 of 14
Re: [gelid101] Ignoring headers from a CSV file

The code has grown to about 650 lines.

Details on how to run the code:
http://commons.wikimedia.org/wiki/User:Nichalp/Upload_script

It's hard for me to pinpoint the error :(


KevinR
Veteran


Aug 27, 2008, 11:39 AM

Post #13 of 14
Re: [gelid101] Ignoring headers from a CSV file

try removing this line:

last if eof( $fh );


gelid101
Novice

Sep 1, 2008, 11:53 AM

Post #14 of 14
Re: [KevinR] Ignoring headers from a CSV file

Unfortunately, it seems to have no effect, and throws up tons of errors.
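The symptom is consistent with the loop shape in the snippets from posts #6 and #8: read_csv() consumes the final record, eof( $fh ) becomes true, and `last` leaves the loop before that record is processed. Deleting the `last` line leaves `for(;;)` with no exit at all, which would explain the flood of errors. The test program in post #4 avoids this by checking eof before reading. A minimal sketch comparing the two loop shapes, with a hypothetical next_record() standing in for read_csv() and made-up in-memory data:

```perl
use strict;
use warnings;

# Made-up three-record file held in memory.
my $data = "a,1\nb,2\nc,3\n";

# Hypothetical stand-in for read_csv(): returns the fields of the next line.
sub next_record {
    my ($fh) = @_;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    return split /,/, $line;
}

# Loop shape from the thread: read, bail out on eof, then process.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @seen_buggy;
for (;;) {
    my @fields = next_record($fh);
    last if eof($fh);    # already true right after the last record is read
    push @seen_buggy, $fields[0];
}
close $fh;

# Fixed shape: check eof first, then read and process.
open $fh, '<', \$data or die "cannot open in-memory file: $!";
my @seen_fixed;
while ( !eof($fh) ) {
    my @fields = next_record($fh);
    push @seen_fixed, $fields[0];
}
close $fh;

print "buggy: @seen_buggy\n";
print "fixed: @seen_fixed\n";
```

In the real script that means `while ( ! eof( $fh ) ) { my ( $col1, $col2, $col3, @rest ) = read_csv( $fh ); ... }`.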

 
 

