
shawnhcorey

Aug 10, 2008, 6:25 AM
Post #4 of 14
Re: [gelid101] Ignoring headers from a CSV file
> Hi, I am completely new to Perl, and new to programming. My Perl experience is 2 days only. This is what I am trying to do: Open a CSV file and parse it. However, I need to ignore the first line which contains the headers. Could someone please help me out? This is my code:

First, some comments about your code.
>     # Make Unix style path
>     $dir=~s|\\|/|gi;
>     # Remove trailing slashes
>     $sep=$/;
>     $/="/";
>     chomp($dir);
>     $/=$sep;

This is not necessary. perl compiled under Windows interprets slashes and backslashes the same way in directory paths. Also, multiple consecutive slashes or backslashes are treated as one.

>     # Now try to get the list of files
>     open(FILELIST,"<wp.csv") or die "Could not find file list at $dir/wp.csv.\n$docstring";

You should use the three-argument version of open:
    open my $fh, '<', "$dir/wp.csv" or die "cannot open $dir/wp.csv: $!";

As for the rest, I'm afraid it will not work. This is because the characters in a CSV file cannot be interpreted in isolation. A comma or a double quote may be a delimiter or data, depending on its context (for example, whether it appears inside a quoted field). To correctly get the contents, you need a parser.

Here is a program with a CSV parser called read_csv(). It also includes a print_csv(). It has had only limited testing, so there may be bugs.
    #!/usr/bin/perl

    use strict;
    use warnings;
    use utf8;

    use Data::Dumper;
    $Data::Dumper::Sortkeys = 1;
    $Data::Dumper::Indent   = 1;
    $Data::Dumper::Maxdepth = 0;

    # --------------------------------------
    #      Usage: print_csv( $fh, @fields );
    #    Purpose: Print the fields to the file as comma-separated values.
    #    Returns: none
    # Parameters: $fh     -- file handle
    #             @fields -- list of fields
    #
    sub print_csv {
      my $fh        = shift @_;
      my @fields    = @_;
      my @sanitized = ();

      for ( @fields ){
        if( /^\s|[",\n]|\s$/ ){
          # quote the field, doubling any embedded double quotes
          ( my $quoted = $_ ) =~ s/"/""/g;
          push @sanitized, "\"$quoted\"";
        }else{
          push @sanitized, $_;
        }
      }
      print $fh join( ",", @sanitized ), "\n";

      return;
    }

    # --------------------------------------
    #      Usage: @fields = read_csv( $fh );
    #    Purpose: Read one record from a file. A record may span
    #             several lines if a quoted field contains newlines.
    #    Returns: @fields -- list of fields read
    # Parameters: $fh -- file handle
    #
    sub read_csv {
      my $fh          = shift @_;
      my @fields      = ();
      my $field       = '';
      my $after_comma = 0; # true if the last thing consumed was a comma
      my $state       = 0; # 0 == free context
                           # 1 == inside double quotes

      my $line = <$fh>;
      return @fields unless defined $line; # nothing left to read

      while( length( $line ) ){
        if( $state == 0 ){
          if( $line eq "\n" ){
            last;
          }
          if( $line =~ s/^"// ){
            $state = 1;
            $field = '';
          }else{
            $line =~ s/^([^,\n]*)//;
            push @fields, $1;
            $after_comma = ( $line =~ s/^,// ) ? 1 : 0;
          }
        }
        if( $state == 1 ){
          for(;;){
            while( $line =~ s/^([^"]+)// ){
              $field .= $1;
              unless( length( $line ) ){
                # the quoted field continues on the next line
                $line = <$fh>;
                $line = '' unless defined $line; # unterminated quote at EOF
              }
            }
            if( $line =~ s/^""// ){
              $field .= '"';
              next;
            }
            $line =~ s/^"//;
            push @fields, $field;
            $state = 0;
            $after_comma = ( $line =~ s/^,// ) ? 1 : 0;
            last;
          }
        }
      }

      # a trailing comma means the record ends with an empty field
      push @fields, '' if $after_comma;

      return @fields;
    }

    for my $file ( @ARGV ){
      open my $fh, '<', $file or die "cannot open $file: $!";

      # the first record holds the headers
      my @titles = read_csv( $fh );
      printf "Titles [%d]: ", scalar( @titles );
      print Dumper \@titles;

      while( ! eof( $fh ) ){
        my @fields = read_csv( $fh );
        printf "Fields [%d]: ", scalar( @fields );
        print Dumper \@fields;
      }
      close $fh;
    }

    __END__

I love Perl; it's the only language where you can bless your thingy.

Perl documentation is available at perldoc.perl.org. The list of standard modules and pragmatics is available in perlmodlib.
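P.S. To see why splitting on commas is not enough, here is a minimal sketch. The sample record is made up for illustration (it is not from your wp.csv); it just has a comma inside a quoted field, which is the case where split goes wrong.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical sample record: three fields, the second one
# contains a comma inside double quotes.
my $record = 'Smith,"Portland, OR",42';

# Naive approach: split on every comma, ignoring the quotes.
my @naive = split /,/, $record;

# The quoted field is cut in two, so split reports 4 pieces
# instead of 3 fields, and the quotes are left in the data.
printf "split found %d pieces:\n", scalar( @naive );
print "  [$_]\n" for @naive;
```

A parser that tracks whether it is inside double quotes, as read_csv() does, would treat "Portland, OR" as a single field.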