
2teez
Novice
Sep 26, 2013, 5:55 AM
Post #2 of 4
Re: [Tejas] Comparing 2 Huge Csv Files and Find the Mismatched Rows
Hi Tejas, since you are working with CSV files, the rule of thumb is to use a CSV module such as Text::CSV_XS http://search.cpan.org/~hmbrand/Text-CSV_XS-1.01/CSV_XS.pm, because a plain split breaks on quoted fields that contain commas. But for a dataset as simple as the one presented, one can just use Perl's split function, like so:

use warnings;
use strict;
use Inline::Files;

# Load the second file into a hash: ID => [ remaining columns ]
my %id1;
while (<DATA2>) {
    chomp;
    next if /^$/;    # skip blank lines
    my @lines = split /,/, $_;
    push @{ $id1{ $lines[0] } }, @lines[ 1 .. $#lines ];
}

<DATA1>;    # take out the header if not needed
while (<DATA1>) {
    chomp;
    next if /^$/;    # skip blank lines
    my @lines2 = split /,/, $_;
    if ( exists $id1{ $lines2[0] } ) {

        # compare the AMT column of both files as strings
        if ( $id1{ $lines2[0] }->[0] eq $lines2[1] ) {
            print $_, " Matched", $/;
        }
        else {
            print $_, " Mis-matched comparing: ",
              $id1{ $lines2[0] }->[0], ' with ', $lines2[1], $/;
        }
    }
    else {
        print $_, " doesn't exist in second file", $/;
    }
}
__DATA1__
ID1 ,AMT, ID2
CDZNYQ9R8108QR3E3EJ0,3900.00,351
V0Y9WC7YYJ6V8T3DDJM0,3900.00,351
BZ6FD9Q3964VX16EMKY0,3900.00,351
3S5VCXSSS0PPV9V875Q1,3900.00,351
687802764243,399.00,362
__DATA2__
BZ6FD9Q3964VX16EMKY0,900.00,351
3S5VCXSSS0PPV9V875Q1,900.00,351
V0Y9WC7YYJ6V8T3DDJM0,900.00,351
CDZNYQ9R8108QR3E3EJ0,900.00,351

Please note that for the above code to work you must have the module Inline::Files installed. If not, you will have to open the two files with two separate open calls. A closer look at the code above shows two similar while loops, which some would consider repetition, so in that case one might do the following:

use warnings;
use strict;

die "Usage: perlscript.pl file1 file2\n" unless @ARGV == 2;

my ( $file1, $file2 ) = @ARGV;
my %id1;

# One handler per file, called once for every line of that file
my %file_operation = (
    $file1 => sub {
        return if $_[0] =~ /^$|^\bID1\b/;    # skip blank line or header
        my @lines2 = split /,/, $_[0];
        if ( exists $id1{ $lines2[0] } ) {
            if ( $id1{ $lines2[0] }->[0] eq $lines2[1] ) {
                print $_[0], " Matched", $/;
            }
            else {
                print $_[0], " Mis-matched comparing: ",
                  $id1{ $lines2[0] }->[0], ' with ', $lines2[1], $/;
            }
        }
        else {
            print $_[0], " doesn't exist in second file", $/;
        }
    },
    $file2 => sub {
        return if $_[0] =~ /^$/;    # skip blank lines
        my @lines = split /,/, $_[0];
        push @{ $id1{ $lines[0] } }, @lines[ 1 .. $#lines ];
    },
);

# The second file must be read first so the hash is filled before comparing
open_file( $_, $file_operation{$_} ) for ( $file2, $file1 );

sub open_file {
    my ( $filename, $code_ref ) = @_;
    open my $fh, '<', $filename or die "Can't open $filename: $!";
    while (<$fh>) {
        chomp;
        $code_ref->($_);
    }
}

The basic logic here is to load the file to compare against into a hash, then check each line of the main file against the hash keys. If you like, you can push the lines that matched (or did not) into separate arrays or write them to separate files, or just print them out as I did in the code above. Hope this helps.
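
For the "different arrays" idea, here is a minimal sketch of how it might look. The sub name classify_rows and the bucket names matched, mismatched and missing are my own choices for illustration, not anything from the scripts above, and it assumes the ID is always the first column and the amount the second.

```perl
use strict;
use warnings;

# Hypothetical helper: sort the rows of the main file into three arrays
# by looking each ID up in a hash built from the other file.
sub classify_rows {
    my ( $main_rows, $lookup_rows ) = @_;

    # Load the file to compare against into a hash: ID => amount
    my %amount_for;
    for my $row (@$lookup_rows) {
        next if $row =~ /^\s*$/;    # skip blank lines
        my ( $id, $amount ) = split /,/, $row;
        $amount_for{$id} = $amount;
    }

    my %bucket = ( matched => [], mismatched => [], missing => [] );
    for my $row (@$main_rows) {
        next if $row =~ /^\s*$|^ID1\b/;    # skip blank lines and the header
        my ( $id, $amount ) = split /,/, $row;

        my $kind;
        if ( !exists $amount_for{$id} ) {
            $kind = 'missing';             # ID not in the other file
        }
        elsif ( $amount_for{$id} eq $amount ) {
            $kind = 'matched';             # same amount (string comparison)
        }
        else {
            $kind = 'mismatched';          # ID found, amount differs
        }
        push @{ $bucket{$kind} }, $row;
    }
    return \%bucket;
}

# A few rows taken from the sample data in this thread
my @main = (
    'ID1 ,AMT, ID2',
    'CDZNYQ9R8108QR3E3EJ0,3900.00,351',
    '687802764243,399.00,362',
);
my @lookup = ('CDZNYQ9R8108QR3E3EJ0,900.00,351');

my $result = classify_rows( \@main, \@lookup );
print scalar @{ $result->{$_} }, " $_\n" for qw(matched mismatched missing);
```

Each array can then be printed, counted, or written to its own file. For huge files you would probably feed the main file line by line instead of holding it in an array; only the hash for the other file needs to sit in memory.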
(This post was edited by 2teez on Sep 26, 2013, 6:38 AM)