2teez
Novice

Sep 26, 2013, 5:55 AM


Re: [Tejas] Comparing 2 Huge Csv Files and Find the Mismatched Rows

Hi Tejas,
Since you are working with CSV files, the rule of thumb is to use a CSV module such as Text::CSV_XS http://search.cpan.org/~hmbrand/Text-CSV_XS-1.01/CSV_XS.pm. But for a simple dataset like the one presented, with no quoted fields or embedded commas, one can simply use Perl's split function, like so:


Code
use warnings;
use strict;
use Inline::Files;

my %id1;    # ID => [ remaining fields ] for each row of the second file

while (<DATA2>) {
    chomp;
    next if /^$/;    # skip blank lines
    my @lines = split /,/, $_;
    push @{ $id1{ $lines[0] } }, @lines[ 1 .. $#lines ];
}

<DATA1>;    # take out the header if not needed

while (<DATA1>) {
    chomp;
    next if /^$/;    # skip blank lines
    my @lines2 = split /,/, $_;
    if ( exists $id1{ $lines2[0] } ) {
        if ( $id1{ $lines2[0] }->[0] eq $lines2[1] ) {
            print $_, " Matched", $/;
        }
        else {
            print $_, " Mis-matched comparing: ",
                $id1{ $lines2[0] }->[0], ' with ', $lines2[1], $/;
        }
    }
    else {
        print $_, " doesn't exist in second file", $/;
    }
}

__DATA1__
ID1 ,AMT, ID2

CDZNYQ9R8108QR3E3EJ0,3900.00,351
V0Y9WC7YYJ6V8T3DDJM0,3900.00,351
BZ6FD9Q3964VX16EMKY0,3900.00,351
3S5VCXSSS0PPV9V875Q1,3900.00,351
687802764243,399.00,362


__DATA2__
BZ6FD9Q3964VX16EMKY0,900.00,351
3S5VCXSSS0PPV9V875Q1,900.00,351
V0Y9WC7YYJ6V8T3DDJM0,900.00,351
CDZNYQ9R8108QR3E3EJ0,900.00,351


Please note that for the above code to work you must have the Inline::Files module installed. If you don't, you will have to open the two files yourself, using two separate open calls.
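For what it's worth, here is a minimal sketch of the same comparison without Inline::Files. The sub name compare_csv and the file names file1.csv and file2.csv are made up for illustration. So that the sketch runs as-is, the demo opens in-memory strings (open on a scalar reference) holding a few of the rows above; with real files you would use the two open calls shown in the comment instead.

```perl
use strict;
use warnings;

# compare_csv is a hypothetical helper name, not part of any module.
# It takes two already-opened filehandles (main file, file to compare
# against) and returns one report line per data row of the main file.
sub compare_csv {
    my ( $main_fh, $other_fh ) = @_;

    my %id1;    # ID => [ remaining fields ]
    while ( my $line = <$other_fh> ) {
        chomp $line;
        next if $line =~ /^$/;    # skip blank lines
        my @fields = split /,/, $line;
        push @{ $id1{ $fields[0] } }, @fields[ 1 .. $#fields ];
    }

    my @report;
    while ( my $line = <$main_fh> ) {
        chomp $line;
        next if $line =~ /^$|^ID1\b/;    # skip blank lines and the header
        my @fields = split /,/, $line;
        if ( !exists $id1{ $fields[0] } ) {
            push @report, "$line doesn't exist in second file";
            next;
        }
        my $other_amt = $id1{ $fields[0] }->[0];
        if ( $other_amt eq $fields[1] ) {
            push @report, "$line Matched";
        }
        else {
            push @report,
                "$line Mis-matched comparing: $other_amt with $fields[1]";
        }
    }
    return @report;
}

# With real files (names are placeholders) you would write:
#   open my $main_fh,  '<', 'file1.csv' or die "Cannot open file1.csv: $!";
#   open my $other_fh, '<', 'file2.csv' or die "Cannot open file2.csv: $!";
my $main_data = "ID1 ,AMT, ID2\n\n"
    . "CDZNYQ9R8108QR3E3EJ0,3900.00,351\n"
    . "687802764243,399.00,362\n";
my $other_data = "CDZNYQ9R8108QR3E3EJ0,900.00,351\n";

open my $main_fh,  '<', \$main_data  or die "Cannot open main data: $!";
open my $other_fh, '<', \$other_data or die "Cannot open other data: $!";

my @report = compare_csv( $main_fh, $other_fh );
print "$_\n" for @report;
```

Passing filehandles rather than file names keeps the comparison independent of where the data comes from, so the same sub works for files on disk, STDIN, or strings.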

A closer look at the code above shows two similar while loops, which some would consider repetition. To avoid that, one might do the following:

Code
use warnings;
use strict;

die "Usage: perlscript.pl file1 file2\n" unless @ARGV == 2;
my ( $file1, $file2 ) = @ARGV;

my %id1;    # ID => [ remaining fields ] for each row of file2

# One handler per file, called once for every (chomped) line
my %file_operation = (
    $file1 => sub {
        return if $_[0] =~ /^$|^\bID1\b/;    # skip blank line or header
        my @lines2 = split /,/, $_[0];
        if ( exists $id1{ $lines2[0] } ) {
            if ( $id1{ $lines2[0] }->[0] eq $lines2[1] ) {
                print $_[0], " Matched", $/;
            }
            else {
                print $_[0], " Mis-matched comparing: ",
                    $id1{ $lines2[0] }->[0], ' with ', $lines2[1], $/;
            }
        }
        else {
            print $_[0], " doesn't exist in second file", $/;
        }
    },
    $file2 => sub {
        return if $_[0] =~ /^$/;    # skip blank lines
        my @lines = split /,/, $_[0];
        push @{ $id1{ $lines[0] } }, @lines[ 1 .. $#lines ];
    },
);

# file2 must be read first so that the hash is filled before file1 is checked
open_file( $_, $file_operation{$_} ) for ( $file2, $file1 );

sub open_file {
    my ( $filename, $code_ref ) = @_;
    open my $fh, '<', $filename or die "Cannot open $filename: $!";
    while (<$fh>) {
        chomp;
        $code_ref->($_);
    }
    close $fh;
}

The basic logic here is to load the file you are comparing against into a hash, then check each line of the main file against the hash keys. If you like, you can push the lines that matched (or didn't) into separate arrays, or write them to separate files, or just print them out as I did in the code above.
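As a rough sketch of the "separate arrays" variant: the rows are copied from the sample data above, except that one amount in @other_rows is changed to 3900.00 so that the demo has a matching row (that change is an assumption made for illustration). The variable names are made up too, and unlike the code above this keeps only one amount per ID.

```perl
use strict;
use warnings;

# Sample rows; in practice these would be read from the two files.
my @main_rows = (
    'CDZNYQ9R8108QR3E3EJ0,3900.00,351',
    'BZ6FD9Q3964VX16EMKY0,3900.00,351',
    '687802764243,399.00,362',
);
my @other_rows = (
    'BZ6FD9Q3964VX16EMKY0,900.00,351',
    'CDZNYQ9R8108QR3E3EJ0,3900.00,351',    # amount altered for the demo
);

# Load the file to compare against into a hash: ID => amount.
# If an ID appears twice, the last amount wins.
my %amount_for;
for my $row (@other_rows) {
    my ( $id, $amt ) = split /,/, $row;
    $amount_for{$id} = $amt;
}

# Sort every row of the main file into one of three arrays.
my ( @matched, @mismatched, @missing );
for my $row (@main_rows) {
    my ( $id, $amt ) = split /,/, $row;
    if    ( !exists $amount_for{$id} ) { push @missing,    $row }
    elsif ( $amount_for{$id} eq $amt ) { push @matched,    $row }
    else                               { push @mismatched, $row }
}

print "Matched:     @matched\n";
print "Mis-matched: @mismatched\n";
print "Missing:     @missing\n";
```

Once the rows are in arrays you can print each one to its own file, or count them with scalar(@matched) and so on.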
Hope this helps.


(This post was edited by 2teez on Sep 26, 2013, 6:38 AM)


