
Tejas
User
Sep 2, 2014, 5:42 AM
Post #1 of 43
Merging the data in two files using a hash
Can someone please comment on this code and tell me whether it is good or ugly? And can the code be shortened?
File1

28045071,1,56,DAD,418756991,0,-9.02,01-AUG-14,01-AUG-14,1
28045281,1,19,DAD,12701012015,0,-261.02,01-AUG-14,01-AUG-14,1
28045991,1,19,DAD,379031901,0,-22.42,01-AUG-14,01-AUG-14,1
2213506106,1,24,DAD,1374249100,0,-20,01-AUG-14,01-AUG-14,1
2213506116,1,24,DAD,1374249100,0,-20,01-AUG-14,01-AUG-14,1
2264530076,1,24,DAD,1377063511,0,-350,01-AUG-14,01-AUG-14,1
2613542516,1,24,DAD,501029031,0,-30,01-AUG-14,01-AUG-14,1
2634699316,1,24,DAD,512242996,0,-100,01-AUG-14,01-AUG-14,1
2639141256,1,24,DAD,13496038905,0,-25,01-AUG-14,01-AUG-14,1
2641900466,1,24,DAD,56276190,0,-50,01-AUG-14,01-AUG-14,1
28053391,1,19,DAD,766709012,0,-70,01-AUG-14,01-AUG-14,1

File2

28051341,1,56,DAD,199610116,0,-12.74,02-AUG-14,02-AUG-14,1
28051961,1,19,DAD,6735124615,0,-36.45,02-AUG-14,02-AUG-14,1
28052061,1,19,DAD,394104487,0,-48.61,02-AUG-14,02-AUG-14,1
28053391,1,19,DAD,766709012,0,-60,02-AUG-14,02-AUG-14,1
2399932016,1,24,DAD,567508320,0,-50,02-AUG-14,02-AUG-14,1
2451060666,1,24,DAD,499140250,0,-50,02-AUG-14,02-AUG-14,1
2495205736,1,24,DAD,774256411,0,-20,02-AUG-14,02-AUG-14,1
2604153876,1,24,DAD,7378719,0,-50,02-AUG-14,02-AUG-14,1
2638779256,1,24,DAD,240129917,0,-50,02-AUG-14,02-AUG-14,1
2646215356,1,24,DAD,1036846291,0,-40,02-AUG-14,02-AUG-14,1

OUTPUT

28045071,1,56,DAD,418756991,0,-9.02,01-AUG-14,01-AUG-14,1
28045281,1,19,DAD,12701012015,0,-261.02,01-AUG-14,01-AUG-14,1
28045991,1,19,DAD,379031901,0,-22.42,01-AUG-14,01-AUG-14,1
2213506106,1,24,DAD,1374249100,0,-20,01-AUG-14,01-AUG-14,1
2213506116,1,24,DAD,1374249100,0,-20,01-AUG-14,01-AUG-14,1
2264530076,1,24,DAD,1377063511,0,-350,01-AUG-14,01-AUG-14,1
2613542516,1,24,DAD,501029031,0,-30,01-AUG-14,01-AUG-14,1
2634699316,1,24,DAD,512242996,0,-100,01-AUG-14,01-AUG-14,1
2639141256,1,24,DAD,13496038905,0,-25,01-AUG-14,01-AUG-14,1
2641900466,1,24,DAD,56276190,0,-50,01-AUG-14,01-AUG-14,1
28051341,1,56,DAD,199610116,0,-12.74,02-AUG-14,02-AUG-14,1
28051961,1,19,DAD,6735124615,0,-36.45,02-AUG-14,02-AUG-14,1
28052061,1,19,DAD,394104487,0,-48.61,02-AUG-14,02-AUG-14,1
28053391,1,19,DAD,766709012,0,-60,02-AUG-14,02-AUG-14,1
(This txn is repeated, and the latest has to be used, the second file being the latest.)
2399932016,1,24,DAD,567508320,0,-50,02-AUG-14,02-AUG-14,1
2451060666,1,24,DAD,499140250,0,-50,02-AUG-14,02-AUG-14,1
2495205736,1,24,DAD,774256411,0,-20,02-AUG-14,02-AUG-14,1
2604153876,1,24,DAD,7378719,0,-50,02-AUG-14,02-AUG-14,1
2638779256,1,24,DAD,240129917,0,-50,02-AUG-14,02-AUG-14,1
2646215356,1,24,DAD,1036846291,0,-40,02-AUG-14,02-AUG-14,1

We can see that txn 28053391 appears in both files (in the first file as 28053391,1,19,DAD,766709012,0,-70,01-AUG-14,01-AUG-14,1), but only the latest version should be printed. So, in the output, the second file's data is printed for it.

The output has all of the first file's txns and all of the second file's txns. If a txn (the key is the first column) repeats in the second file, the second file's data has to be used.
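The rule described above (key on the first column, the second file wins on a repeat) can be sketched in a few lines. This is only a minimal sketch, not the script from this post: `merge_records` and `key_of` are hypothetical helper names, and it assumes each file fits in memory as a list of lines and that no key repeats within a single file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# key_of (hypothetical helper): return the first comma-separated column.
sub key_of {
    my ($line) = @_;
    return (split /,/, $line, 2)[0];
}

# merge_records (hypothetical helper): takes two array references of CSV
# lines. Returns the first file's lines whose key does not reappear in the
# second file, followed by all of the second file's lines, in file order.
sub merge_records {
    my ($first, $second) = @_;

    # Remember every key present in the second (newer) file.
    my %in_second;
    $in_second{ key_of($_) } = 1 for @$second;

    # Drop first-file records that the newer file replaces.
    my @kept = grep { !$in_second{ key_of($_) } } @$first;

    return (@kept, @$second);
}

# Usage with a few sample records taken from the listings above.
my @file1 = (
    "28045071,1,56,DAD,418756991,0,-9.02,01-AUG-14,01-AUG-14,1\n",
    "28053391,1,19,DAD,766709012,0,-70,01-AUG-14,01-AUG-14,1\n",
);
my @file2 = (
    "28051341,1,56,DAD,199610116,0,-12.74,02-AUG-14,02-AUG-14,1\n",
    "28053391,1,19,DAD,766709012,0,-60,02-AUG-14,02-AUG-14,1\n",
);
print merge_records(\@file1, \@file2);
```

With these sample records the sketch prints three lines: 28045071 from the first file, then 28051341 and the -60 version of 28053391 from the second file. Keeping the records in arrays rather than iterating over a hash preserves the order shown in OUTPUT, since Perl hashes have no guaranteed iteration order.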
#!/usr/bin/perl
use strict;
use warnings;

my $pwd = `pwd`;
chomp($pwd);

my $clr_txns     = "$pwd/File1.txt";
my $temp_file    = "$pwd/File2.txt";
my $final_output = "$pwd/Final_List.txt";

open(FIRST,  '<', $clr_txns)     or die "could not open $clr_txns: $!";
open(SECOND, '<', $temp_file)    or die "could not open $temp_file: $!";
open(MATCH,  '>', $final_output) or die "could not open $final_output: $!";

my %hash  = ();    # key => 1 while the File1 txn is still unmatched
my %hash2 = ();    # key => full File1 line

while (my $line = <FIRST>) {
    my @elements = split ',', $line;
    my $key = $elements[0];
    print "$key\n";
    $hash{$key}  = 1;
    $hash2{$key} = $line;
}

#open SECOND, "< $secondFile" or die "could not open second file...\n";
while (my $line = <SECOND>) {
    my @elements = split ',', $line;
    my $key = $elements[0];    # Perl arrays are zero-indexed
    if ($hash{$key}) {
        #print "($hash{$key} \n";
        print MATCH "$line";
        $hash{$key} = 0;       # Mark the File1 txn as replaced
    }
    else {
        print MATCH "$line";   # Also print unmatched, as we need all the txns from both files
    }
}

while (my ($key, $value) = each %hash2) {
    if ($hash{$key} != 0) {
        print MATCH "$value";  # Print the remaining File1 txns and skip the matched ones
    }
}

close(FIRST);
close(SECOND);
close(MATCH);

Thanks
Tejas