
Laurent_R
Veteran
/ Moderator
Jun 2, 2014, 11:09 AM
Post #17 of 28
Re: [Tejas] Compare each and every value from two files
Please read very carefully what I say at the end of this post (after the code). Here is a quick, incomplete and untested sketch of the important parts:
    my %hash;
    while (my $line = <$FILE1>) {
        chomp $line;
        my $key = (split /\|/, $line, 2)[0];   # key = first pipe-separated field
        push @{$hash{$key}}, $line;            # keep every file1 line sharing that key
    }

And, later in the program:
    WHILE_LOOP: while (my $line2 = <$FILE2>) {
        chomp $line2;
        my $key = (split /\|/, $line2, 2)[0];
        # no file1 line (left) for this key
        print "missing line: $line2\n" and next
            unless $hash{$key} and @{$hash{$key}};
        for my $i (0..$#{$hash{$key}}) {
            if ($hash{$key}[$i] eq $line2) {
                splice @{$hash{$key}}, $i, 1;   # remove the matched line
                next WHILE_LOOP;
            }
        }
        # If we get here, no line was found to be identical, but we do not know
        # which one to compare field by field. Can we just pick any?
        my $line1 = shift @{$hash{$key}};
        my @array1 = split /\|/, $line1;
        my @array2 = split /\|/, $line2;
        # assumes both lines have the same number of fields
        for my $i (0..$#array1) {
            print "something" if $array1[$i] ne $array2[$i];
        }
    }

This is untested and, since it is a bit more complicated than my previous code, there may very well be some mistakes.

As said in the comment in the middle of the code, we have a serious problem. If several lines in file1 share the same key and one of them is identical to the line from file2, all is fine. But if none is identical, we do not know which one to pick for the field-by-field comparison, and this is basically unsolvable unless you can give more rules. Here I have simply decided that, in that case, I arbitrarily pick the first one.

I work on similar file comparisons very often, and we usually have a two-step process: first remove all duplicates from both files, and then compare only the individual lines that we know to be unique in each file.
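To make the two-step idea more concrete, here is a minimal sketch. It assumes that "duplicates" means lines sharing the same key (first pipe-separated field) within one file; that reading, the helper names split_by_uniqueness and differing_fields, and the sample data are all made up for illustration and are not part of the code above. It also needs Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# Step 1: group lines by key (first pipe-separated field), then separate
# keys seen exactly once from keys seen more than once.
sub split_by_uniqueness {
    my @lines = @_;
    my %by_key;
    for my $line (@lines) {
        my $key = (split /\|/, $line, 2)[0];
        push @{ $by_key{$key} }, $line;
    }
    my (%unique, %duplicated);
    for my $key (keys %by_key) {
        if (@{ $by_key{$key} } == 1) {
            $unique{$key} = $by_key{$key}[0];
        }
        else {
            $duplicated{$key} = $by_key{$key};
        }
    }
    return (\%unique, \%duplicated);
}

# Step 2 helper: return the 0-based indexes of the fields that differ.
sub differing_fields {
    my ($line1, $line2) = @_;
    my @f1 = split /\|/, $line1, -1;   # -1 keeps trailing empty fields
    my @f2 = split /\|/, $line2, -1;
    my $last = $#f1 > $#f2 ? $#f1 : $#f2;
    return grep { ($f1[$_] // '') ne ($f2[$_] // '') } 0 .. $last;
}

# Hypothetical sample data standing in for the two input files.
my @file1 = ('1|a|b', '2|c|d', '3|e|f', '3|e|g');
my @file2 = ('1|a|b', '2|c|x', '3|e|f', '4|y|z');

my ($uniq1, $dup1) = split_by_uniqueness(@file1);
my ($uniq2, $dup2) = split_by_uniqueness(@file2);

# Keys duplicated in file2 are set aside as well.
print "key $_: duplicated in file2, needs a separate rule\n"
    for sort keys %$dup2;

# Compare only the lines whose key is unique in each file.
for my $key (sort keys %$uniq2) {
    if (exists $dup1->{$key}) {
        print "key $key: duplicated in file1, needs a separate rule\n";
        next;
    }
    if (!exists $uniq1->{$key}) {
        print "key $key: missing from file1\n";
        next;
    }
    my @diff = differing_fields($uniq1->{$key}, $uniq2->{$key});
    print "key $key: fields differ at index @diff\n" if @diff;
}
```

With this sample data, key 2 is reported as differing at field index 2, key 3 is set aside because it is duplicated in file1, and key 4 is reported as missing from file1. The point of the design is that the ambiguous cases are listed separately instead of being compared against an arbitrarily chosen line.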
(This post was edited by Laurent_R on Jun 2, 2014, 11:40 AM)