Laurent_R
Veteran / Moderator

Jun 2, 2014, 11:09 AM


Re: [Tejas] Compare each and every value from two files

Please read very carefully what I am saying at the end of this post (after the code).

Here is some quick, incomplete and untested code for the important parts:


Code
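my %hash;   # key (first field) => array ref of all file1 lines with that key
while (my $line = <$FILE1>) {
    chomp $line;
    my $key = (split /\|/, $line, 2)[0];   # first '|'-separated field
    push @{$hash{$key}}, $line;            # keep every line: keys may repeat
}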
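To see what the hash of arrays looks like without real files, here is a minimal sketch that runs the same loop over an in-memory list of lines. The sub name `index_by_key` and the sample records are hypothetical; like the code above, it assumes the key is the first `|`-separated field. It also skips blank lines, which would otherwise produce an undefined key.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines by their first '|'-separated field.
# Returns a hash reference mapping each key to an array ref of the full
# lines, in the order they were seen.
sub index_by_key {
    my @lines = @_;
    my %hash;
    for my $line (@lines) {
        chomp $line;
        next unless length $line;              # blank line: no key to use
        my $key = (split /\|/, $line, 2)[0];
        push @{ $hash{$key} }, $line;
    }
    return \%hash;
}

my $index = index_by_key("A|1|x\n", "B|2|y\n", "A|3|z\n");
for my $key (sort keys %$index) {
    print "$key => @{ $index->{$key} }\n";
}
```

Key `A` ends up with two lines and key `B` with one, which is exactly the situation (repeated keys) that the second part of the program has to deal with.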


And, later in the program:


Code
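WHILE_LOOP: while (my $line2 = <$FILE2>) {
    chomp $line2;
    my $key = (split /\|/, $line2, 2)[0];
    # no (remaining) line with this key in file1
    unless ($hash{$key} and @{$hash{$key}}) {
        print "missing line: $line2\n";
        next WHILE_LOOP;
    }
    for my $i (0 .. $#{$hash{$key}}) {
        if ($hash{$key}[$i] eq $line2) {
            splice @{$hash{$key}}, $i, 1;   # consume the identical line
            next WHILE_LOOP;
        }
    }
    # If we get here, no identical line was found, but we do not know which one to compare field by field.
    # Can we just pick any? Here we take the first one.
    my $line1 = shift @{$hash{$key}};
    my @array1 = split /\|/, $line1, -1;   # -1 keeps trailing empty fields
    my @array2 = split /\|/, $line2, -1;
    my $last = $#array1 > $#array2 ? $#array1 : $#array2;
    for my $i (0 .. $last) {
        my $field1 = $array1[$i] // '';
        my $field2 = $array2[$i] // '';
        print "key $key, field $i: '$field1' ne '$field2'\n" if $field1 ne $field2;
    }
}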


This is untested and, since it is a bit more complicated than my previous code, there may very well be some mistakes.
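As a way to test the logic on sample data, here is a self-contained sketch of the same two-pass approach. It works on in-memory arrays (already chomped) instead of the `$FILE1`/`$FILE2` handles, and the sub name `compare_files` and the message wording are hypothetical. Like the code above, it consumes identical lines and, when no identical line exists, arbitrarily takes the first remaining line with the same key.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two lists of '|'-separated lines keyed on
# their first field. Returns a list of human-readable difference messages.
sub compare_files {
    my ($lines1, $lines2) = @_;

    # Pass 1: index the first "file" by key (hash of arrays).
    my %hash;
    for my $line (@$lines1) {
        my $key = (split /\|/, $line, 2)[0];
        push @{ $hash{$key} }, $line;
    }

    # Pass 2: look up each line of the second "file".
    my @report;
    LINE2: for my $line2 (@$lines2) {
        my $key = (split /\|/, $line2, 2)[0];
        my $candidates = $hash{$key};

        # No (remaining) line with this key in the first file.
        unless ($candidates && @$candidates) {
            push @report, "missing line: $line2";
            next LINE2;
        }

        # An identical line exists: consume it and move on.
        for my $i (0 .. $#$candidates) {
            if ($candidates->[$i] eq $line2) {
                splice @$candidates, $i, 1;
                next LINE2;
            }
        }

        # No identical line: arbitrarily take the first one with this key
        # and report every field that differs.
        my $line1 = shift @$candidates;
        my @f1 = split /\|/, $line1, -1;
        my @f2 = split /\|/, $line2, -1;
        my $n = @f1 > @f2 ? scalar @f1 : scalar @f2;
        for my $i (0 .. $n - 1) {
            my $v1 = $f1[$i] // '';
            my $v2 = $f2[$i] // '';
            push @report, "key $key, field $i: '$v1' ne '$v2'" if $v1 ne $v2;
        }
    }
    return @report;
}

print "$_\n" for compare_files(
    ["A|1|x", "B|2|y", "B|2|z"],
    ["A|1|x", "B|2|w", "C|3|q"],
);
```

With this sample data it reports one differing field for key `B` (field 2, `y` versus `w`) and a missing line for `C|3|q`. Lines left over in `%hash` after the loop (present in the first file only) are not reported here, just as in the code above; walking the remaining arrays at the end would list them.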

As the comments in the middle of the code say, we have a serious problem: if file1 has several lines with the same key and we find one that is identical, fine. But if we don't find an identical one, we do not know which one to pick for the field-by-field comparison, and this is basically unsolvable unless you can give more rules. Here I have simply decided that, in that case, I arbitrarily pick the first one.
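If an extra rule can be agreed on, the choice becomes deterministic. One possible rule, which is an assumption and not something specified in this thread, is to pick the candidate that differs from the file2 line in the fewest fields. A minimal sketch with hypothetical sub names:

```perl
use strict;
use warnings;

# Count how many '|'-separated fields differ between two lines
# (a missing field is treated as an empty string).
sub count_diffs {
    my ($line1, $line2) = @_;
    my @f1 = split /\|/, $line1, -1;
    my @f2 = split /\|/, $line2, -1;
    my $n = @f1 > @f2 ? scalar @f1 : scalar @f2;
    my $diffs = 0;
    for my $i (0 .. $n - 1) {
        $diffs++ if ($f1[$i] // '') ne ($f2[$i] // '');
    }
    return $diffs;
}

# Hypothetical rule: return the index of the candidate closest to $line2
# (fewest differing fields; ties go to the earliest candidate).
# Assumes @$candidates is not empty.
sub best_candidate {
    my ($candidates, $line2) = @_;
    my ($best_i, $best_d);
    for my $i (0 .. $#$candidates) {
        my $d = count_diffs($candidates->[$i], $line2);
        ($best_i, $best_d) = ($i, $d) if !defined $best_d || $d < $best_d;
    }
    return $best_i;
}

my @candidates = ("B|2|y|old", "B|2|w|old", "B|9|q|new");
my $i = best_candidate(\@candidates, "B|2|w|new");
print "closest: $candidates[$i]\n";
```

In the comparison loop, this would replace the `shift` with something like `my $line1 = splice @{$hash{$key}}, best_candidate($hash{$key}, $line2), 1;`. Whether "fewest differing fields" is a sensible rule depends entirely on the data.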

I work on similar file comparisons very often, and we usually have a two-step process: first remove all duplicates from both files, and then compare only the individual lines that we know to be unique in each file.
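A minimal sketch of that two-step process, under one possible reading of it: step one drops exact duplicate lines and then sets aside keys that still occur more than once; step two compares only keys that are unique in both files. The sub name `unique_by_key` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Step 1: drop exact duplicate lines, then separate lines whose key is
# unique from keys that still occur more than once.
# Returns (hashref key => line for unique keys, arrayref of ambiguous keys).
sub unique_by_key {
    my @lines = @_;
    my (%seen, %by_key);
    for my $line (grep { !$seen{$_}++ } @lines) {
        my $key = (split /\|/, $line, 2)[0];
        push @{ $by_key{$key} }, $line;
    }
    my (%unique, @ambiguous);
    for my $key (sort keys %by_key) {
        if (@{ $by_key{$key} } == 1) {
            $unique{$key} = $by_key{$key}[0];
        } else {
            push @ambiguous, $key;
        }
    }
    return (\%unique, \@ambiguous);
}

# Step 2: compare only keys that are unique in both files.
my ($u1, $amb1) = unique_by_key("A|1", "A|1", "B|2", "C|3", "C|4");
my ($u2, $amb2) = unique_by_key("A|1", "B|5", "C|3");
for my $key (sort keys %$u1) {
    next unless exists $u2->{$key};
    my $status = $u1->{$key} eq $u2->{$key} ? "same" : "differs";
    print "$key: $status\n";
}
print "ambiguous in file1: @$amb1\n";
```

Here the repeated `A|1` collapses to one line, so `A` is compared normally, while `C` has two different lines in the first file and is reported as ambiguous instead of being compared against an arbitrary pick. Keys present in only one file are silently skipped in this sketch.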


(This post was edited by Laurent_R on Jun 2, 2014, 11:40 AM)

