how to compare two simple txt files

 



cliffyiu
Novice

Mar 16, 2010, 10:36 AM

Post #1 of 20 (2096 views)
how to compare two simple txt files Can't Post

Hi, All

I have two simple .txt files as below
$ cat test1.txt
10
20
14
15
35
67
35
89
66
65

$ cat test2.txt
99
79
10
35
12
66

I just wrote a perl script to compare them and output the in-common data to one file and the un-common data to another file.


Code
#!/usr/bin/perl 
use strict;
use warnings;

my($infile1_name, $infile2_name, $outfile1_name, $outfile2_name) = @ARGV;

open (IN1, "<", $infile1_name) or die "Can't read file $infile1_name: $!\n";
open (IN2, "<", $infile2_name) or die "Can't read file $infile2_name: $!\n";
open (OUT1, ">", $outfile1_name) or die "Can't write on file $outfile1_name: $!";
open (OUT2, ">", $outfile2_name) or die "Can't write on file $outfile2_name: $!";

while (<IN1>) {

    my $input1 = $_;

    while (<IN2>) {

        my $input2 = $_;

        if ($input1 == $input2) {

            print OUT1 $input1;
        } else {

            print OUT2 $input1;
        }
    }
}

close IN1;
close IN2;
close OUT1;
close OUT2;



After running

$ perl test.pl test1.txt test2.txt out1.txt out2.txt

The error message is:

No such class input2 at test.pl line 18, near "my input2"
syntax error at test.pl line 18, near "my input2 ="
Global symbol "$input2" requires explicit package name at test.pl line 20.
Execution of test.pl aborted due to compilation errors.

Anyone knows what the problem is?

All your help will be deeply appreciated!


FishMonger
Veteran / Moderator

Mar 16, 2010, 10:42 AM

Post #2 of 20 (2093 views)
Re: [cliffyiu] how to compare two simple txt files [In reply to] Can't Post

Compare the code you posted against the script you actually ran when you got the error, and you'll find the answer. Hint: the code you posted is (syntactically) correct.


(This post was edited by FishMonger on Mar 16, 2010, 10:43 AM)


cliffyiu
Novice

Mar 16, 2010, 11:12 AM

Post #3 of 20 (2087 views)
Re: [FishMonger] how to compare two simple txt files [In reply to] Can't Post

yeah... I know logically, it is not correct. But I just can't figure it out..

I changed it to


Code
#!/usr/bin/perl 
use strict;
use warnings;

my($infile1_name, $infile2_name, $outfile1_name, $outfile2_name) = @ARGV;

open (IN1, "<", $infile1_name) or die "Can't read file $infile1_name: $!\n";
open (IN2, "<", $infile2_name) or die "Can't read file $infile2_name: $!\n";
open (OUT1, ">", $outfile1_name) or die "Can't write on file $outfile1_name: $!";
open (OUT2, ">", $outfile2_name) or die "Can't write on file $outfile2_name: $!";

while (<IN1>) {

    my $input1 = $_;

    while (<IN2>) {

        my $input2 = $_;

        if ($input1 == $input2) {

            print OUT1 $input2;
        } else {

            print OUT2 $input2;
        }
    }
}

close IN1;
close IN2;
close OUT1;
close OUT2;


the output is not what I expected..

Basically, I wanna output the in-common data to OUT1, and the uncommon data (from both input files or either of them) to OUT2. I just can't figure out the right logic to do that..


FishMonger
Veteran / Moderator

Mar 16, 2010, 11:17 AM

Post #4 of 20 (2084 views)
Re: [cliffyiu] how to compare two simple txt files [In reply to] Can't Post

See: 'perldoc -q duplicate'
http://perldoc.perl.org/perlfaq4.html#How-can-I-remove-duplicate-elements-from-a-list-or-array?


7stud
Enthusiast

Mar 16, 2010, 11:50 AM

Post #5 of 20 (2079 views)
Re: [cliffyiu] how to compare two simple txt files [In reply to] Can't Post

What output do you expect if you run a program like this:


Code
use strict; 
use warnings;
use 5.010;

while (<DATA>) {
    print;
}

while (<DATA>) {
    print;
}

__END__
10
20
30


DATA is a file handle that perl provides automatically, and it reads the lines following a line with __END__ on it.

Are files circular? In other words, when you get to the last line of a file, do you start over at the beginning? If after reaching the end, you start over at the beginning, how would a while loop that is reading a file ever end?


(This post was edited by 7stud on Mar 16, 2010, 12:00 PM)


cliffyiu
Novice

Mar 16, 2010, 12:06 PM

Post #6 of 20 (2072 views)
Re: [7stud] how to compare two simple txt files [In reply to] Can't Post

Hi, 7stud

Thanks for your response. I want two outputs

OUTPUT1: with all overlapping data from the two input files

OUTPUT2: with all data that are present in input file1 but not in input file2.

I am testing a script that uses a hash for this job. But is there any way to avoid the hash and just use the script I wrote at the beginning (I mean similar logic but no hash)?

Thanks


cliffyiu
Novice

Mar 16, 2010, 12:26 PM

Post #7 of 20 (2068 views)
Re: [FishMonger] how to compare two simple txt files [In reply to] Can't Post


OK, FishMonger, thanks for pointing me to that page. I wrote a script using hashes and it seems to be working:


Code
#!/usr/bin/perl 
use strict;
use warnings;

my($infile1_name, $infile2_name, $outfile1_name, $outfile2_name) = @ARGV;
my %hash1;
my %hash2;

open (IN1, "<", $infile1_name) or die "Can't read file $infile1_name: $!\n";
while (<IN1>) {

    $hash1{$_} = 1;
}
close IN1;

open (IN2, "<", $infile2_name) or die "Can't read file $infile2_name: $!\n";
while (<IN2>) {

    $hash2{$_} = 1;
}
close IN2;

open (OUT1, ">", $outfile1_name) or die "Can't write on file $outfile1_name: $!\n";
open (OUT2, ">", $outfile2_name) or die "Can't write on file $outfile2_name: $!\n";
foreach my $key1 (keys %hash1) {

    if ($hash2{$key1} == 1) {

        print OUT1 $key1;
    } else {

        print OUT2 $key1;
    }
}

close OUT1;
close OUT2;


After running $ perl compr_two_files.pl test1.txt test2.txt out1.txt out2.txt

it gave me two output files

$ cat out1.txt
35
10
66

$ cat out2.txt
20
67
65
15
89
14

I still got an error message:

Use of uninitialized value in numeric eq (==) at compr_two_files.pl line 27.

I can't figure out why.

Also, do you know how to do this without a hash, the way I tried to at the beginning?

Thanks a lot


FishMonger
Veteran / Moderator

Mar 16, 2010, 1:08 PM

Post #8 of 20 (2062 views)
Re: [7stud] how to compare two simple txt files [In reply to] Can't Post

You only need 1 hash.

As you loop over the second file, test the existence of the hash key and send output as needed.


7stud
Enthusiast

Mar 16, 2010, 1:45 PM

Post #9 of 20 (2060 views)
Re: [cliffyiu] how to compare two simple txt files [In reply to] Can't Post


Quote
Thanks for your response. I want two outputs

OUTPUT1: with all overlapping data from the two input files

OUTPUT2: with all data that are present in input file1 but not in input file2.

Yes, you've repeated that enough times. You need to be able to answer the questions in my previous post. Decide what you think the output will be from my program, then run it and see what the actual output is. Try to figure out why the program outputs what it does.


(This post was edited by 7stud on Mar 16, 2010, 1:53 PM)


cliffyiu
Novice

Mar 17, 2010, 11:51 AM

Post #10 of 20 (2033 views)
Re: [7stud] how to compare two simple txt files [In reply to] Can't Post


In Reply To
What output do you expect if you run a program like this:


Code
use strict; 
use warnings;
use 5.010;

while (<DATA>) {
    print;
}

while (<DATA>) {
    print;
}

__END__
10
20
30


DATA is a file handle that perl provides automatically, and it reads the lines following a line with __END__ on it.

Are files circular? In other words, when you get to the last line of a file, do you start over at the beginning? If after reaching the end, you start over at the beginning, how would a while loop that is reading a file ever end?



Sorry, I misunderstood your post..

I tried this script. The files are not circular. I just got one output

10
20
30


I expected to get

10
20
30
10
20
30


I understand in one while loop, if I get to the last line of a file, I won't start over again. But what I don't understand is, these are two independent while loops and why the second while doesn't read DATA from the beginning although the first while already reaches the last line of DATA.

Thanks very much for lecturing me on this. It will make me understand the while loop more thoroughly.


cliffyiu
Novice

Mar 17, 2010, 12:58 PM

Post #11 of 20 (2029 views)
Re: [FishMonger] how to compare two simple txt files [In reply to] Can't Post


In Reply To
You only need 1 hash.

As you loop over the second file, test the existence of the hash key and send output as needed.


Hi, FishMonger

I managed to use one hash as below


Code
#!/usr/bin/perl 
use strict;
use warnings;

my($infile1_name, $infile2_name, $outfile1_name, $outfile2_name) = @ARGV;
my %hash;
# my %hash2;
my $hash_value = 1;

open (IN1, "<", $infile1_name) or die "Can't read file $infile1_name: $!\n";
while (<IN1>) {

    $hash{$_} = 1;
}
close IN1;

open (IN2, "<", $infile2_name) or die "Can't read file $infile2_name: $!\n";
open (OUT1, ">", $outfile1_name) or die "Can't write on file $outfile1_name: $!\n";
open (OUT2, ">", $outfile2_name) or die "Can't write on file $outfile2_name: $!\n";
while (<IN2>) {

    if ($hash{$_} == 1) {

        print OUT1 $_; # Output the in-common data to $outfile1_name
    } else {

        print OUT2 $_; # Output the data which are present in IN2 but not in IN1 to $outfile2_name
    }
}
close IN2;
close OUT1;
close OUT2;
exit;


the output files are correct. But I still got some error or warning messages like

Use of uninitialized value in numeric eq (==) at compr_two_files.pl line 22, <IN2> line 1.
Use of uninitialized value in numeric eq (==) at compr_two_files.pl line 22, <IN2> line 2.
Use of uninitialized value in numeric eq (==) at compr_two_files.pl line 22, <IN2> line 5.

Do you know why? I will greatly appreciate it if you can point out the remaining problem in my script.

Thanks!


FishMonger
Veteran / Moderator

Mar 17, 2010, 2:03 PM

Post #12 of 20 (2024 views)
Re: [cliffyiu] how to compare two simple txt files [In reply to] Can't Post

Change:

Code
if ($hash{$_} == 1) {


To:

Code
if (exists $hash{$_}) {



cliffyiu
Novice

Mar 17, 2010, 4:52 PM

Post #13 of 20 (2019 views)
Re: [FishMonger] how to compare two simple txt files [In reply to] Can't Post


In Reply To
Change:

Code
if ($hash{$_} == 1) {


To:

Code
if (exists $hash{$_}) {



FishMonger, thanks for correcting my script! It is working now.

But can you give me some hint about why I can't use "$hash{$_} == 1"? I just don't understand why...

Thanks a lot


FishMonger
Veteran / Moderator

Mar 17, 2010, 5:46 PM

Post #14 of 20 (2017 views)
Re: [cliffyiu] how to compare two simple txt files [In reply to] Can't Post

If a line in the second file doesn't exist in the first file, then it won't be in the hash, which means $hash{$_} is undefined. Comparing that undefined value with == is the cause of your warning.

The exists function basically does a boolean test on the existence of the hash key. You should read the perldoc on the function to get the details of what it does.

http://perldoc.perl.org/functions/exists.html
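
A minimal sketch of the difference described above, for anyone who wants to see the warning in isolation. The hash contents and the variable names (%seen, $key, @warnings) are made up for illustration; the warning handler is only there so the warnings can be counted rather than printed.

```perl
use strict;
use warnings;

# Hypothetical data standing in for the lines of the first file.
my %seen = map { $_ => 1 } qw(10 20 35 66);

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $key = '99';    # a value that is NOT in %seen

# Value test: $seen{99} is undef, so == warns about an uninitialized value.
my $by_value = ( $seen{$key} == 1 ) ? 'common' : 'unique';

# Existence test: only asks whether the key is present, so no warning.
my $by_exists = exists $seen{$key} ? 'common' : 'unique';

print "value test:  $by_value\n";
print "exists test: $by_exists\n";
print "warnings raised: ", scalar(@warnings), "\n";
```

Both tests reach the same answer here, which is why the output files in the thread were already correct; only the value test raises a warning along the way.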


cliffyiu
Novice

Mar 18, 2010, 10:38 AM

Post #15 of 20 (2008 views)
Re: [FishMonger] how to compare two simple txt files [In reply to] Can't Post


In Reply To
If a line in the second file doesn't exist in the first file, then it won't be in the hash, which means $hash{$_} is undefined. Comparing that undefined value with == is the cause of your warning.

The exists function basically does a boolean test on the existence of the hash key. You should read the perldoc on the function to get the details of what it does.

http://perldoc.perl.org/functions/exists.html


Got it! Thanks a lot~


7stud
Enthusiast

Mar 18, 2010, 2:55 PM

Post #16 of 20 (2004 views)
Re: [cliffyiu] how to compare two simple txt files [In reply to] Can't Post


Quote
I understand in one while loop, if I get to the last line of a file, I won't start over again. But what I don't understand is, these are two independent while loops and why the second while doesn't read DATA from the beginning although the first while already reaches the last line of DATA.

Ok, so after the first while loop reaches the end of the file, the while loop doesn't start over from the beginning of the file again, and the while loop ends. What causes the reads from a file to start at the beginning again? Does writing:


Code
while (<SOME_FILE_HANDLE>)


cause the reads to start at the beginning of the specified file? Let's test that out. Run this program:


Code
use strict; 
use warnings;
use 5.010;

while (<DATA>) {
    print "first while(): $_";
    last;
}

while (<DATA>) {
    print "second while(): $_";
    last;
}

while (<DATA>) {
    print "third while(): $_";
    last;
}

while (<DATA>) {
    print "fourth while(): $_";
    last;
}

__END__
10
20
30

Examine the output. How many lines of output are there? How many while() loops are there? Come up with a theory to explain what is happening.


(This post was edited by 7stud on Mar 18, 2010, 3:11 PM)


cliffyiu
Novice

Mar 22, 2010, 10:53 AM

Post #17 of 20 (1963 views)
Re: [7stud] how to compare two simple txt files [In reply to] Can't Post


Code
use strict; 
use warnings;
use 5.010;

while (<DATA>) {
    print "first while(): $_";
    last;
}

while (<DATA>) {
    print "second while(): $_";
    last;
}

while (<DATA>) {
    print "third while(): $_";
    last;
}

while (<DATA>) {
    print "fourth while(): $_";
    last;
}

__END__
10
20
30

Examine the output. How many lines of output are there? How many while() loops are there? Come up with a theory to explain what is happening.


Hi, 7stud

Thanks very much for this example! I just tested it and the output is like

first while(): 10
second while(): 20
third while(): 30

It looks like a while loop reads lines one at a time from the input file. Once the first while loop stops reading, the second while loop picks up from the place where the first one stopped. And once the third while loop has read the last line of the input file, any following while loop has nothing left to read from it.

It looks to me that there is some pointer there in the input file that can tell the while loop where to start and where to stop...

I kinda get your points now. But I have been trying to search online for any related tutorial and can't find any.. Can I know how you figured it out? just by experience?

Thanks again!


7stud
Enthusiast

Mar 22, 2010, 4:56 PM

Post #18 of 20 (1953 views)
Re: [cliffyiu] how to compare two simple txt files [In reply to] Can't Post


Quote
But I have been trying to search online for any related tutorial and can't find any.


What do you see when you search google for 'file pointers'?


cliffyiu
Novice

Mar 23, 2010, 10:27 AM

Post #19 of 20 (1944 views)
Re: [7stud] how to compare two simple txt files [In reply to] Can't Post


In Reply To

Quote

What do you see when you search google for 'file pointers'?


Thanks! I just checked this page

http://www.wellho.net/mouth/1442_Reading-a-file-multiple-times-file-pointers.html

"When you open a file for read, you create a "file pointer" which knows where in the file you are, and each time you read from the file this advances so that you get the next piece of data each time you do a read. Usually you don't see the file pointer at all - it's internal - and you think nothing of it as it behaves in a natural way."

Now I understand that when you stop reading from DATA, the internal file pointer stays at the position where you stopped. Then, when you read from DATA again, it continues from there.

Awesome.. Thanks a lot, 7stud!
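
The file pointer also suggests one possible answer to the earlier question about comparing the files without a hash: the nested-loop version can work if the inner handle is rewound with seek before each pass. This is only a sketch under assumptions: the in-memory "files" and the names $text1, $text2, @common and @only_in_first are invented so the example is self-contained, and lines are compared as strings with eq after chomp.

```perl
use strict;
use warnings;

# In-memory "files" (an assumption made so the sketch is self-contained).
my $text1 = "10\n20\n35\n66\n";
my $text2 = "99\n10\n35\n12\n";

open my $in1, '<', \$text1 or die "Can't open first in-memory file: $!";
open my $in2, '<', \$text2 or die "Can't open second in-memory file: $!";

my ( @common, @only_in_first );

while ( my $line1 = <$in1> ) {
    chomp $line1;

    # Rewind the second handle. Without this, the inner loop would find
    # the file pointer at end-of-file on every pass after the first.
    seek( $in2, 0, 0 ) or die "Can't rewind: $!";

    my $found = 0;
    while ( my $line2 = <$in2> ) {
        chomp $line2;
        if ( $line1 eq $line2 ) {
            $found = 1;
            last;
        }
    }

    if ($found) { push @common,        $line1 }
    else        { push @only_in_first, $line1 }
}

print "common: @common\n";
print "only in first: @only_in_first\n";
```

Note the trade-off: this rereads the second file once per line of the first, so for large inputs the single-hash version from earlier in the thread is likely the better choice.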


rfransix
Novice

Jun 17, 2010, 12:27 PM

Post #20 of 20 (1828 views)
Re: [cliffyiu] how to compare two simple txt files [In reply to] Can't Post

Grateful for your expertise; I scraped this code off the page and tried re-using it, yet it does not work as hoped. Thoughts on repairing it are welcome.

Both PERSON and MGR have an equal number of lines. The desired result in OUT is:

line1 from PERSON
changetype: modify
replace: manager
line1 from MGR

line2 from PERSON
changetype: modify
replace: manager
line2 from MGR

...etc...

#! perl -slw
use strict;
use warnings;
use diagnostics;
use constant batchdir => 'c:\temp10';  # single quotes, so that \t is not read as a tab

open PERSON, "<", "ou3" or die "Cannot read 'ou3': $!";
open MGR, "<", "ou8" or die "Cannot read 'ou8': $!";
open OUT, ">", "buildAD.ldif" or die "Cannot write to 'buildAD.ldif': $!";
truncate (OUT, 0);

while (<PERSON>) {
    chomp;
    my $input1 = $_;
    while (<MGR>) {
        chomp;
        my $input2 = $_;
        if ($input1 != $input2) {
            print OUT $input1;
            print OUT "changetype: modify";
            print OUT "replace: manager";
            print OUT $input2, "\n";
        }
    }
}

exit;
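
One possible repair, sketched under the assumption that line N of PERSON should simply be paired with line N of MGR, as the desired output suggests. The nested loop above drains MGR completely while PERSON is still on its first line (the file-pointer behaviour discussed earlier in this thread), and != compares numerically rather than as strings. Reading one line from each handle per iteration avoids both issues. The helper name pair_lines and the sample contents are hypothetical, and the sketch writes its own newlines instead of relying on the -l switch.

```perl
use strict;
use warnings;

# Read two handles in lockstep and write one block per pair of lines.
# pair_lines() is a hypothetical helper, not part of the original script.
sub pair_lines {
    my ( $person_fh, $mgr_fh, $out_fh ) = @_;

    while ( defined( my $person = <$person_fh> ) ) {
        my $mgr = <$mgr_fh>;
        last unless defined $mgr;    # stop if MGR runs out first
        chomp( $person, $mgr );

        print {$out_fh} "$person\n",
                        "changetype: modify\n",
                        "replace: manager\n",
                        "$mgr\n\n";
    }
    return;
}

# Demonstration with in-memory handles and made-up contents.
my $persons  = "person line 1\nperson line 2\n";
my $managers = "mgr line 1\nmgr line 2\n";
my $result   = '';

open my $p, '<', \$persons  or die "Can't open in-memory PERSON: $!";
open my $m, '<', \$managers or die "Can't open in-memory MGR: $!";
open my $o, '>', \$result   or die "Can't open in-memory OUT: $!";

pair_lines( $p, $m, $o );
close $o;

print $result;
```

With real files, the three in-memory opens would be replaced by opens on 'ou3', 'ou8' and 'buildAD.ldif'; opening with '>' already truncates the output file, so a separate truncate call should not be needed.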

 
 

