
Util
Oct 25, 2005, 10:39 AM
Post #2 of 2
Re: [kgbolger] line by line comparison

Some help with your existing code:

Use '#' for comments, instead of '//'.

Use 'eq' and 'ne' for string equality, instead of '==' and '!=', which are only for numeric comparison.

Always 'use warnings;', so that Perl will tell you when you are using numeric comparison on strings, among other things.

This line is nonsensical:

$line + 1 = $loglines; //go to next line and write entry

It tries to assign to the result of an addition, which is not something Perl can assign to, so the program will not even compile.

Your core algorithm is flawed. Inside your inner loop, 'foreach $line (@parsed_data) {...}', finding that a single line from @parsed_data is not equal to the current line from @raw_data tells you nothing by itself, because the very next line about to come from @parsed_data *might* match. Instead, initialize a 'seen' flag to 0, loop through @parsed_data setting the flag to 1 if any line matches, and then test the flag *outside* the inner loop to trigger any code that should operate on 'unseen' lines. Like this:

foreach my $logline (@raw_data) {
    my $is_already_in_parsed_data = 0;
  PARSED_LOOP:
    foreach my $line (@parsed_data) {
        if ( $logline eq $line ) {
            $is_already_in_parsed_data = 1;
            last PARSED_LOOP;
        }
    }
    if ( not $is_already_in_parsed_data ) {
        print "Line not seen before: '$logline'\n";
        push @parsed_data, $logline;
    }
}

Your core algorithm is also inefficient: scanning the whole array for every incoming line means the work grows with (lines read) x (lines already stored). Looping repeatedly over an array to look for an exact match is a red flag to use a hash instead, because a hash lookup takes roughly the same time no matter how many keys the hash holds.

my %parsed_data;
foreach my $logline (@raw_data) {
    my $is_already_in_parsed_data = exists $parsed_data{$logline};
    if ( not $is_already_in_parsed_data ) {
        print "Line not seen before: '$logline'\n";
        $parsed_data{$logline}++;
    }
}

Here is a (loosely tested) complete program that does what you asked for:

#!/usr/bin/perl
use strict;
use warnings;

=begin comment

2005-10-25 Bruce Gray <bruce.gray@acm.org>
    Wrote program.

This program reads a file, printing the first occurrence of each line
as it is seen. It dumps a count of the number of occurrences of each
line into a save file. In subsequent runs of the program, the save file
is used to initialize the count (and therefore the state of whether an
occurrence is "first").

Program written in answer to Perl Guru request:
http://perlguru.com/gforum.cgi?post=24449;sb=post_latest_reply;so=ASC;forum_view=forum_view_collapsed;guest=

=end comment

=cut

# Configuration:
my $data_file = 'c:/demolog.txt';
my $save_file = 'c:/savelog.txt';

# Read any counts of data lines from the save file.
# Place them in %lines_seen.
my %lines_seen;
if ( -s $save_file ) {
    open SAVE, '<', $save_file or die "Could not open '$save_file': $!";
    while (<SAVE>) {
        chomp $_;
        # Limit of 2 keeps any tabs inside the data line intact.
        my ( $count, $data_line ) = split "\t", $_, 2;
        $lines_seen{$data_line} = $count;
    }
    close SAVE or warn "Could not close '$save_file': $!";
}

# Read the data file one line at a time, using the existence of
# the line in %lines_seen to determine if this is a line we have seen.
# Print new lines.
open DAT, '<', $data_file or die "Could not open '$data_file': $!";
while (<DAT>) {
    chomp $_;
    my $is_new = not exists $lines_seen{$_};
    if ( $is_new ) {
        print "Line not seen before: '$_'\n";
    }
    $lines_seen{$_}++;
}
close DAT or warn "Could not close '$data_file': $!";

# Overwrite the save file with the updated lines and counts.
open SAVE, '>', $save_file or die "Could not open '$save_file': $!";
while ( my ( $data_line, $count ) = each %lines_seen ) {
    print SAVE "$count\t$data_line\n";
}
close SAVE or warn "Could not close '$save_file': $!";

--
Hope this helps,
Bruce Gray (Util of PerlMonks)
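
A minimal standalone check of the '==' versus 'eq' advice at the top of the reply. This is an illustration added here, not part of the original post; the demo strings 'abc' and 'xyz' and the $SIG{__WARN__} collector are arbitrary choices. Two different non-numeric strings compare as equal under '==' because both convert to the number 0, 'eq' correctly reports them as different, and 'use warnings' flags the suspicious numeric conversion.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings instead of letting them go straight to STDERR, so the
# demo can show that 'use warnings' noticed the numeric comparison.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# Two different strings that do not look like numbers (demo values).
my ( $x, $y ) = ( 'abc', 'xyz' );

# '==' converts both operands to numbers; non-numeric strings become 0,
# so this reports the two strings as equal.
my $numeric_equal = ( $x == $y ) ? 1 : 0;

# 'eq' compares the strings character by character, so these differ.
my $string_equal = ( $x eq $y ) ? 1 : 0;

print "'==' says equal: $numeric_equal\n";
print "'eq' says equal: $string_equal\n";
print "Warnings from the numeric comparison:\n";
print "  $_" for @warnings;
```

The first result line shows 1 and the second shows 0; the collected warnings are the "isn't numeric" messages that would otherwise point you straight at the wrong operator.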