

 



kgbolger
New User

Aug 3, 2005, 6:55 AM

Post #1 of 2 (901 views)
line by line comparison

Hi,

I've been working on this script for a couple of days, changing it over and over. As you can see, I'm new to scripting.

I have log files that are really long, with loads of lines exactly repeating themselves. I've tried to write a script that will, for every line of the log, compare it to every entry in another txt file and, if it isn't there, write it in.



The idea is obviously to have each entry just once. If possible, I'd then like to get a count of the occurrences of each entry, so if a line is already there, an int would be incremented instead. I need to write this to a txt or Excel file.



#!/usr/bin/perl

$data_file="demolog.txt"; // point to file to open

open(DAT, $data_file) || die("Could not open file!"); //open file or show error if not possible

@raw_data=<DAT>; //array raw_data equals contents of demolog

close(DAT); //close the file

@parsed_data; //declare new array



foreach $loglines (@raw_data) //for each line of array raw_data

{

foreach $line (@parsed_data) //for each line of parsed_data array

if ($loglines != $line) //if line from log isn't already in second array do below

{

$line + 1 = $loglines; //go to next line and write entry

}

}

Log file looks like this

"computer1" "192.192.192.60"
"computer1" "192.192.192.60"
"computer1" "192.192.192.60"
"computer2" "192.192.192.60"
"computer1" "192.192.192.60"
"computer3" "192.192.192.60"
"computer1" "192.192.192.60"
"computer1" "192.192.192.50"
"computer1" "192.192.192.60"
"computer2" "192.192.192.60"
"computer1" "192.192.192.60"
"computer1" "192.192.192.60"
"computer1" "192.192.192.50"
"computer1" "192.192.192.60"
"computer2" "192.192.192.60"
"computer1" "192.192.192.60"
"computer1" "192.192.192.60"
"computer1" "192.192.192.60"


I've been trying but to no avail, any help would be great

Thanks

Kev



Util
New User

Oct 25, 2005, 10:39 AM

Post #2 of 2 (835 views)
Re: [kgbolger] line by line comparison

Some help with your existing code:
  • Use '#' for comments, instead of '//'.

  • Use 'eq' and 'ne' for string equality, instead of '==' and '!=', which are only for numeric comparison.

  • Always 'use warnings;', so that Perl will tell you when you use a numeric comparison on a non-numeric string, among other things.

  • This line is nonsensical, because '$line + 1' is an expression and cannot be assigned to: '$line + 1 = $loglines; //go to next line and write entry'

  • Your core algorithm is flawed; inside your inner loop 'foreach $line (@parsed_data) {...}', when you see that a single line from @parsed_data is not equal to the current line from @raw_data, that tells you nothing by itself, because the very next line that is about to come from @parsed_data *might* match. Instead, you need to initialize a 'seen' flag to 0, then loop through @parsed_data, setting the seen flag to 1 if any line matches, and then use the seen flag *outside* of the inner loop to trigger any code to operate on 'unseen' lines. Like this:


  • Code
    foreach my $logline (@raw_data) {
        my $is_already_in_parsed_data = 0;

        PARSED_LOOP:
        foreach my $line (@parsed_data) {
            if ( $logline eq $line ) {
                $is_already_in_parsed_data = 1;
                last PARSED_LOOP;
            }
        }

        if ( not $is_already_in_parsed_data ) {
            print "Line not seen before: '$logline'\n";
            push @parsed_data, $logline;
        }
    }
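Going back to the 'eq'/'ne' point for a moment, here is a small sketch of why the numeric operators go wrong on your log lines. The two sample strings are copied from your log; the variable names are only for illustration. Because each line starts with a double quote, Perl treats both as the number 0 in a numeric comparison, so '!=' never sees a difference.

```perl
use strict;
use warnings;

# Two different lines from the sample log (names are illustrative only).
my $first  = '"computer1" "192.192.192.60"';
my $second = '"computer2" "192.192.192.60"';

{
    # Silence the "isn't numeric" warnings for this demonstration only.
    no warnings 'numeric';

    # Both strings numify to 0, so the numeric test claims they are equal.
    print "numeric: ", ( $first != $second ? "different" : "same" ), "\n";
}

# The string comparison sees that they differ.
print "string:  ", ( $first ne $second ? "different" : "same" ), "\n";
```

With warnings left on, the numeric comparison would also produce an "isn't numeric" warning for each operand, which is the hint that 'use warnings;' gives you.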

  • Your core algorithm is inefficient; looping repeatedly over an array to look for an exact match is a red flag to use a hash instead.


  • Code
    my %parsed_data;
    foreach my $logline (@raw_data) {
        my $is_already_in_parsed_data = exists $parsed_data{$logline};

        if ( not $is_already_in_parsed_data ) {
            print "Line not seen before: '$logline'\n";
            $parsed_data{$logline}++;
        }
    }
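Since you also asked for a count of each entry, the same hash can hold it if you increment on every line rather than only on the first sighting. This is a minimal sketch, assuming the lines have already been chomped. The sub name count_lines and the sample data are my own choices for illustration, not part of your script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping each distinct line
# to the number of times it occurred in the input list.
sub count_lines {
    my @lines = @_;
    my %count;
    $count{$_}++ for @lines;
    return \%count;
}

# A few lines in the format of the sample log.
my @sample = (
    '"computer1" "192.192.192.60"',
    '"computer1" "192.192.192.60"',
    '"computer2" "192.192.192.60"',
    '"computer1" "192.192.192.50"',
    '"computer1" "192.192.192.60"',
);

my $count = count_lines(@sample);

# Most frequent first; ties are broken alphabetically so the order is stable.
foreach my $line ( sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count ) {
    print "$count->{$line}\t$line\n";
}
```

The keys of the hash are the distinct lines, so you get the de-duplicated list and the counts from one pass over the log.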

  • Here is a (loosely tested) complete program that does what you asked for:


  • Code
    #!/usr/bin/perl 
    use strict;
    use warnings;

    =begin comment

    2005-10-25 Bruce Gray <bruce.gray@acm.org>
    Wrote program.

    This program reads a file, printing the first occurrence of
    each line as it is seen. It dumps a count of the number of
    occurrences of each line into a save file. In subsequent
    runs of the program, the save file is used to initialize the
    count (and therefore the state of whether an occurrence is "first").

    Program written in answer to Perl Guru request:
    http://perlguru.com/gforum.cgi?post=24449;sb=post_latest_reply;so=ASC;forum_view=forum_view_collapsed;guest=

    =cut


    # Configuration:
    my $data_file = 'c:/demolog.txt';
    my $save_file = 'c:/savelog.txt';

    # Read any counts of data lines from the save file.
    # Place them in %lines_seen.
    my %lines_seen;
    if ( -s $save_file ) {
        open SAVE, '<', $save_file
            or die "Could not open '$save_file': $!";

        while (<SAVE>) {
            chomp $_;
            my ( $count, $data_line ) = split "\t", $_, 2;
            $lines_seen{$data_line} = $count;
        }

        close SAVE
            or warn "Could not close '$save_file': $!";
    }



    # Read the data file one line at a time, using the existence of
    # the line in %lines_seen to determine if this is a line we have seen.
    # Print new lines.
    open DAT, '<', $data_file
        or die "Could not open '$data_file': $!";

    while (<DAT>) {
        chomp $_;

        my $is_new = not exists $lines_seen{$_};

        if ( $is_new ) {
            print "Line not seen before: '$_'\n";
        }

        $lines_seen{$_}++;
    }

    close DAT
        or warn "Could not close '$data_file': $!";



    # Overwrite the save file with the updated lines and counts.
    open SAVE, '>', $save_file
        or die "Could not open '$save_file': $!";

    while ( my ( $data_line, $count ) = each %lines_seen ) {
        print SAVE "$count\t$data_line\n";
    }

    close SAVE
        or warn "Could not close '$save_file': $!";
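You mentioned Excel as a possible target. One option, sketched here under my own assumptions, is to emit the counts as CSV, which Excel can open. Your log lines contain double quotes, so each field has to be wrapped in quotes with the inner quotes doubled. The helper names csv_field and csv_row are hypothetical, and the %lines_seen hash below is stand-in data with made-up counts, not real results from your log.

```perl
use strict;
use warnings;

# Hypothetical helper: quote one value as a CSV field by wrapping it in
# double quotes and doubling any double quotes inside it.
sub csv_field {
    my ($value) = @_;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Hypothetical helper: build one CSV row (with newline) from a list of values.
sub csv_row {
    return join( ',', map { csv_field($_) } @_ ) . "\n";
}

# Stand-in for a line => count hash; the counts here are invented.
my %lines_seen = (
    '"computer1" "192.192.192.60"' => 12,
    '"computer2" "192.192.192.60"' => 3,
);

# Header row, then one row per distinct line, sorted for a stable order.
print csv_row( 'count', 'line' );
foreach my $line ( sort keys %lines_seen ) {
    print csv_row( $lines_seen{$line}, $line );
}
```

This prints to standard output, so redirect it to a file ending in .csv to open it in Excel. If you have the Text::CSV module from CPAN available, it handles the quoting rules for you and would be the more robust choice.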


    --
    Hope this helps,
    Bruce Gray
    (Util of PerlMonks)

     
     

