Stuck on how to iterate a list in a file against a logfile.

 



asandybox
Novice

Jun 18, 2012, 5:32 PM

Post #1 of 14 (8251 views)

Hi Perl gurus,

I am stuck on trying to iterate a list of IP addresses against a squid log file. Below is a snippet of what I have tried with no luck.


Code
#!/usr/bin/perl
use warnings;
use strict;

my $file = "ip.txt";
open (FH, "< $file") or die "Can't open $file for read: $!";
my @lines;
while (<FH>) {
    push (@lines, $_);
}
close FH or die "Cannot close $file: $!";

while (<>) {
    for my $line (@lines){
        print $_;
        if ($_ eq $line) {
            print "$_";
        }
    }
}


ip.txt contains a list of ip addresses:

192.168.3.10
192.168.1.20
192.168.3.0
....

My thought was to load all the IP addresses into an array, use the while loop to pull in the access.log file and then run each item of the array I created against the log file.

I am just not getting how to loop through the log file and then match events off my list against the file.

Any assistance or pointers on what I am doing wrong is appreciated.


rovf
Veteran

Jun 19, 2012, 1:20 AM

Post #2 of 14 (8225 views)
Re: [asandybox] Stuck on how to iterate a list in a file against a logfile.

That depends on *how* you define a match, which your post does not explain. For example, are you interested in the lines of the logfile that contain one of the IP addresses somewhere, in lines that consist only of one of the IP addresses, in lines that contain all of the IP addresses, and so on?


BTW, the code


Code
my @lines;
while (<FH>) {
    push (@lines, $_);
}


can be written more concisely as


Code
my @lines=<FH>;
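
One thing worth keeping in mind with either form: each element still carries its trailing newline. A minimal sketch, assuming an in-memory string as a stand-in for ip.txt (the `$data` variable and the lexical `$fh` handle are illustrative, not from the original script):

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for ip.txt so the sketch is self-contained.
my $data = "192.168.3.10\n192.168.1.20\n192.168.3.0\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

# List context: one element per line, newline included.
my @lines = <$fh>;
close $fh;

# Count the elements that still end in a newline before chomping.
my $with_newline = grep { /\n\z/ } @lines;

# chomp on an array strips the trailing newline from every element.
chomp @lines;

print "$with_newline of ", scalar(@lines), " lines ended in a newline before chomp\n";
print "[$_]\n" for @lines;
```

Without the chomp, a comparison such as `$_ eq $line` or a pattern built from `$line` has to match that newline as well.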



asandybox
Novice

Jun 19, 2012, 4:53 AM

Post #3 of 14 (8220 views)
Re: [rovf] Stuck on how to iterate a list in a file against a logfile.

Thanks rovf,

Here is what I want to match. Basically, I would like to return any line that contains an IP from the list.

1265939281.764 1 172.16.167.228 TCP_DENIED/403 734 POST http://lbcore1.metacafe.com/test/SystemInfoManager.php - NONE/- text/html
1265939281.764 1 192.168.2.20 TCP_DENIED/403 734 POST http://lbcore1.metacafe.com/test/SystemInfoManager.php - NONE/- text/html
1265939281.764 1 172.16.167.228 TCP_DENIED/403 734 POST http://lbcore1.metacafe.com/test/SystemInfoManager.php - NONE/- text/html
1265939281.764 1 192.168.3.0 TCP_DENIED/403 734 POST http://lbcore1.metacafe.com/test/SystemInfoManager.php - NONE/- text/html

I ran the script against the squid log file and did not get back any lines matching the IPs on the list.


(This post was edited by asandybox on Jun 19, 2012, 4:55 AM)


rovf
Veteran

Jun 19, 2012, 5:33 AM

Post #4 of 14 (8214 views)
Re: [asandybox] Stuck on how to iterate a list in a file against a logfile.

In this case, I would make a regexp like this (untested!):


Code
my $regexp='\\b('.join '|',(map { quotemeta($_) } @lines).')\\b';


The idea here is to dynamically build a regexp like this:


Code
/\b(172\.16\.167\.228|192\.168\.2\.20)\b/


quotemeta takes care of escaping the dots, and join puts the vertical bars in between.
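
Since the one-liner above is marked untested, here is a hedged sketch of the same idea that can be run as-is. Two details seem worth watching. First, without parentheses around join's arguments, the trailing `.')\\b'` attaches to the map expression rather than to join's result, which puts map in scalar context where it returns a count. Second, unless the lines are chomped, every alternative ends in a newline. The `@log` array and the `$broken`, `$alternatives` and `@hits` names are illustrative only.

```perl
use strict;
use warnings;

# Sample addresses as they would come from ip.txt, newlines included.
my @lines = ("172.16.167.228\n", "192.168.2.20\n");
chomp @lines;

# The unparenthesized form: join swallows everything to its right, so the
# concatenation applies to map (scalar context, a count), not to the joined string.
my $broken = '\\b('.join '|',(map { quotemeta($_) } @lines).')\\b';

# Parenthesizing join's arguments keeps the pieces where they belong.
my $alternatives = join('|', map { quotemeta($_) } @lines);
my $regexp       = qr/\b(?:$alternatives)\b/;

# Three of the sample log lines from the thread.
my @log = (
    "1265939281.764 1 172.16.167.228 TCP_DENIED/403 734 POST http://lbcore1.metacafe.com/test/SystemInfoManager.php - NONE/- text/html\n",
    "1265939281.764 1 192.168.2.20 TCP_DENIED/403 734 POST http://lbcore1.metacafe.com/test/SystemInfoManager.php - NONE/- text/html\n",
    "1265939281.764 1 192.168.3.0 TCP_DENIED/403 734 POST http://lbcore1.metacafe.com/test/SystemInfoManager.php - NONE/- text/html\n",
);

my @hits = grep { /$regexp/ } @log;

print "unparenthesized pattern: $broken\n";
print "parenthesized pattern:   $regexp\n";
print @hits;
```

Compiling the pattern once with `qr//` also means the log loop needs no inner `for` over `@lines`; a single match per log line is enough.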


FishMonger
Veteran / Moderator

Jun 19, 2012, 6:40 AM

Post #5 of 14 (8209 views)
Re: [asandybox] Stuck on how to iterate a list in a file against a logfile.

It would be much more efficient to load the IP addresses into a hash instead of an array, then use a list slice on the split log entry to extract its IP address, and use that address for a simple lookup in the hash.


asandybox
Novice

Jun 19, 2012, 8:00 AM

Post #6 of 14 (8201 views)
Re: [rovf] Stuck on how to iterate a list in a file against a logfile.

Thanks rovf, here is my attempt at combining your code:


Code
 
#!/usr/bin/perl
use warnings;
use strict;

my $file = "ip.txt";
open (FH, "< $file") or die "Can't open $file for read: $!";
my @lines=<FH>;
close FH or die "Cannot close $file: $!";

my $regexp='\\b('.join '|',(map { quotemeta($_) } @lines).')\\b';

while (<>) {
    for my $line (@lines){
        #if ($_ =~ m/.*$line.*/g) {
        if ($_ =~ m/$regexp/g) {
            print $_;
        }
    }
}


Still no luck, so I gather the regex is the issue. I still don't get why something like:

if ($_ =~ m/.*$line.*/g)

won't just print any line that contains an IP with anything around it.
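
A likely explanation, sketched under the assumption that `$line` comes straight from ip.txt without a chomp: the interpolated pattern then ends in a newline, so it only matches when the IP is the last thing on the log line. The variable names below are illustrative.

```perl
use strict;
use warnings;

# An entry as read from ip.txt, newline still attached.
my $line = "192.168.2.20\n";

# One of the sample log lines from the thread.
my $log = "1265939281.764 1 192.168.2.20 TCP_DENIED/403 734 POST http://lbcore1.metacafe.com/test/SystemInfoManager.php - NONE/- text/html\n";

# The pattern demands a newline right after the IP; the log has a space there.
my $before = $log =~ /.*$line.*/ ? 1 : 0;

# After chomp the IP matches; \Q...\E keeps the dots literal, and the
# surrounding .* is unnecessary for an unanchored match.
chomp $line;
my $after = $log =~ /\Q$line\E/ ? 1 : 0;

print "before chomp: $before, after chomp: $after\n";
```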


(This post was edited by asandybox on Jun 19, 2012, 8:12 AM)


FishMonger
Veteran / Moderator

Jun 19, 2012, 8:18 AM

Post #7 of 14 (8195 views)
Re: [asandybox] Stuck on how to iterate a list in a file against a logfile.


Code
#!/usr/bin/perl

use strict;
use warnings;

my %ip;
my $file = 'ip.txt';
open my $fh, '<', $file or die "failed to open '$file' $!";

while ( my $ip = <$fh> ) {
    chomp $ip;
    $ip{$ip}++;
}
close $fh;

while ( <> ) {
    my $ip = (split(/\s/, $_))[2];
    if ( exists $ip{$ip} ) {
        print;
    }
}



asandybox
Novice

Jun 19, 2012, 8:53 AM

Post #8 of 14 (8189 views)
Re: [FishMonger] Stuck on how to iterate a list in a file against a logfile.

Thanks FishMonger,

Still not getting any hits. Here is some sample data:


Code
 
1323666001.661 172 192.168.1.47 TCP_MISS/200 3625 GET http://help.example.com.org/search/sss? - DIRECT/208.82.238.129 text/html
1323666001.985 238 192.168.2.47 TCP_MISS/304 237 GET http://www.example.com.org/styles/craigslist.css? - DIRECT/208.82.238.130 -
1323666002.310 165 192.168.3.47 TCP_CLIENT_REFRESH_MISS/304 237 GET http://www.example.com.org/js/jquery-1.4.2.js - DIRECT/208.82.238.130 -
1323666002.683 158 192.168.4.47 TCP_CLIENT_REFRESH_MISS/304 237 GET http://www.example.com.org/js/toChecklist.js - DIRECT/208.82.238.130 -
1323666002.999 164 192.168.5.47 TCP_CLIENT_REFRESH_MISS/304 235 GET http://www.example.com.org/js/jquery.form-defaults.js - DIRECT/208.82.238.130 -
1323666003.308 165 192.168.6.47 TCP_MISS/304 237 GET http://www.example.com.org/js/tocs.js? - DIRECT/208.82.238.130 -
1323666003.656 161 192.168.7.63 TCP_MISS/301 404 GET http://g.msn.com/1ewenus50/news7? - DIRECT/207.46.216.54 -
1323666003.991 12 192.168.8.63 TCP_MISS/200 1168 GET http://rss.msnbc.msn.com/id/3054049/device/rss? - DIRECT/205.128.86.254 application/rss+xml
1323666006.596 4 192.168.12.30 TCP_HIT/200 213 HEAD ftp://anonymous@ftp.example.com - NONE/- text/html


and ip.txt looks like:


Code
192.168.1.47 
192.168.2.47
192.168.3.47
192.168.4.47
192.168.5.47
192.168.6.47
192.168.7.63
192.168.8.63
192.168.12.30


Here is what I think is occurring in your code:

1. You're loading the IP addresses into a hash called "ip".
2. You're looping through the list to strip off any newlines.

The next while loop lets us parse the input log file.

I am unsure why you are splitting here:

my $ip = (split(/\s/, $_))[2];

The remainder of the code is checking if the key value is in the data to be parsed.

However, when I run

./mungelog.pl sample.txt

I get no results, even though the IP addresses in the ip.txt file are clearly present in the log file.

Thanks again for all the help on this.


(This post was edited by asandybox on Jun 19, 2012, 8:55 AM)


FishMonger
Veteran / Moderator

Jun 19, 2012, 9:27 AM

Post #9 of 14 (8184 views)
Re: [asandybox] Stuck on how to iterate a list in a file against a logfile.

Your earlier post indicated that each field in the log file was separated by a single space. This last post indicates otherwise.

I'm using a list slice on the result of split to extract the IP address so we can do the hash lookup. The split statement will need a slight adjustment.


Code
my $ip = (split(/\s+/, $_))[2];

or

Code
my $ip = (split(' ', $_))[2];
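
To see the difference between the three split forms, here is a small sketch. It assumes the elapsed-time column in the real log is padded with several spaces, as the reply above implies; the padded `$entry` string is a made-up variant of the first sample line.

```perl
use strict;
use warnings;

# Assumed layout: several spaces between the timestamp and the elapsed time.
my $entry = "1323666001.661    172 192.168.1.47 TCP_MISS/200 3625 GET http://help.example.com.org/search/sss? - DIRECT/208.82.238.129 text/html\n";

# /\s/ splits on every single whitespace character, so runs of spaces
# produce empty fields and index 2 is no longer the client address.
my $by_single = (split /\s/,  $entry)[2];

# /\s+/ and the special ' ' pattern both treat a run of whitespace as one separator.
my $by_run = (split /\s+/, $entry)[2];
my $by_awk = (split ' ',   $entry)[2];

print "split /\\s/  => '$by_single'\n";
print "split /\\s+/ => '$by_run'\n";
print "split ' '   => '$by_awk'\n";

# The two differ only on leading whitespace: /\s+/ yields an empty first
# field, while ' ' skips leading whitespace entirely.
my @lead_run = split /\s+/, "  a b";
my @lead_awk = split ' ',   "  a b";
print scalar(@lead_run), " fields vs ", scalar(@lead_awk), " fields\n";
```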



asandybox
Novice

Jun 19, 2012, 11:42 AM

Post #10 of 14 (8173 views)
Re: [FishMonger] Stuck on how to iterate a list in a file against a logfile.

Thanks FishMonger, here is another wrench in the works. Your example keys off a single field, in this case only field 2.

My original plan was to craft some type of regex to parse the entire file and see whether an "ip" or "url" existed anywhere in the log file.

Ex:

Field 6:

http://help.example.com.org/search/sss?

What if we wanted to match those URLs, and they were also listed in ip.txt? Another concern is not being able to parse the 8th field, "DIRECT/208.82.238.129", which also contains an IP.

I attempted to modify your code below:


Code
 

#!/usr/bin/perl

use strict;
use warnings;

my %ip;
my $file = 'ip.txt';
open my $fh, '<', $file or die "failed to open '$file' $!";

while ( my $ip = <$fh> ) {
    chomp $ip;
    $ip{$ip}++;
}

close $fh;

while ( <> ) {
    my $ip = (split(/\s+/, $_))[2];

    if ( exists $ip{$ip} ) {
        print;
    }

    my $url = (split(/\s+/, $_))[6];
    #print "$url\n";

    if ($url =~ /help.$ip/) {
        print;
    }
}


The goal is to key off some of the other fields, and perhaps introduce a regex to match www.* in the log.

Any ideas?

Thanks


(This post was edited by asandybox on Jun 19, 2012, 11:48 AM)


asandybox
Novice

Jun 19, 2012, 11:48 AM

Post #11 of 14 (8171 views)
Re: [asandybox] Stuck on how to iterate a list in a file against a logfile.

I attempted to use the following:


Code
if ($url =~ /help.$ip/) {
    print;
}


Because I figured what if I tried to key off the 6th field and match:

http://help.example.com.org/search/sss?

This way I can just add help.example.com to my ip.txt file, and the loop that reads the file would add it to the list of items to search for.
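
Note that `/help.$ip/` interpolates the client address taken from the current log line, not an entry from ip.txt, so it is unlikely to do what is intended here. One possible way to key off several fields while keeping the hash lookup is sketched below. It assumes ip.txt may hold host names as well as addresses; the `%wanted` hash, its entries, and the two small extraction patterns are illustrative, not part of the code posted in this thread.

```perl
use strict;
use warnings;

# Hypothetical watch list: entries may be IP addresses or host names.
my %wanted = map { $_ => 1 } ('192.168.12.30', 'help.example.com.org', '207.46.216.54');

# Four of the sample log lines from the thread.
my @log = (
    "1323666001.661 172 192.168.1.47 TCP_MISS/200 3625 GET http://help.example.com.org/search/sss? - DIRECT/208.82.238.129 text/html\n",
    "1323666001.985 238 192.168.2.47 TCP_MISS/304 237 GET http://www.example.com.org/styles/craigslist.css? - DIRECT/208.82.238.130 -\n",
    "1323666003.656 161 192.168.7.63 TCP_MISS/301 404 GET http://g.msn.com/1ewenus50/news7? - DIRECT/207.46.216.54 -\n",
    "1323666006.596 4 192.168.12.30 TCP_HIT/200 213 HEAD ftp://anonymous\@ftp.example.com - NONE/- text/html\n",
);

my @hits;
for my $entry (@log) {
    # One split, three fields: client address, URL, and hierarchy/peer.
    my ($client, $url, $peer) = (split ' ', $entry)[2, 6, 8];

    # Host part of the URL, skipping an optional "user@" prefix.
    my ($host) = $url =~ m{^\w+://(?:[^/\@]*\@)?([^/:?]+)};

    # Whatever follows the slash in e.g. "DIRECT/208.82.238.129".
    my ($peer_ip) = $peer =~ m{/(\S+)\z};

    my @candidates = grep { defined $_ } ($client, $host, $peer_ip);
    push @hits, $entry if grep { exists $wanted{$_} } @candidates;
}

print @hits;
```

Each candidate is an exact hash lookup, so adding a host name to the list costs nothing extra per log line. Matching partial names such as www.* would still need a regex on top of this.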


(This post was edited by asandybox on Jun 19, 2012, 11:49 AM)


asandybox
Novice

Jun 19, 2012, 2:47 PM

Post #12 of 14 (8160 views)
Re: [asandybox] Stuck on how to iterate a list in a file against a logfile.

Guys,

After some head banging and hair pulling, I think I have a workable solution:

I went back to the drawing board with the hint that "rovf" provided and came up with the following:


Code
 
#!/usr/bin/perl
use warnings;
use strict;

my $file = "ip.txt";
open (FH, "< $file") or die "Can't open $file for read: $!";
my @lines=<FH>;
chomp @lines;
my $regexstring = join("|",@lines);
#print "($regexstring)";

while (<>) {
    my $line = $_;

    if ($line =~ /($regexstring)/i) {
        print;
    }
}

close FH or die "Cannot close $file: $!";


This appears to do the job. I don't think it's very efficient, but it works for the task at hand, and I can start to munge our proxy logs for some needed information. Thank you, FishMonger and rovf, for your input. It was very helpful in finally figuring things out.


(This post was edited by asandybox on Jun 19, 2012, 2:48 PM)


FishMonger
Veteran / Moderator

Jun 19, 2012, 3:03 PM

Post #13 of 14 (8156 views)
Re: [asandybox] Stuck on how to iterate a list in a file against a logfile.

That's not efficient and it doesn't scale well, but if it meets your current needs, then it should be ok until you need to worry about the scaling.


rovf
Veteran

Jun 20, 2012, 1:22 AM

Post #14 of 14 (8127 views)
Re: [asandybox] Stuck on how to iterate a list in a file against a logfile.


Quote
join("|",@lines)


Without using quotemeta, as in my original example above, a line such as 1.2.3.4 in ip.txt would also match the input line

"xxxxxx 11223344 yyyyyyyyyy"

which obviously does NOT contain an ip address.
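
The false positive is easy to reproduce. A minimal sketch using the example line and input from this post (the variable names are illustrative):

```perl
use strict;
use warnings;

my @lines = ('1.2.3.4');
my $input = "xxxxxx 11223344 yyyyyyyyyy";

# Unescaped: each dot matches any character, so "1122334" satisfies the pattern.
my $raw = join '|', @lines;

# Escaped with quotemeta: the dots must be literal dots.
my $quoted = join '|', map { quotemeta($_) } @lines;

my $raw_hit    = $input =~ /($raw)/    ? 1 : 0;
my $quoted_hit = $input =~ /($quoted)/ ? 1 : 0;

print "without quotemeta: $raw_hit, with quotemeta: $quoted_hit\n";
```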

 
 

