Regular expression for IP address and hostnames

 



jeffersno1
Novice

Mar 9, 2012, 3:35 AM

Post #1 of 11 (1485 views)

I'm trying to run some daily stats on our DNS servers and I've come across a few issues.

1 - Each file is 20MB in size and there are 200 of them. What's the best approach to getting these stats?
- Run the script on one file at a time and increment a counter? How would I do this?
- Add them all to one file and run a script against a 2GB file? I'm not too keen on this idea, but it's easier. I would appreciate some recommendations.


I've started to pull out the IP addresses and domain names (that's all I'm interested in), but again I'm stuck on the hostnames...
How can I write a regular expression to pull out the various types of domains looked up?
Some of the lookups are for subdomains, which makes this extra difficult.

Is there a way I can grab everything after "query: " (the hostname) and up to the " IN"?

Here is my script so far. I would have thought it's easy to pull this out in a one-liner, as I've started below.


Code
#!/usr/bin/perl 
use POSIX;

my $dns_txt = '/home/otpuser/dns.txt';
open (INFILE, "$dns_txt" || die "cant open $!\n");

while ($line = <INFILE>)
{
chomp($line);
$line =~ s/(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/$1$2$3$4/;
print "$1.$2.$3.$4\n";
}


sample file
08-Mar-2012 21:29:40.442 client 89.97.48.135#57441: query: ax.phobos.apple.com.edgesuite.net IN A +
08-Mar-2012 21:29:40.442 client 89.216.38.220#35382: query: api-read.facebook.com IN A +
08-Mar-2012 21:29:40.442 client 11.101.226.155#52914: query: a3.da1.akamai.net IN A +
08-Mar-2012 21:29:40.442 client 240.109.217.199#53121: query: activesync.hot.glbdns.microsoft.com IN A +
08-Mar-2012 21:29:40.442 client 11.109.166.79#57543: query: emea.rel.msn.com IN AAAA +
08-Mar-2012 21:29:40.442 client 222.100.209.183#39854: query: apps.facebook.com IN A +
08-Mar-2012 21:29:40.443 client 52.103.120.15#35868: query: breakingnews.com IN A +
08-Mar-2012 21:29:40.443 client 123.123.94.179#57979: query: api.facebook.com IN A +
08-Mar-2012 21:29:40.443 client 88.103.24.95#63351: query: apps.skype.com IN AAAA +
08-Mar-2012 21:29:40.443 client 89.159.54.118#65462: query: www.facebook.com IN A +


Many thanks

Jeffers


naven8
Novice

Mar 9, 2012, 5:38 AM

Post #2 of 11 (1479 views)
Re: [jeffersno1] Regular expression for IP address and hostnames

>> Run the script on one file at a time and increment a counter? How would I do this?
I would fork one job per file and join the results at the end. Maybe you can make use of threads.
[Is there any other way to do it?]
>> Is there a way I can grab everything after "query: " (the hostname) and up to the " IN"?
I didn't understand this.


I hope the following will work.


Code
 
$line =~ s/.*?client\s(\d+)\.(\d+)\.(\d+)\.(\d+)#.*/$1$2$3$4/;
Or
$line =~ s/.*?client\s(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})#.*/$1$2$3$4/;



(This post was edited by naven8 on Mar 9, 2012, 5:40 AM)
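
The "everything after query: and up to IN" part of the question can also be handled with a plain match instead of a substitution. The following is only a sketch under the assumption that every line looks like the sample lines in the first post; the helper name `parse_query_line` is made up for illustration and does not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns (ip, hostname) for one
# query-log line, or an empty list if the line does not look like the samples.
sub parse_query_line {
    my ($line) = @_;
    # IP sits between "client " and "#"; hostname sits between "query: " and " IN"
    return $line =~ /client\s+(\d{1,3}(?:\.\d{1,3}){3})#\d+:\s+query:\s+(\S+)\s+IN\s/
        ? ($1, $2)
        : ();
}

my $sample = '08-Mar-2012 21:29:40.442 client 89.97.48.135#57441: query: ax.phobos.apple.com.edgesuite.net IN A +';
my ($ip, $host) = parse_query_line($sample);
print "$ip $host\n";    # prints "89.97.48.135 ax.phobos.apple.com.edgesuite.net"
```

Because `\S+` takes everything up to the next whitespace, subdomains of any depth come out as one hostname.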


rovf
Veteran

Mar 9, 2012, 6:26 AM

Post #3 of 11 (1472 views)
Re: [jeffersno1] Regular expression for IP address and hostnames


Quote
I'm trying to run some daily stats on our DNS servers and I've come across a few issues.


In this case, I strongly suggest making a separate posting for each issue - the discussions are easier to track.


Quote
Each file is 20MB in size and there are 200 of them. What's the best approach to getting these stats?
- Run the script on one file at a time and increment a counter? How would I do this?
- Add them all to one file and run a script against a 2GB file? I'm not too keen on this idea, but it's easier.


Certainly the former, i.e. looping over the files (though I don't see why you need a counter). If you put all the files together into one big file, you lose the information about the individual files. Even if you concatenated the files first, you would still need a loop (for the concatenation).
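
A rough sketch of such a loop, keeping an overall tally as well as a per-file one so the per-file information is not lost. The helper name `count_hosts` and the glob pattern in the usage comment are assumptions for illustration only, and the regex assumes lines shaped like the samples in the first post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): tallies queried hostnames over a
# list of log files, keeping both an overall count and a per-file count.
sub count_hosts {
    my @files = @_;
    my (%total, %per_file);
    for my $file (@files) {
        open my $fh, '<', $file or die "can't open $file: $!\n";
        while (my $line = <$fh>) {
            # the hostname sits between "query: " and " IN"; skip other lines
            next unless $line =~ /query:\s+(\S+)\s+IN\s/;
            $total{$1}++;
            $per_file{$file}{$1}++;
        }
        close $fh;
    }
    return (\%total, \%per_file);
}

# Usage sketch; the glob pattern is an assumption about where the logs live:
# my ($total, $per_file) = count_hosts(glob '/home/otpuser/dns*.txt');
```

Each file is read line by line, so memory use depends on the number of distinct hostnames rather than on the file sizes.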



Quote
Is there a way I can grab everything after "query: " (the hostname) and up to the " IN"?


Since your input data seems to be highly regular, consider doing a


Code
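# Not tested, adjust the indices if necessary
# (the capturing parentheses make split return the separators as well,
# so for the sample lines the IP is element 6 and the hostname element 12)
my @fields=(split(/(#|:\s+|\s+)/,$line))[6,12];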


instead.
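
The indices depend on whether the split pattern contains capturing parentheses: with a capturing group, `split` also returns the separators it matched. A small sketch of both variants on one of the sample lines (the variable names are only illustrative):

```perl
use strict;
use warnings;

my $line = '08-Mar-2012 21:29:40.442 client 89.216.38.220#35382: query: api-read.facebook.com IN A +';

# Capturing group: the separators are returned too, so the IP is element 6
# and the hostname is element 12.
my @with_seps = split /(#|:\s+|\s+)/, $line;
print "$with_seps[6] $with_seps[12]\n";

# No capturing group: only the fields come back, so the IP is element 3
# and the hostname is element 6.
my @fields = split /#|:\s+|\s+/, $line;
print "$fields[3] $fields[6]\n";
```

Both print statements produce "89.216.38.220 api-read.facebook.com" for this line.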


jeffersno1
Novice

Mar 9, 2012, 9:43 AM

Post #4 of 11 (1468 views)
Re: [rovf] Regular expression for IP address and hostnames

Hi rovf,

That is brilliant. Now I have the list of IPs and full domains, including subdomains. I never thought about splitting the fields that way and storing them in an array. Perfect idea.

Just one issue: when I try to use the array outside the loop, only the last element of the array exists, but if I print the scalar outside, it exists. Ideally I wanted to count the most looked-up hostname and the most used client IP.



Code
#!/usr/bin/perl 
use POSIX;
my $dns_txt = '/home/otpuser/dns.txt';

open (INFILE, "$dns_txt" || die "cant open $!\n");

my @fields=<INFILE>;

chomp @fields;

foreach $line (@fields)
{
@ip=(split(/(#|:\s+|\s+)/,$line))[6,12];
print "@ip\n";
}
print "----------\n";
print scalar @fields, " scalar fields \n";
print scalar @ip, " scalar ip \n";


script output
load of ip and hosts.....
10.192.165.238 disc.yourwebapps.com
31.110.115.57 ssl.google-analytics.com
----------
22 scalar fields
2 scalar ip
----------

How can I use the array @ip outside the loop? Or can I count inside the loop?

Many thanks for your help guys


FishMonger
Veteran / Moderator

Mar 9, 2012, 10:25 AM

Post #5 of 11 (1463 views)
Re: [jeffersno1] Regular expression for IP address and hostnames

Use a hash to maintain the count.


Code
#!/usr/bin/perl 

use strict;
use warnings;

my $dns_txt = '/home/otpuser/dns.txt';
open my $dns_fh, '<', $dns_txt or die "can't open $dns_txt: $!\n";

my (%ip, %host);
while (my $line = <$dns_fh>) {
    # elements 6 and 12 are the client IP and the queried hostname
    my ($ip, $host) = (split(/(#|:\s+|\s+)/,$line))[6,12];
    $ip{$ip}++;
    $host{$host}++;
}
close $dns_fh;

# output the counts as needed



rovf
Veteran

Mar 9, 2012, 12:05 PM

Post #6 of 11 (1457 views)
Re: [jeffersno1] Regular expression for IP address and hostnames


Quote
When I try to use the array outside


Which array? You have two in your code. And which "scalar" are you referring to? Note that if you can apply the scalar function to an array, you can - of course! - access the whole array, and not only its last element.

BTW, before we continue discussing this, I strongly suggest that you put


Code
use strict; 
use warnings;


in your code (and fix the problems which will arise then).


jeffersno1
Novice

Mar 9, 2012, 2:42 PM

Post #7 of 11 (1450 views)
Re: [rovf] Regular expression for IP address and hostnames

Hi Rovf

Thanks for responding. I did mention which array I was trying to access. I attempted a print outside the loop and got the following. I just wondered how I can access the array inside the loop from outside the loop. I'll put the strict and warnings in...

From previous post
22 scalar fields
2 scalar ip
----------

How can I use the array @ip outside the loop? Or can I count inside the loop?

Many thanks for your help guys


jeffersno1
Novice

Mar 9, 2012, 3:00 PM

Post #8 of 11 (1448 views)
Re: [FishMonger] Regular expression for IP address and hostnames

Hi FishMonger,

Thanks for replying,

From just one DNS server and one 3-minute 20MB file I can see the following:

www.google.com 6755
api-read.facebook.com 6396
orcart.facebook.com 3881
api.facebook.com 3835
www.facebook.com 3828
m.facebook.com 3201
m.hotmail.com 2973
profile.ak.fbcdn.net 2321
fbcdn-profile-a.akamaihd.net 2098 and so on . . .

I added the following sort code to get the sites sorted by lookup count, highest first.

Code
# sort hostnames by descending lookup count ("my" keeps this valid under "use strict")
foreach my $value (sort { $host{$b} <=> $host{$a} } keys %host)
{
    print "$value $host{$value}\n";
}


When finding the total number of elements in an array I would normally print something like

Code
print scalar @ip;

How would I get the total number of entries from a hash?

And just for checking purposes, how can I print out the total number of domains looked up? I guess I would use the $value somehow?

Many thanks

Jeffers


rovf
Veteran

Mar 10, 2012, 5:19 AM

Post #9 of 11 (1415 views)
Re: [jeffersno1] Regular expression for IP address and hostnames

The way you wrote it, @ip *is* accessible outside the loop (otherwise you could not apply 'scalar' to it).
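
One plausible reading of the "only the last element exists" symptom is that assigning to @ip inside the loop replaces its contents on every pass, so after the loop it holds just the pair from the final line. A minimal sketch of the difference between assigning and pushing, using three of the sample lines (the names `@last_pair` and `@all_pairs` are made up for illustration):

```perl
use strict;
use warnings;

my @lines = (
    '08-Mar-2012 21:29:40.443 client 52.103.120.15#35868: query: breakingnews.com IN A +',
    '08-Mar-2012 21:29:40.443 client 123.123.94.179#57979: query: api.facebook.com IN A +',
    '08-Mar-2012 21:29:40.443 client 88.103.24.95#63351: query: apps.skype.com IN AAAA +',
);

my (@last_pair, @all_pairs);
foreach my $line (@lines) {
    # Assignment replaces the previous contents, so only the last line survives.
    @last_pair = (split /(#|:\s+|\s+)/, $line)[6, 12];
    # push appends, so every [ip, host] pair is still there after the loop.
    push @all_pairs, [@last_pair];
}

print scalar(@last_pair), "\n";    # prints 2: one IP and one hostname
print scalar(@all_pairs), "\n";    # prints 3: one entry per input line
print "$all_pairs[0][0] $all_pairs[0][1]\n";    # prints "52.103.120.15 breakingnews.com"
```

For counting, a hash keyed by IP or hostname is usually the simpler structure than a list of pairs.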


FishMonger
Veteran / Moderator

Mar 10, 2012, 8:16 AM

Post #10 of 11 (1407 views)
Re: [jeffersno1] Regular expression for IP address and hostnames


Quote
How would I get the total number of entries from a hash?

And just for checking purposes, how can I print out the total number of domains looked up?


Simple:

Code
my $ip_cnt = keys %ip; 
print "Total number of unique ip addresses:$ip_cnt\n";

my $host_cnt = keys %host;
print "Total number of unique hosts:$host_cnt\n";
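
If "total number of domains looked up" is meant as the total number of lookups rather than the number of distinct names, the hash values can be summed. A small sketch, assuming a %host hash of the shape built by the counting loop earlier in the thread; the three counts here are invented sample data.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Invented sample counts, in the same hostname => count shape as %host.
my %host = ('www.google.com' => 3, 'api.facebook.com' => 2, 'm.hotmail.com' => 1);

my $unique  = keys %host;                # number of distinct hostnames
my $lookups = sum(values %host) // 0;    # total queries; sum() returns undef for an empty list

print "unique hosts: $unique, total lookups: $lookups\n";    # prints "unique hosts: 3, total lookups: 6"
```

The total should equal the number of log lines that were counted, which makes it a handy sanity check.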



jeffersno1
Novice

Mar 11, 2012, 3:55 PM

Post #11 of 11 (1367 views)
Re: [FishMonger] Regular expression for IP address and hostnames

Thanks FishMonger

That's worked a treat.

 
 

