awk , uniq , sort , grep

 



ozdersin
Novice

Jul 21, 2009, 6:14 AM

Post #1 of 15 (2155 views)
awk , uniq , sort , grep

Hi,

I have a bash script that I want to convert to Perl.
First, I wonder which would run the pipeline below faster: bash or Perl? My file is very big and the calculation takes a long time. I hope that a Perl script will reduce the calculation time.

cat file | awk '{print$5"\t"$3}' | sort -n | uniq -c | grep -v $application | sort -n


I used the code below for the uniq part, but I couldn't find a way to do all of the steps together.

Thanks for helping.


open my $fh, '<', $data_file or die "failed to open '$data_file': $!";
my %uniq;
while (<$fh>) {
    $uniq{ $_ }++;    # count how many times each line occurs
}
close $fh;


FishMonger
Veteran / Moderator

Jul 21, 2009, 7:22 AM

Post #2 of 15 (2152 views)
Re: [ozdersin] awk , uniq , sort , grep

Can you post some sample lines from your data file?

How big is the file?

Which field holds the $application and what is its value? Please include that line in the sample data.

I don't see the need for the second sort.

Do you need the count of duplicates?

What is the exact format you want for the output?

This is what I'd start with and probably adjust it after your answers to my requests.

Code
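#!/usr/bin/perl

use strict;
use warnings;

my $data_file   = $ARGV[0];
my $application = $ARGV[1];    # optional string to exclude, like grep -v
my %uniq;

open my $fh, '<', $data_file or die "failed to open '$data_file': $!";
while( <$fh> ) {
    chomp;    # drop the newline so it doesn't stay attached to the last field
    # awk's $3 and $5 are indexes 2 and 4 in Perl's 0-based list
    my ($f3, $f5) = (split(/ /, $_))[2,4];
    next unless defined $f3 && defined $f5;    # skip short or blank lines
    # \Q...\E makes the exclusion a literal string match, not a regex
    next if defined($application) && length($application)
        && "$f5\t$f3" =~ /\Q$application\E/;
    $uniq{ "$f5\t$f3" }++;
}
close $fh;

foreach my $key ( sort keys %uniq ) {
    print "$uniq{$key} : $key\n";
}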



ichi
User

Jul 21, 2009, 7:54 AM

Post #3 of 15 (2148 views)
Re: [ozdersin] awk , uniq , sort , grep



Your script is slow because it uses several redundant tools, and every extra pipe adds overhead, especially since your file is very big. Even if you use Perl, there may not be a significant increase in speed; the same goes for whatever other language you use.

You should show a sample of what your file looks like, and then show the final output you want.


ozdersin
Novice

Jul 22, 2009, 8:16 AM

Post #4 of 15 (2109 views)
Re: [FishMonger] awk , uniq , sort , grep

Data:

DHCPREQUEST for 132.24.15.4 from 00:ff:24:14:30:70


cat file | awk '{print$5"\t"$3}' | sort -n | uniq -c | grep -v $application | sort -n

awk: extracts the MAC address and the IP address.
1. sort -n: sorts the list (if you don't sort here, uniq -c can report wrong counts, because it only counts adjacent duplicate lines).
2. grep -v $application: it's just a string. Generally it's an empty variable, so it isn't important and doesn't take much time (sometimes I need to exclude a few entries).
3. sort -n: it's necessary because after running uniq -c the data looks like
3 00:ff:24:14:30:70 132.24.15.4
so I want to sort by which MAC address has the highest repeat count
(previously I sorted by MAC address only).

I'm not a Perl expert, but as far as I know Unix runs these commands in order.
I mean it first greps the whole file, then sorts the whole file again, then counts the unique lines, and so on. It means that you're opening a 4 GB file and reading it from beginning to end for each pipe; that's why my script is too slow.


Maybe with Perl I could count the unique lines while sorting and grepping at the same time.
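
A note on step 1 above: uniq -c only merges adjacent identical lines, which is why the pipeline has to sort first. A hash-based count in Perl does not need sorted input. The following is a minimal sketch of that difference, not anyone's posted code; the sub names count_adjacent and count_all and the sample lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Emulates `uniq -c`: only runs of adjacent identical lines are merged.
sub count_adjacent {
    my @lines = @_;
    my @runs;    # list of [count, line] pairs, in input order
    for my $line (@lines) {
        if ( @runs && $runs[-1][1] eq $line ) {
            $runs[-1][0]++;
        }
        else {
            push @runs, [ 1, $line ];
        }
    }
    return @runs;
}

# Hash-based count: input order does not matter, so no prior sort is needed.
sub count_all {
    my @lines = @_;
    my %count;
    $count{$_}++ for @lines;
    return %count;
}

# Made-up, unsorted sample input (mac, tab, ip).
my @sample = (
    "00:ff:24:14:30:70\t132.24.15.4",
    "00:ff:24:14:30:71\t132.24.15.5",
    "00:ff:24:14:30:70\t132.24.15.4",
);

my @runs = count_adjacent(@sample);
print "uniq -c style: $_->[0] $_->[1]\n" for @runs;

my %count = count_all(@sample);
print "hash style:    $count{$_} $_\n" for sort keys %count;
```

On the unsorted sample, the uniq -c style count sees three separate runs of one line each, while the hash sees the first pair twice.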


FishMonger
Veteran / Moderator

Jul 22, 2009, 8:50 AM

Post #5 of 15 (2105 views)
Re: [ozdersin] awk , uniq , sort , grep

Will the mac addresses always be requesting the same IP (i.e., static dhcp) or is it dynamic where one entry for 00:ff:24:14:30:70 requests 132.24.15.4 and another entry for that mac might request 132.24.15.54?

Do you have enough memory to load the entire file?


(This post was edited by FishMonger on Jul 22, 2009, 8:53 AM)


ozdersin
Novice

Jul 22, 2009, 11:37 AM

Post #6 of 15 (2100 views)
Re: [FishMonger] awk , uniq , sort , grep

All bindings are static, so one MAC address can only match a single IP address.



Thanks...


FishMonger
Veteran / Moderator

Jul 22, 2009, 12:11 PM

Post #7 of 15 (2097 views)
Re: [ozdersin] awk , uniq , sort , grep

There are several of my clarification questions that are still unanswered, so I'll make assumptions on those items.

Since $application is null in most cases, I'll leave that out for now.

You didn't say if you need a count on the duplicate entries, so I'll leave it out.

You didn't say how much memory you have, but I'll assume you can load the whole file into memory.

This comes down to a very slightly modified version of my first post

Code
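open my $fh, '<', $data_file or die "failed to open '$data_file': $!";
while( <$fh> ) {
    chomp;    # drop the newline so it doesn't stay attached to the last field
    # awk's $5 (mac) and $3 (ip) are indexes 4 and 2 in Perl's 0-based list
    my ($mac, $ip) = (split(/ /, $_))[4,2];
    next unless defined $mac && defined $ip;    # skip short or blank lines
    $uniq{ $mac } = $ip;
}
close $fh;

foreach my $mac ( sort keys %uniq ) { # sorts by mac addresses lexically

    # a numeric sort (sort {$a <=> $b}) is not suitable here,
    # because mac addresses are strings, not numbers

    print "$mac \t $uniq{$mac}\n";
}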



(This post was edited by FishMonger on Jul 22, 2009, 12:13 PM)


ozdersin
Novice

Jul 22, 2009, 1:02 PM

Post #8 of 15 (2092 views)
Re: [FishMonger] awk , uniq , sort , grep

Thanks for helping. To answer your questions:


Since $application is null in most cases, I'll leave that out for now.
> OK, you can leave that out.

You didn't say if you need a count on the duplicate entries, so I'll leave it out.
> Yes, I definitely need a counter. I must know how many requests each MAC address has made.

You didn't say how much memory you have, but I'll assume you can load the whole file into memory.
> Yes, I can.


FishMonger
Veteran / Moderator

Jul 22, 2009, 1:41 PM

Post #9 of 15 (2088 views)
Re: [ozdersin] awk , uniq , sort , grep


Code
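open my $fh, '<', $data_file or die "failed to open '$data_file': $!";
while( <$fh> ) {
    chomp;    # drop the newline so it doesn't stay attached to the last field
    # awk's $5 (mac) and $3 (ip) are indexes 4 and 2 in Perl's 0-based list
    my ($mac, $ip) = (split(/ /, $_))[4,2];
    next unless defined $mac && defined $ip;    # skip short or blank lines
    $uniq{ $mac }{'ip'} = $ip;
    $uniq{ $mac }{'count'}++;    # number of requests seen for this mac
}
close $fh;

foreach my $mac ( sort keys %uniq ) { # sorts by mac addresses lexically

    printf("%s\t%s\t%d\n", $mac, $uniq{$mac}{'ip'}, $uniq{$mac}{'count'});
}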
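
The original pipeline ended with a second sort -n so that the output is ordered by request count, whereas the loop above prints in MAC order. A minimal sketch of ordering by count instead, assuming %uniq has the same shape as in the code above (mac => { ip => ..., count => ... }); the sub name macs_by_count and the sample data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data in the same shape the loop above builds.
my %uniq = (
    '00:ff:24:14:30:70' => { ip => '132.24.15.4', count => 3 },
    '00:ff:24:14:30:71' => { ip => '132.24.15.5', count => 12 },
    '00:ff:24:14:30:72' => { ip => '132.24.15.6', count => 3 },
);

# Returns the mac addresses ordered by ascending count (like `sort -n` on
# `uniq -c` output); equal counts fall back to a lexical sort on the mac.
sub macs_by_count {
    my ($data) = @_;
    return sort {
        $data->{$a}{count} <=> $data->{$b}{count}
            or $a cmp $b
    } keys %$data;
}

for my $mac ( macs_by_count( \%uniq ) ) {
    printf "%d\t%s\t%s\n", $uniq{$mac}{count}, $mac, $uniq{$mac}{ip};
}
```

Swapping $a and $b in the numeric comparison would list the busiest MAC addresses first.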



ozdersin
Novice

Jul 23, 2009, 2:03 AM

Post #10 of 15 (2083 views)
Re: [FishMonger] awk , uniq , sort , grep

Thanks a lot. This code works very well.


biggnou
Novice


Aug 5, 2009, 12:25 PM

Post #11 of 15 (1952 views)
Re: [FishMonger] awk , uniq , sort , grep



Sorry to bother, but I have a question here:

Since split defaults to splitting $_ on whitespace, can we rewrite this line:

my ($mac, $ip) = (split(/ /, $_))[4,2];

as:

my ($mac, $ip) = split [4,2];

?
----------------------------------------------------------------------------------------


#!/usr/bin/perl
$_="Grande est la binouze !";
s/(G)(?:rande est la )(bi)(nou)(ze )(!)/\u$4\u$2\L$1$1$3 $5/;
print "$_\n";


FishMonger
Veteran / Moderator

Aug 5, 2009, 1:17 PM

Post #12 of 15 (1948 views)
Re: [biggnou] awk , uniq , sort , grep

Close, but you're missing a set of parens.

Code
my ($mac, $ip) = (split)[4,2];


However, I prefer to be more explicit and not depend on the 2 separate defaults ($_ and the split pattern), which a new Perl programmer may not know about.

So, in production code, I'd specify the pattern and use a named var instead of $_.

So, I'd normally do this:

Code
my ($mac, $ip) = (split(/ /, $line))[4,2];


I've seen people do this:

Code
@fields = (split)[4,2];

which has the added problem of not telling the reader what kind of data you're processing.


KevinR
Veteran


Aug 5, 2009, 1:29 PM

Post #13 of 15 (1946 views)
Re: [FishMonger] awk , uniq , sort , grep



I agree with your thinking, but the default split and split(/ /) don't do the same thing. Might or might not be important.


FishMonger
Veteran / Moderator

Aug 5, 2009, 1:40 PM

Post #14 of 15 (1944 views)
Re: [KevinR] awk , uniq , sort , grep


In Reply To
I agree with your thinking, but the default split and split(/ /) don't do the same thing. Might or might not be important.


You're right, I was thinking of split(' ') which is the same as split().
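
The difference discussed here shows up on any line with a leading space or a run of spaces. The following is a small sketch with a made-up input string. As documented in perldoc -f split, a single-space string pattern (and split with no arguments) skips leading whitespace and splits on runs of whitespace, while the regex / / splits on every single space.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up line with a leading space and a double space between fields.
my $line = " DHCPREQUEST  for 132.24.15.4\n";

# Regex / /: every single space is a separator, so empty fields appear
# and the trailing newline stays attached to the last field.
my @regex_fields = split / /, $line;

# String ' ' (awk mode): leading whitespace is skipped and any run of
# whitespace, including the newline, acts as one separator.
my @awk_fields = split ' ', $line;

# No arguments: same as split(' ', $_).
my @default_fields;
for ($line) { @default_fields = split; }

print scalar(@regex_fields),   " fields with split / /\n";
print scalar(@awk_fields),     " fields with split ' '\n";
print scalar(@default_fields), " fields with split\n";
```

With this input the first form yields five fields (two of them empty), while the other two yield the same three fields.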


biggnou
Novice


Aug 5, 2009, 5:58 PM

Post #15 of 15 (1931 views)
Re: [FishMonger] awk , uniq , sort , grep

Thank you, FishMonger and KevinR.

I will look into why the default split behaves like split(' ') and therefore differs from split(/ /). I found some kind of explanation at http://perldoc.perl.org/functions/split.html...

It looks strange to a beginner, though...

 
 

