need regular expression

 



k2011
Novice

Jun 29, 2011, 9:56 AM

Post #1 of 9 (4234 views)
need regular expression

I have these logs and I need to extract information based on some conditions. Fields are separated by the delimiter ~.

//line 1

1297286297~1297286297~1297286297~Smpp~25061~http~25061r~AlphaPg~260500004002~UNDEF~0~false~false~false~0:0:0~0~~0::0::0~0~Default~0~unknown_method~Delivered~~1:0:0~0:0:0~~~1005~~9015~~~~121~id:500004002 sub:001 dlvrd:000 submit date:1102091518 done date:1102091518 stat:DELIVRD err:000 Text:hello world

#line 2 ...

#line3 ...

The conditions are:

1. The fifth field should always be http, case-insensitive (http in the example above).

2. The sixth field should not be 8080 (25061r in the example above).

3. The 28th field should always be a number of 3, 4, 5, 6, 7, 8, 9, or 12 digits (1005 in the example above).

4. The 30th field should always be a number of 3, 4, 5, 6, 7, 8, 9, or 12 digits (9015 in the example above).



I need to extract the 28th and 30th fields.



I have tried something like this:



my $new_split = '^(?:[^~]*~){5}http~(?:[^~]*~){1}(?!8080)~(?:[^~]*~){22}(?:(\d{3,4,5,6,8,9,12})~|[^~]+~[^~]*~(?:(\d{3,4,5,6,8,9,12})))';



I am not getting the desired result.


miller
User

Jun 29, 2011, 3:05 PM

Post #2 of 9 (4232 views)
Re: [k2011] need regular expression

Just use an actual split command instead of doing all that additional validation:


Code
my @fields = split '~', $line; 
my ($x,$y) = @fields[28,30];   # zero-based indexes: 1005 and 9015 in the sample line


- Miller
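
Counting the sample record from the first post, the field numbers in the conditions line up with zero-based array indexes rather than one-based positions. A minimal sketch to check this, assuming the wrapped sample is really one line (the variable names are only for illustration):

```perl
use strict;
use warnings;

# Sample record from the first post, joined back into a single line.
my $line = '1297286297~1297286297~1297286297~Smpp~25061~http~25061r~AlphaPg~'
         . '260500004002~UNDEF~0~false~false~false~0:0:0~0~~0::0::0~0~Default~0~'
         . 'unknown_method~Delivered~~1:0:0~0:0:0~~~1005~~9015~~~~121~'
         . 'id:500004002 sub:001 dlvrd:000 submit date:1102091518 '
         . 'done date:1102091518 stat:DELIVRD err:000 Text:hello world';

# A limit of -1 keeps trailing empty fields, so the field count is stable.
my @fields = split /~/, $line, -1;

# Print the fields the conditions talk about, with their zero-based index.
printf "%2d: %s\n", $_, $fields[$_] for 5, 6, 28, 30;
```

For this record the four lines printed are http, 25061r, 1005 and 9015, so an array slice for the wanted values would use indexes 28 and 30.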


k2011
Novice

Jun 30, 2011, 7:35 AM

Post #3 of 9 (4224 views)
Re: [miller] need regular expression

Miller,

Thanks for the reply, but I always need to validate that the fifth field is http and the sixth field is not 8080 when extracting the 28th and 30th fields.


FishMonger
Veteran / Moderator

Jun 30, 2011, 8:36 AM

Post #4 of 9 (4216 views)
Re: [k2011] need regular expression

Using a regex of that nature (even if it worked) would be very fragile.

I agree with miller: you should use the split function to extract the 4 fields and then apply your checks to the 5th and 6th fields.

I'd need more context on what you're doing in the script, but maybe something like this.

Code
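# Zero-based indexes; in the sample line "http" is at index 5.
my ($fld5, $fld6, $fld28, $fld30) = (split(/~/, $line))[5,6,28,30]; 

next if lc($fld5) ne 'http';
next if $fld6 eq '8080';
# Fields 28 and 30 must be numbers of 3 to 9 digits, or exactly 12 digits.
next unless $fld28 =~ /^(?:\d{3,9}|\d{12})$/ and $fld30 =~ /^(?:\d{3,9}|\d{12})$/;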
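
On the digit-count condition: a brace quantifier in a regex takes the form {n}, {n,} or {n,m}, so a list such as {3,4,5,6,8,9,12} is not a quantifier. Since 3 through 9 is a contiguous range, the rule can be written as an alternation of {3,9} and {12}. A small sketch, where `is_valid_code` is a hypothetical helper name and not something from the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when $value consists only of ASCII digits
# and is 3 to 9 digits long or exactly 12 digits long, otherwise 0.
sub is_valid_code {
    my ($value) = @_;
    return 0 unless defined $value;
    return $value =~ /\A(?:[0-9]{3,9}|[0-9]{12})\z/ ? 1 : 0;
}

# Try a few values: two from the sample line, plus some edge cases.
print "$_ => ", is_valid_code($_), "\n"
    for '1005', '9015', '12', '1234567890', '260500004002';
```

Here '12' (too short) and '1234567890' (ten digits) are rejected, while the other three values pass.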



k2011
Novice

Jun 30, 2011, 9:30 AM

Post #5 of 9 (4212 views)
Re: [FishMonger] need regular expression

Thanks, FishMonger.

The context is that I am trying to parse the lines of a log in a loop and collect the 28th and 30th fields. I want to keep some counters, and the logs are very large, from 2 GB to 6 GB. The idea is to put the mined data into Oracle SQL*Loader control files and load it into a database.



I was trying to do this

open(line, $cmdstring . ' |') || die "Problem ..";

# $cmdstring invokes a C++ program that produces the aforementioned logs

my $new_split = '^(?:[^~]*~){5}http~(?:[^~]*~){1}(?!8080)~(?:[^~]*~){22}(?:(\d{3,4,5,6,8,9,12})~|[^~]+~[^~]*~(?:(\d{3,4,5,6,8,9,12})))';


my (%seen, %sent, %rcvd);

while (<line>) {

/$new_split/o or next;

my ($var1, $var2) = ($1, $2); # trying to get the 28th and 30th fields as $var1 and $var2; the pattern has only two capture groups

# some more business logic here...

$var1 && do {$sent{$var1}++; next;};

$var2 && do {$rcvd{$var2}++;};


}
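
With logs of several gigabytes, reading the command's output one line at a time keeps memory use flat. A minimal sketch of a piped read using the list form of open and a lexical filehandle; the child command here is only a stand-in (the running perl binary printing two made-up lines), not the real C++ program:

```perl
use strict;
use warnings;

# Stand-in for the real log producer: the running perl binary ($^X) prints
# two '~'-separated lines. The actual command would be the C++ executable.
my @cmd = ($^X, '-e', 'print "a~b~c\n", "d~e~f\n"');

# List-form piped open: the command is run without going through a shell.
open(my $log_fh, '-|', @cmd) or die "Can't run @cmd: $!";

my $records = 0;
my @second_fields;
while (my $line = <$log_fh>) {    # only one line is held in memory at a time
    chomp $line;
    $records++;
    push @second_fields, (split /~/, $line, -1)[1];
}

# For a pipe, close returns false if the child exited non-zero; $? has the status.
close($log_fh) or warn "Command failed: status $?";

print "$records records, second fields: @second_fields\n";
```

Parenthesizing the arguments to open and close also avoids the precedence trap where `||` binds to the last argument instead of to the call's return value.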


FishMonger
Veteran / Moderator

Jun 30, 2011, 9:37 AM

Post #6 of 9 (4211 views)
Re: [k2011] need regular expression

When I said I needed more context, I meant that I need to see the actual code block for this section of your script so that we can see how to properly implement the split approach and get rid of that messy and error-prone regex.

I might be able to extract what I need from that minimal sample you gave, but it would be easier if you posted a more complete sample.


(This post was edited by FishMonger on Jun 30, 2011, 9:38 AM)


k2011
Novice

Jun 30, 2011, 10:32 AM

Post #7 of 9 (4200 views)
Re: [FishMonger] need regular expression

 
Basically, searchlog is a C++ executable that mines the datalogs. When invoked as
/searchlog -o datalogs
it produces logs like this:



Code
 
//line 1

1297286297~1297286297~1297286297~Smpp~25061~http~25061r~AlphaPg~260500004002~UNDEF~0~false~false~false~0:0:0~0~~0::0::0~0~Default~0~unknown_method~Delivered~~1:0:0~0:0:0~~~1005~~9015~~~~121~id:500004002 sub:001 dlvrd:000 submit date:1102091518 done date:1102091518 stat:DELIVRD err:000 Text:hello world

//line 2

//line 3
etc etc





my $infile = '/data/datalogs';
my $search = '/source/searchlog'; # C++ executable
my $svcname = 'ver10';
my $out_dir = '/homes/outputfiles';
my $records = 0; # count of total records
my $found = 0; # count of codes found;
my $file_base ='EmailStats';
my $out_file;



sub mine_a_log {
    my ($date, $in, $out) = @_;
    my $records = 0; # local scope
    my $found = 0; # local scope
    open(my $outfh, "> $out") or warn "Couldn't open $out for writing: $!\n" and return;
    print $outfh <<ESQL;
options(silent=(header,feedback))\nload data\ninfile *
append into table Email_Stats\nfields terminated by ',' (
hostnameName constant "$host",
svcName constant "$svcname",
createTime EXPRESSION "to_date('$date','YYYY-Mon-DD')",
MCode,\n ToEmail,\n FromEmail\n)\nbegindata
ESQL

    my $cmd_string = "$search -o $infile";
    my $new_split = '^(?:[^~]*~){5}http~(?:[^~]*~){1}(?!8080)~(?:[^~]*~){22}(?:(\d{3,4,5,6,8,9,12})~|[^~]+~[^~]*~(?:(\d{3,4,5,6,8,9,12})))';

    print "-- executing $cmd_string\n" ;#if $verbose;
    open( LINE, $cmd_string . ' |' ) || die "Problem: can't fork command ($cmd_string) $! $?\n";

    my (%seen, %sent, %rcvd);
    while (<LINE>) { #loop
        $records++;

        /$new_split/o or next;

        my ($source, $dest) = ($1, $2); # the pattern has only two capture groups

        $source && do {$sent{$source}++; next;}; # if source is e- code, then dest can't be
        $dest && do {$rcvd{$dest}++;};

    }
    for (keys %sent) {@{$seen{$_}}= ($sent{$_}, 0)}
    for (keys %rcvd) {$seen{$_}[1]= $rcvd{$_}}

    # print all found codes
    for my $sc (sort {$a<=>$b} keys %seen) { # numerically sorted for human readability
        print $outfh (join ',',$sc,@{$seen{$sc}}).$/; #code, plus array in csv format
    }

    # parenthesized so that || applies to close's return value, not to LINE
    close(LINE) || warn "Problem while closing datalog piped read: $! $?\n";
    close $outfh;
    return $records, $found;
}
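
One detail in the merge step of the sub above: a code that appears only in %rcvd gets its count stored in slot 1 while slot 0 stays undefined, so its CSV row would have an empty sent column. A small sketch of the same merge that fills missing counts with 0, using made-up counts rather than real log data (the `//` operator needs Perl 5.10 or later):

```perl
use strict;
use warnings;

# Made-up counts for illustration; the real ones come from the log loop.
my %sent = (1005 => 3, 9015 => 1);
my %rcvd = (9015 => 2, 777 => 4);

# Layout: code => [sent_count, received_count].
my %seen;
$seen{$_}[0] = $sent{$_} for keys %sent;
$seen{$_}[1] = $rcvd{$_} for keys %rcvd;

# Build one CSV row per code, numerically sorted, with undef counts as 0.
my @rows;
for my $code (sort { $a <=> $b } keys %seen) {
    my ($s, $r) = map { $_ // 0 } @{ $seen{$code} }[0, 1];
    push @rows, join ',', $code, $s, $r;
}
print "$_\n" for @rows;
```

With these counts the rows are 777,0,4 then 1005,3,0 then 9015,1,2.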



FishMonger
Veteran / Moderator

Jun 30, 2011, 11:24 AM

Post #8 of 9 (4194 views)
Re: [k2011] need regular expression

This is untested, but is the direction I'd take.


Code
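my %valid_length = map { $_ => 1 } ( 3,4,5,6,7,8,9,12 ); 

while (<LINE>) { #loop

    # Zero-based indexes; in the sample line "http" is at index 5.
    my ($fld5, $fld6, $fld28, $fld30) = (split(/~/, $_))[5,6,28,30];
    next if (
        lc($fld5) ne 'http'
        or $fld6 eq '8080'
        or $fld28 !~ /^\d+$/ or not $valid_length{ length $fld28 }
        or $fld30 !~ /^\d+$/ or not $valid_length{ length $fld30 }
    );

    # the counting logic from the original loop goes here
}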
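
To show how the pieces could fit together, here is a self-contained sketch of the per-line logic. It is only one possible arrangement: `tally_line` and the constant names are hypothetical, the indexes assume the zero-based layout of the sample record, and the counting keeps the either/or behaviour of the original loop (a valid source code is counted and the destination is then ignored). Tighten the checks if both fields must always be valid.

```perl
use strict;
use warnings;

# Zero-based indexes, chosen to match the sample record (an assumption).
use constant { PROTO => 5, PORT => 6, SRC => 28, DST => 30 };

# 3 to 9 digits, or exactly 12 digits.
my $code_re = qr/\A(?:[0-9]{3,9}|[0-9]{12})\z/;
my (%sent, %rcvd);

# Hypothetical helper: tally one record; returns 1 if it was counted, else 0.
sub tally_line {
    my ($line) = @_;
    chomp $line;
    my ($proto, $port, $src, $dst)
        = (split /~/, $line, -1)[PROTO, PORT, SRC, DST];

    return 0 unless defined $dst;         # record has too few fields
    return 0 if lc($proto) ne 'http';     # case-insensitive protocol check
    return 0 if $port eq '8080';

    if ($src =~ $code_re) { $sent{$src}++; return 1 }
    if ($dst =~ $code_re) { $rcvd{$dst}++; return 1 }
    return 0;
}

# Build a 36-field test record with only the interesting fields filled in.
my @f = ('') x 36;
@f[5, 6, 28, 30] = ('HTTP', '25061r', '1005', '9015');
my $ok = tally_line(join '~', @f);        # counted under %sent
$f[6] = '8080';
my $skipped = tally_line(join '~', @f);   # rejected by the 8080 check
print "counted=$ok skipped=$skipped sent(1005)=$sent{1005}\n";
```

Inside the real loop this would be called once per line read from the pipe, with the CSV output built from %sent and %rcvd afterwards.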



k2011
Novice

Jul 8, 2011, 6:42 AM

Post #9 of 9 (4015 views)
Re: [FishMonger] need regular expression

Thanks FishMonger

 
 

