Processing Large files, Regular Expression taking long

 



sudheer
New User

Jun 7, 2006, 3:37 PM

Post #1 of 3 (2749 views)

Hi,

I have to filter out some lines from the log files generated by our webserver.

Each file is around 35+MB (with around 355,000 lines) and there are multiple such files.

I need to eliminate every line that has '_BODY' or '_BODY_text' in the requested path, with one exception: a match that occurs only inside the query parameters should not count.
Example:
Eliminate lines like:

172.001.16.87 - - [31/Jan/2006:19:14:53 -0700] "GET /ptrusts/11/dynamic/datapages/HWEligibility_BODY.jhtml HTTP/1.1" 200 7722
172.696.61.87 - - [31/Mar/2006:19:19:07 -0700] "GET /ptrusts/58/dynamic/datapages/Eligibility_BODY_text.jhtml HTTP/1.1" 200 7722

and DO NOT eliminate lines like:

163.50.4.38 - - [01/Feb/2006:08:03:55 -0800] "POST /common/profile/dynamic/PRLogin.jhtml?_DARGS=/common/profile/dynamic/PRLogin_BODY.jhtml.4 HTTP/1.1" 302 12481


To achieve this I am using the regular expression:
'GET [^\?]*_BODY(_text)?\.j?html'
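As a quick sanity check, the pattern can be run against the three sample lines quoted above. This is only a sketch: should_drop is a made-up helper name, and the check is done one line at a time rather than on a whole file.

```perl
use strict;
use warnings;

# The pattern from the post, compiled once so it can be reused.
my $body_re = qr/GET [^\?]*_BODY(_text)?\.j?html/;

# Hypothetical helper: returns 1 if the line should be eliminated, else 0.
sub should_drop {
    my ($line) = @_;
    return $line =~ $body_re ? 1 : 0;
}

# The three sample log lines from the post.
my @samples = (
    q{172.001.16.87 - - [31/Jan/2006:19:14:53 -0700] "GET /ptrusts/11/dynamic/datapages/HWEligibility_BODY.jhtml HTTP/1.1" 200 7722},
    q{172.696.61.87 - - [31/Mar/2006:19:19:07 -0700] "GET /ptrusts/58/dynamic/datapages/Eligibility_BODY_text.jhtml HTTP/1.1" 200 7722},
    q{163.50.4.38 - - [01/Feb/2006:08:03:55 -0800] "POST /common/profile/dynamic/PRLogin.jhtml?_DARGS=/common/profile/dynamic/PRLogin_BODY.jhtml.4 HTTP/1.1" 302 12481},
);

# Report the decision for each sample line.
printf "%-4s %s\n", (should_drop($_) ? 'DROP' : 'KEEP'), $_ for @samples;
```

The first two lines are reported as DROP and the third as KEEP, which matches the behaviour asked for.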

This seems to be working, but it takes a really long time (approximately 15 minutes) to process the 355,000 lines.

(A note on how the file is processed: I am reading the entire log file into a string and applying the regular expression to that string.)
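One thing worth checking when the whole file is held in a single string: the class [^\?] also matches newlines, so a match attempt that starts at a "GET " on one line is free to run on into later lines. That may be a contributor to the run time, though nothing in this thread measures it. The sketch below only demonstrates the line-crossing behaviour; the two log lines are made up, and the variant with \n added to the class is a suggestion, not something from the post.

```perl
use strict;
use warnings;

# Pattern from the post, plus a variant that cannot cross a newline.
my $original = qr/GET [^\?]*_BODY(_text)?\.j?html/;
my $per_line = qr/GET [^\?\n]*_BODY(_text)?\.j?html/;   # assumption: \n added

# Two made-up log lines, joined the way a slurped file would be.
# Neither contains '?'; only the second (a POST) mentions _BODY.
my $slurped = join '',
    qq{10.0.0.1 - - [x] "GET /index.jhtml HTTP/1.1" 200 10\n},
    qq{10.0.0.2 - - [x] "POST /a/Login_BODY.jhtml HTTP/1.1" 200 10\n};

# Hypothetical helper: 1 if the pattern matches and the match spans a newline.
sub crosses_lines {
    my ($re, $text) = @_;
    if ($text =~ /($re)/) {
        return index($1, "\n") >= 0 ? 1 : 0;
    }
    return 0;
}

print "original crosses lines: ", crosses_lines($original, $slurped), "\n";
print "per-line crosses lines: ", crosses_lines($per_line, $slurped), "\n";
```

With the original pattern the match begins at the GET on the first line and ends at _BODY.jhtml on the second, so no single log line actually satisfies the filter. Matching one line at a time, or excluding \n in the class, avoids this.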


In general it is not a problem to have the script run for 15 minutes. But I have more filters of this type (around 10) to apply, in which case it is going to take 10 * 15 = 150 minutes, which is not acceptable.


Is there any way to improve my regular expression, or to improve the way I process really large files of more than 35MB?
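With around ten filters, one option is to read each file once and test every line against all of the filters, instead of making one pass per filter. A minimal sketch under stated assumptions: only the first pattern comes from this post, the second is a made-up placeholder, and keep_line and filter_stream are hypothetical names.

```perl
use strict;
use warnings;

# List of filters; a line matching any of them is eliminated.
my @filters = (
    qr/GET [^\?]*_BODY(_text)?\.j?html/,   # from the post
    qr/GET [^\?]*\.gif\b/,                 # placeholder, not from the post
);

# Hypothetical helper: 1 if no filter matches the line, else 0.
sub keep_line {
    my ($line) = @_;
    for my $re (@filters) {
        return 0 if $line =~ $re;
    }
    return 1;
}

# Hypothetical helper: copy kept lines from one filehandle to another,
# reading one line at a time so the whole file is never held in memory.
sub filter_stream {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        print {$out} $line if keep_line($line);
    }
    return;
}

# Usage: perl script.pl < access.log > filtered.log  (file names are examples)
filter_stream(\*STDIN, \*STDOUT) unless -t STDIN;
```

Each line is read once regardless of how many filters are in the list, so the cost of adding a filter is one more match attempt per line rather than another full pass over the file.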

-Sudheer


KevinR
Veteran


Jun 7, 2006, 3:47 PM

Post #2 of 3 (2745 views)

Try using Perl's in-place editor:

http://www.perl.com/pub/a/2004/10/14/file_editing.html
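
The linked article is not reproduced here, but the general technique can be sketched as follows. Setting $^I (the variable behind the -i switch) makes Perl rewrite each file named in @ARGV while it is read line by line with <>. The helper name filter_in_place and the '.bak' backup suffix are assumptions made for this example.

```perl
use strict;
use warnings;

# The pattern from the original post.
my $body_re = qr/GET [^\?]*_BODY(_text)?\.j?html/;

# Hypothetical helper: delete every line matching $re from the given files,
# editing them in place and leaving a backup copy named <file>.bak.
sub filter_in_place {
    my ($re, @files) = @_;
    return unless @files;
    local $^I   = '.bak';    # turn on in-place editing (backup suffix is an assumption)
    local @ARGV = @files;    # <> reads the files listed in @ARGV
    while (my $line = <>) {
        # While in-place editing is active, print writes to the new file.
        print $line unless $line =~ $re;
    }
    return;
}

# Usage: perl script.pl access.log  (file name is an example)
filter_in_place($body_re, @ARGV) if @ARGV;
```

A one-liner along the same lines would be: perl -i.bak -ne 'print unless /GET [^\?]*_BODY(_text)?\.j?html/' access.log (access.log is an example name). Either way the file is handled one line at a time instead of as one large string.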


sudheer
New User

Jun 8, 2006, 10:57 AM

Post #3 of 3 (2735 views)

Hi Kevin,

Thank you very much for the help.
The in-place mechanism saved a lot of time. Applying each filter now takes around 2 seconds, as opposed to 15 minutes earlier.

Thanks for the great help.

-Sudheer

 
 

