
Chivalri
Novice
Aug 11, 2008, 1:07 PM
Post #10 of 13
(727 views)
Re: [shawnhcorey] Help speeding up this code?
True enough, I did add it since I saw it in your example and wasn't sure why you threw it in there.

Unfortunately, the data comes in pretty much any format. I am matching data like "LOGMNRG_ICOL$" and "CHANGE_DETECT_296991" against log file lines like "select pitagname, minvalue, maxvalue, LASTMODIFYDATE, COMPMAXSECS from standardpitag where eq...". The script basically runs against large DBs to see what users are doing, and parses that into a series of reports. The log files can grow to a few hundred megs after just one day, and we will generally use a week or more of data to generate these reports.

The only other thing I could think of is that putting this amount of data in an array seems to be much slower than using a hash, so I switched to a hash. This brought the test data set down to about 3 minutes and 40 secs (baseline is 25-30 secs using the original code). Here is the code updated with hashes instead of arrays, and with the patterns precompiled:
my %htable_re;

# reading in small file
while ($cur_line = <size_log>) {
    my @command = split(/;;;+/, $cur_line);
    $htable_re{$command[0]} = qr/\b($command[0])\b/i;
    # parse remainder of line...
}

# pattern match against big file
while (<fh_log>) {
    foreach $key (keys %htable_re) {
        if (/$htable_re{$key}/) {    # if this query contains a reference to this table
            $table_stats{$key}++;    # count how many times this table is referenced
            $table_refs++;
        }
    }
}
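For comparison, here is a rough sketch of one alternative to the per-key loop above: fold all the table names into a single alternation and scan each log line once, instead of running one regex per table per line. This is not the code from the post, and whether it is actually faster on this data would need measuring. The sub name count_table_refs and the sample data are made up for illustration. It also makes two assumptions that differ from the posted code: the names are passed through quotemeta, and lookarounds stand in for \b so that a name ending in "$" (like "LOGMNRG_ICOL$") can still match.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): count, per table name,
# how many lines mention it, using one combined regex per line.
sub count_table_refs {
    my ($names, $lines) = @_;
    return ({}, 0) unless @$names;    # an empty alternation would match everywhere

    # Map lowercased name -> name as given, so case-insensitive hits
    # are counted under the original key.
    my %canonical;
    $canonical{ lc $_ } = $_ for @$names;

    # Escape regex metacharacters (e.g. '$') and try longer names first.
    my $alternation = join '|',
        map { quotemeta($_) }
        sort { length($b) <=> length($a) } @$names;

    # Assumption: names consist of word characters and '$', so the
    # lookarounds act as a boundary check in place of \b.
    my $re = qr/(?<![\w\$])($alternation)(?![\w\$])/i;

    my %table_stats;
    my $table_refs = 0;
    for my $line (@$lines) {
        my %seen;
        while ($line =~ /$re/g) {
            my $key = $canonical{ lc $1 };
            next if $seen{$key}++;    # count each table once per line, as the loop above does
            $table_stats{$key}++;
            $table_refs++;
        }
    }
    return (\%table_stats, $table_refs);
}

# Made-up sample data, for illustration only.
my @sample_names = ('LOGMNRG_ICOL$', 'standardpitag');
my @sample_lines = (
    'select pitagname from standardpitag where x = 1',
    'select * from LOGMNRG_ICOL$ a, STANDARDPITAG b',
);
my ($stats, $refs) = count_table_refs(\@sample_names, \@sample_lines);
print "$_: $stats->{$_}\n" for sort keys %$stats;
print "total: $refs\n";
```

With a very large number of table names the combined pattern itself may become the bottleneck, so it is worth timing both versions against the same test data set before switching.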