multithreading performance problem


New User

Aug 27, 2008, 12:43 PM


Hello, oh almighty Perl gurus!

I'm trying to implement multithreaded processing for the humongous amount of logs that I'm currently processing in 1 process on a 4-CPU server.

For each line, the script checks whether it contains a GET request and, if it does, goes through a list of pre-compiled regular expressions looking for a match. Once a match is found, it applies a second, somewhat more complex regexp associated with that match to extract data from the line. I split this into two separate matches because only about 30% of all lines will match, and I don't want to run the complex extraction regexp on lines I know won't match. The goal is to count how many lines matched each specific regexp. The end result is a hash whose keys are the data extracted by the second regexp and whose values are the number of matches.
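To make the two-stage matching concrete, here is a minimal sketch of the idea. The rule table, the patterns, and the sample log lines are all made up for illustration and are not the real ones; it also assumes the first matching rule wins.

```perl
use strict;
use warnings;

# Hypothetical rule table: a cheap pre-compiled detection regexp paired
# with a costlier extraction regexp. These patterns are invented.
my @rules = (
    { detect => qr{/images/}, extract => qr{GET (/images/[^/\s?]+)} },
    { detect => qr{/api/},    extract => qr{GET (/api/v\d+/[a-z]+)} },
);

# Count extracted keys for every line in the given list.
sub count_lines {
    my @lines = @_;
    my %counters;
    LINE: foreach my $line (@lines) {
        # Skip anything that is not a GET request.
        next LINE unless index( $line, 'GET ' ) >= 0;
        foreach my $rule (@rules) {
            next unless $line =~ $rule->{detect};
            # Only now pay for the more complex extraction regexp.
            if ( $line =~ $rule->{extract} ) {
                $counters{$1}++;
            }
            next LINE;    # assumption: first matching rule wins
        }
    }
    return \%counters;
}

my $counts = count_lines(
    '1.2.3.4 - - "GET /images/logo.png HTTP/1.1" 200',
    '1.2.3.4 - - "POST /api/v1/users HTTP/1.1" 201',
    '1.2.3.4 - - "GET /api/v2/orders?id=7 HTTP/1.1" 200',
    '1.2.3.4 - - "GET /images/logo.png HTTP/1.1" 304',
    '1.2.3.4 - - "GET /index.html HTTP/1.1" 200',
);
printf "%s => %d\n", $_, $counts->{$_} for sort keys %$counts;
```

The extraction regexp only runs on lines that already passed the cheap detection regexp, which is the whole point of splitting the match in two.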

Anyway, currently all this is done in a single process, which parses approx. 30000 lines per second. The CPU usage for this process is 100%, so the bottleneck is in the parsing part.

I have changed the script to use threads + threads::shared + Thread::Queue. I read data from logs like this:

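until( $no_more_data ) {
    my @buffer;
    foreach( (1..$buffer_size) ) {
        # defined() so that a final line of "0" is not mistaken for EOF
        if( defined( my $line = <> ) ) {
            push( @buffer, $line );
        } else {
            # EOF: flush the partial buffer, then one undef per worker
            $no_more_data = 1;
            $q_in->enqueue( \@buffer );
            foreach( (1..$cpu_count) ) {
                $q_in->enqueue( undef );
            }
            last;
        }
    }
    $q_in->enqueue( \@buffer ) unless $no_more_data;
}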

Then I create $cpu_count threads, each of which does something like this:

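sub parser {
    my $counters = {};    # per-thread counts, keyed by extracted data
    # dequeue() blocks until a buffer arrives; undef is the stop marker
    while( my $buffer = $q_in->dequeue() ) {
        foreach my $line ( @{ $buffer } ) {
            # do its thing
        }
    }
    return $counters;
}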
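For completeness, here is a small self-contained sketch of how the pieces fit together: starting the workers, feeding the queue, and merging the per-thread hashes after join(). It needs a Perl built with thread support. The sample lines, the placeholder matching, the worker count of 2, and the merge step are assumptions for illustration only. It shows the structure, not a fix for the slowness.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $cpu_count = 2;                      # assumed worker count
my $q_in      = Thread::Queue->new();

# Worker: count lines per buffer until the undef stop marker arrives.
sub parser {
    my $counters = {};
    while ( my $buffer = $q_in->dequeue() ) {
        foreach my $line ( @{$buffer} ) {
            # Placeholder for the real matching: count GET lines by path.
            $counters->{$1}++ if $line =~ /GET (\S+)/;
        }
    }
    return $counters;
}

# Start the workers before feeding the queue so they can run in parallel.
# create() is called in list context here, so join() is too (see below).
my @workers = map { threads->create( \&parser ) } 1 .. $cpu_count;

# Stand-in for the log reader: two small buffers of made-up lines.
$q_in->enqueue( [ "GET /a\n", "GET /b\n", "POST /a\n" ] );
$q_in->enqueue( [ "GET /a\n" ] );
$q_in->enqueue(undef) for 1 .. $cpu_count;    # one stop marker per worker

# join() hands back each worker's hash; sum them into one total.
my %total;
foreach my $thr (@workers) {
    my ($counters) = $thr->join();
    $total{$_} += $counters->{$_} for keys %{$counters};
}
print "$_ => $total{$_}\n" for sort keys %total;
```

Each worker keeps its own private hash and the main thread merges them once at the end, so no counter needs to be shared or locked while parsing.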

Everything works fine, HOWEVER! It's all so damn slow! It's only 10% faster than the single-process script, and it consumes about 2-3 times more memory and about 2-3 times more CPU.

I've also tried abandoning Thread::Queue and just using threads::shared with a lock/cond_wait/cond_signal combination, without much success.

I've tried playing with $cpu_count and $buffer_size, and found that increasing $buffer_size beyond 1000 doesn't make much difference, while $cpu_count > 2 actually makes things a lot worse.

Any ideas why in the world it's so slow? I did some research and couldn't find much info, other than that the way I do it is pretty much the way it should be done, unless I'm missing something...

Hope somebody can enlighten me...


