Thread or Fork?

 



Sam Kennedy
Novice

Aug 29, 2012, 1:51 PM

Post #1 of 21 (5827 views)
Thread or Fork?

In my code I have an array of scores. I want to create a few separate 'processes' (whether threads or otherwise) which will do some calculations, and update a separate element in the scores array.

I've already tried using threads but it hasn't worked, here is the code which I used:


Code
sub score_images {
    for(my $i = 1; $i <= $population; $i++){
        my $flag = 1;
        for(my $l = 0; $l < @touched; $l++){
            if($i == $touched[$l]) { $flag = 0; }
        }
        if($flag == 0 || $firstrun == 1){
            $score[$i] = 0;
            ${"thr".$i} = new Thread \&pixel_comparison, $i;
        }
    }
    while($runningthreads > 0){ }
    for(my $i = 1; $i <= $population; $i++){
        if($score[$i] < $best){
            $best = $score[$i];
            $savedIndex = $i;
        }
    }
    @touched = ();
}

sub pixel_comparison {
    $runningthreads++;
    foreach my $x ( 0 .. $width ) {
        foreach my $y ( 0 .. $height ) {
            my ($index1) = $timage->getPixel($x,$y);
            my ($index2) = $im[$_[0]]->getPixel($x,$y);
            my ($r1,$g1,$b1) = $timage->rgb($index1);
            my ($r2,$g2,$b2) = $im[$_[0]]->rgb($index2);
            $score[$_[0]] += ($r1 - $r2)**2;
            $score[$_[0]] += ($g1 - $g2)**2;
            $score[$_[0]] += ($b1 - $b2)**2;
        }
    }
    $runningthreads--;
}


I need to have multiple instances of pixel_comparison() running, each will only change a single value in the score array.

Would forking be better for this?

Thank You,
-Sam


FishMonger
Veteran / Moderator

Aug 29, 2012, 2:29 PM

Post #2 of 21 (5825 views)
Re: [Sam Kennedy] Thread or Fork?


Quote
it hasn't worked

That's a very poor problem statement.

In what way does it not work?

What does it do that you didn't expect and how does that differ from what you expected?

What thread module(s) are you using?


Quote

Code
${"thr".$i} = new Thread \&pixel_comparison, $i;


Ouch! Why are you using a symbolic reference?

Doing that tells me that you're not using the strict pragma, which is a mistake. Instead of the symbolic reference, you should be using a hash or an array.

Also, it's best not to use the indirect object call. That statement should look more like this:

Code
$thread[$i] = Thread->new(\&pixel_comparison, $i);



(This post was edited by FishMonger on Aug 29, 2012, 2:30 PM)


Sam Kennedy
Novice

Aug 29, 2012, 2:41 PM

Post #3 of 21 (5821 views)
Re: [FishMonger] Thread or Fork?

Sorry! By not working, I mean the score array doesn't get touched, and then Windows pops up the dreaded 'ActivePerl has stopped working' error message.

I'm also just using the Thread module.

I was originally using an array, but I thought that was the cause of the problem so tried using the symbolic reference instead, with the same result.

(I've just changed the code to use an array, and changed the indirect object call, but the same problems as above are occurring)


(This post was edited by Sam Kennedy on Aug 29, 2012, 2:42 PM)


FishMonger
Veteran / Moderator

Aug 29, 2012, 5:30 PM

Post #4 of 21 (5815 views)
Re: [Sam Kennedy] Thread or Fork?

The Thread module is deprecated, and its author does not recommend using it. The recommendation is to use the threads module instead.
http://search.cpan.org/~jdhedden/threads-1.86/lib/threads.pm

If you want all threads to be able to update the same var, then you also need to use the threads::shared module.
http://search.cpan.org/~jdhedden/threads-shared-1.40/lib/threads/shared.pm

My suggestion is to write a short test script that uses these modules and follows the other suggestions we've given (i.e., use the strict and warnings pragmas and don't use symbolic references), and use it to work out how each thread should update the same var. If you run into trouble, post that test script with a new question about that specific problem. Once those issues are worked out, we can work on integrating the fixes into your current script.
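
A minimal sketch of the kind of test script suggested above, assuming a Perl built with thread support. The work() sub and its arithmetic are hypothetical stand-ins for the real per-image calculation; the threads and threads::shared calls are the ones named in this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

# Shared between all threads; each worker writes only its own slot.
my @score :shared;

# Hypothetical stand-in for the real per-image work.
sub work {
    my ($i) = @_;
    my $sum = 0;
    $sum += $_ ** 2 for 1 .. $i;
    $score[$i] = $sum;    # no other thread touches slot $i
    return;
}

# Keep the thread objects in a real array instead of symbolic references.
my @thread;
$thread[$_] = threads->create(\&work, $_) for 1 .. 3;

# join() waits for each thread, so no busy-wait counter is needed.
$_->join for @thread[1 .. 3];

print "score[$_] = $score[$_]\n" for 1 .. 3;
```

Because each thread stores to a different element, this sketch does not call lock(); code where several threads update the same element would need it.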


Sam Kennedy
Novice

Aug 30, 2012, 7:18 AM

Post #5 of 21 (5787 views)
Re: [FishMonger] Thread or Fork?

Thank you very much! :D

I've been playing around with the threads and threads::shared modules and figured out how they work, so I should have no problem implementing this in my code now :)

EDIT: It turns out I will need to share an object. I've shared the variable and the array which are used to create the objects, but my code keeps crashing.

This is how I declared my variables etc.:

Code
my $timage :shared; 
my $width :shared;
my $height :shared;
my @im :shared;
my @score :shared;


This is the first object being created:

Code
$timage = GD::Image->newFromPng("ml.png"); 
($width, $height) = $timage->getBounds;


And the final objects being created:

Code
for(my $i = 1; $i <= $population; $i++){
    $im[$i] = new GD::Image($width,$height,1);
    ...
    ...


They are accessed in the thread like this:


Code
my ($index1) = $timage->getPixel($x,$y); #My code crashes on this line 
my ($index2) = $im[$_[0]]->getPixel($x,$y);


Is there a problem in the way I'm sharing the object, the way I'm accessing it, or both?

Thank You
-Sam
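
A hedged illustration of one limit that may be relevant here: a variable marked :shared can hold only plain scalars or references to data that is itself shared. The sketch below uses ordinary Perl data, not GD; whether a GD::Image can be shared at all is not established in this thread, and since it wraps data owned by a C library, copying it with shared_clone() is assumed not to be a safe substitute. The hash keys are hypothetical.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $slot :shared;

# Storing a reference to unshared data in a shared scalar is rejected at run time.
my $ok    = eval { $slot = [ 1, 2, 3 ]; 1 };
my $error = $ok ? '' : $@;
print "plain ref rejected: $error";

# shared_clone() makes a shared deep copy of plain Perl data.
$slot = shared_clone({ width => 4, height => 3, pixels => [ [ 0, 0, 0 ] ] });

# The new thread reads the shared structure directly.
my $thr  = threads->create(sub { return $slot->{width} * $slot->{height} });
my $area = $thr->join;
print "area seen in thread: $area\n";
```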







(This post was edited by Sam Kennedy on Aug 30, 2012, 8:00 AM)


Sam Kennedy
Novice

Aug 30, 2012, 10:17 AM

Post #6 of 21 (5784 views)
Re: [Sam Kennedy] Thread or Fork?

I've done a bit more digging: the threads execute fine, but as soon as I call join() I get the crash.

It's weird because I placed some test code in a different part of my script and it worked, but as soon as I put it in the for loop it crashes.

Here is the full script: http://pastebin.com/XN3Xr6dY

The threaded function is on line 160, the threads are created on line 139, and terminated on line 147, where the crash occurs. (Segmentation fault)


FishMonger
Veteran / Moderator

Aug 30, 2012, 11:48 AM

Post #7 of 21 (5782 views)
Re: [Sam Kennedy] Thread or Fork?

Before I'm willing to help troubleshoot the script, you need to uncomment the strict pragma line and get rid of all of the symbolic references.


Sam Kennedy
Novice

Aug 30, 2012, 12:41 PM

Post #8 of 21 (5779 views)
Re: [FishMonger] Thread or Fork?

To get rid of the symbolic references I need another way to implement 2d arrays, I'll try giving it a go...


FishMonger
Veteran / Moderator

Aug 30, 2012, 12:54 PM

Post #9 of 21 (5777 views)
Re: [Sam Kennedy] Thread or Fork?

You should also take a look at your score_images() sub. It's missing a closing brace, which will cause your script to crash.


FishMonger
Veteran / Moderator

Aug 30, 2012, 1:05 PM

Post #10 of 21 (5775 views)
Re: [Sam Kennedy] Thread or Fork?


In Reply To
To get rid of the symbolic references I need another way to implement 2d arrays, I'll try giving it a go...


If you're referring to sections such as this:

Code
$poly->addPt(${"xchrom".$i}[$z], ${"ychrom".$i}[$z]); 
$poly->addPt(${"xchrom".$i}[$z+1], ${"ychrom".$i}[$z+1]);
$poly->addPt(${"xchrom".$i}[$z+2], ${"ychrom".$i}[$z+2]);
$poly->addPt(${"xchrom".$i}[$z+3], ${"ychrom".$i}[$z+3]);

Those are not 2d arrays. They are plain single level arrays.

If you want a 2d array, use this:

Code
$poly->addPt($xchrom[$i][$z], $ychrom[$i][$z]);
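
A small sketch of that 2d array form under strict, with no symbolic references. The sizes and the values stored are made up for illustration; only the @xchrom and @ychrom names come from the code above.

```perl
use strict;
use warnings;

my $population = 3;   # hypothetical sizes for illustration
my $points     = 4;

# $xchrom[$i] holds an array reference, so $xchrom[$i][$z] is row $i, column $z.
my (@xchrom, @ychrom);
for my $i (1 .. $population) {
    for my $z (0 .. $points - 1) {
        $xchrom[$i][$z] = $i * 10 + $z;
        $ychrom[$i][$z] = $i * 100 + $z;
    }
}

# One row is an ordinary array once dereferenced.
my @row = @{ $xchrom[2] };
print "row 2: @row\n";                                   # row 2: 20 21 22 23
print "point: ($xchrom[2][3], $ychrom[2][3])\n";         # point: (23, 203)
```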



Sam Kennedy
Novice

Aug 30, 2012, 1:17 PM

Post #11 of 21 (5774 views)
Re: [Sam Kennedy] Thread or Fork?

I've managed to recreate the problem with a 40 line test script:


Code
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use GD;

my @score :shared;
my @im;
my @thr;

$im[0] = GD::Image->newFromPng("img0.png");
$im[1] = GD::Image->newFromPng("img1.png");
$im[2] = GD::Image->newFromPng("img2.png");

my ($width, $height) = $im[0]->getBounds;

for(my $i = 1; $i <= 2; $i++){
    $thr[$i] = threads->create(\&threaded_sub, $i, $im[0], $im[$i]);
}

for(my $i = 1; $i <= 2; $i++){
    $thr[$i]->join();
    print $score[$i]."\n";
}

sub threaded_sub {
    $score[$_[0]] = 0;
    foreach my $x ( 0 .. $width ) {
        foreach my $y ( 0 .. $height ) {
            my ($index1) = $_[2]->getPixel($x,$y);
            my ($index2) = $_[1]->getPixel($x,$y);
            my ($r1,$g1,$b1) = $_[2]->rgb($index1);
            my ($r2,$g2,$b2) = $_[1]->rgb($index2);
            $score[$_[0]] += ($r1 - $r2)**2;
            $score[$_[0]] += ($g1 - $g2)**2;
            $score[$_[0]] += ($b1 - $b2)**2;
        }
    }
}


If you need the 3 png images to test the script you can use these: http://www.maxsurl.com/link.php?ref=yoy0XGtTE4

Thank You :)
-Sam
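
One possible workaround, offered as a sketch and not as a confirmed diagnosis: keep the GD objects in the main thread and hand each thread only plain Perl data, returning the score through join(). The assumption (not verified in this thread) is that the crash comes from GD::Image objects being copied into the new threads. The pixel values below are invented; with GD they would be read in the main thread using getPixel() and rgb(). The loops also stop at width - 1 and height - 1, on the assumption that getBounds returns sizes rather than last indexes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

# Hypothetical pixel data: $img[$n][$x][$y] = [r, g, b].
my ($width, $height) = (2, 2);
my @img = (
    [ [ [10, 10, 10], [20, 20, 20] ], [ [30, 30, 30], [40, 40, 40] ] ],  # target
    [ [ [10, 10, 10], [20, 20, 20] ], [ [30, 30, 30], [40, 40, 41] ] ],  # candidate 1
    [ [ [ 0,  0,  0], [20, 20, 20] ], [ [30, 30, 30], [40, 40, 40] ] ],  # candidate 2
);

# Sum of squared channel differences between two plain-data images.
sub compare {
    my ($target, $candidate) = @_;
    my $score = 0;
    for my $x (0 .. $width - 1) {
        for my $y (0 .. $height - 1) {
            for my $c (0 .. 2) {
                $score += ($target->[$x][$y][$c] - $candidate->[$x][$y][$c]) ** 2;
            }
        }
    }
    return $score;    # handed back to the parent through join()
}

my @thr   = map { threads->create(\&compare, $img[0], $img[$_]) } 1 .. 2;
my @score = map { $_->join } @thr;
print "candidate $_ score: $score[$_ - 1]\n" for 1 .. 2;
```

Returning the value from join() also removes the need for a shared @score array in this small case.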


FishMonger
Veteran / Moderator

Aug 30, 2012, 5:10 PM

Post #12 of 21 (5761 views)
Re: [Sam Kennedy] Thread or Fork?

Your earlier post indicated that the script was failing when you call join(). When I run the script through the debugger, it fails when creating the first thread.

Quote
DB<1> Thread 1 terminated abnormally: Undefined subroutine readline::CLONE at sam.pl


I have not yet tried to troubleshoot that failure, but will look into it when I can.


Sam Kennedy
Novice

Aug 30, 2012, 5:21 PM

Post #13 of 21 (5758 views)
Re: [FishMonger] Thread or Fork?

I managed to sort of get it running. However, I have to avoid using join() or detach(), and within about 30 seconds my memory fills up and the program errors out and quits. I've also run it on Linux, which gave me a "pthread_create returned 11" error, which confirms that too many threads are being created.

Thank you for taking the time to help me out, this code is starting to drive me insane :)

-Sam


FishMonger
Veteran / Moderator

Aug 30, 2012, 5:56 PM

Post #14 of 21 (5756 views)
Re: [Sam Kennedy] Thread or Fork?

Good to hear that you "sort of" got it working.

Post back or start a new thread if you need more help.


(This post was edited by FishMonger on Aug 30, 2012, 5:57 PM)


Sam Kennedy
Novice

Aug 30, 2012, 6:15 PM

Post #15 of 21 (5753 views)
Re: [FishMonger] Thread or Fork?

I'm not sure multithreading is really giving me any performance gain. I timed multiple runs and the threaded version was on average 5x slower. However, because of the memory leak it obviously isn't working properly, so that may not be a representative sample of what is really possible.

How difficult would a thread pool be to implement? I only need 5 - 10 threads running at a time.
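
A thread pool does not need much code if the core Thread::Queue module is available. The following is a sketch under that assumption: a fixed number of workers is started once, jobs are fed through a queue, and one undef per worker serves as the stop signal. The pool size, the job list, and score_one() are hypothetical placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $workers = 4;          # hypothetical pool size
my @jobs    = 1 .. 20;    # hypothetical job ids (for example, image indexes)

my @score :shared;
my $queue = Thread::Queue->new;

# Hypothetical stand-in for the real scoring work.
sub score_one {
    my ($i) = @_;
    return $i * $i;
}

sub worker {
    # dequeue() blocks until a job arrives; undef is the stop signal.
    while (defined(my $i = $queue->dequeue)) {
        $score[$i] = score_one($i);
    }
    return;
}

# Start a fixed number of threads once, rather than one thread per job.
my @pool = map { threads->create(\&worker) } 1 .. $workers;

$queue->enqueue(@jobs);
$queue->enqueue((undef) x $workers);   # one stop signal per worker

$_->join for @pool;
print "score[$_] = $score[$_]\n" for 1, 10, 20;
```

Since the workers are created once and always joined, the number of live threads stays at the pool size no matter how many jobs are queued.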


Laurent_R
Veteran / Moderator

Aug 30, 2012, 11:45 PM

Post #16 of 21 (5747 views)
Re: [Sam Kennedy] Thread or Fork?

Well, why do you want to use multi-process or multi-thread in the first place?

If it is for performance and you have only one CPU, it is probably a bad idea.


FishMonger
Veteran / Moderator

Aug 31, 2012, 6:36 AM

Post #17 of 21 (5739 views)
Re: [Sam Kennedy] Thread or Fork?

Did you do any kind of profiling of your script before deciding to add the threading? I'll assume you didn't.

You should take a step back and profile your version prior to adding the threading to see where it's spending its time and then work on optimizing those sections.

Devel::NYTProf - Powerful fast feature-rich perl source code profiler
http://search.cpan.org/~timb/Devel-NYTProf-4.08/lib/Devel/NYTProf.pm


Sam Kennedy
Novice

Aug 31, 2012, 6:45 AM

Post #18 of 21 (5737 views)
Re: [FishMonger] Thread or Fork?

I already profiled my code, the scoring subroutine was the slowest so I decided to thread that, so instead of it waiting to score image 1, then image 2 ... 10, it could have 10 threads running the subroutine.

Someone who attempted something very similar suggested I store the images in a 2d array. I did that and got a 35% improvement in speed, and I'm now attempting a complete overhaul of the code to see if I can push that further.

I wondered whether the GD library wasn't thread safe, but from what I've read only one feature (which I don't use) had that problem, and it has already been patched.


FishMonger
Veteran / Moderator

Aug 31, 2012, 7:57 AM

Post #19 of 21 (5733 views)
Re: [Sam Kennedy] Thread or Fork?

While you work on overhauling the code, you may want to take these things into consideration.

1) You're calling your main() sub prior to declaring/initializing the file scoped vars that the sub uses and you have way too many of those vars.

2) The use of a main() sub is unneeded and only serves to add unnecessary indentation.

3) Every sub is accessing file scoped vars directly, which is a bad coding practice. You should a) declare your vars in the smallest scope that they require and b) pass globally scoped vars to subs rather than accessing them directly.

4) You have way too much duplication of code. Instead of having individual subs that update a specific var, you should make the subs more generic and pass in vars by reference. That alone would mean you could drop most of those subs.

5) The C style for loop is very verbose and looks cluttered. It's much cleaner and easier to use Perl's style of the for loop.

6) I suspect that the multiple use of nested for loops could be the source of part of the slowdown. I have not analyzed them enough, but I'm pretty sure that those blocks can be reworked to reduce the amount of looping.

7) Your save_genes() sub looks pretty inefficient. The use of the join function could help in cleaning up or getting rid of those for loops. It might be better and more efficient to build up a data structure and then dump that to the file in one print statement.

With more analysis, I could probably come up with more things to consider, but that should be enough to get started.
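
A short sketch of what points 4, 5 and 7 could look like in practice. The sub names, the data, and the comma-separated output format are all hypothetical; the actual save_genes() file format is not shown in this thread.

```perl
use strict;
use warnings;

# Generic helper: works on whichever array is passed in by reference,
# instead of one sub per file-scoped variable.
sub best_index {
    my ($scores) = @_;
    my $best;
    for my $i (0 .. $#{$scores}) {    # Perl-style loop, no counter bookkeeping
        next unless defined $scores->[$i];
        $best = $i if !defined $best || $scores->[$i] < $scores->[$best];
    }
    return $best;
}

# Build the whole output with join() rather than nested print loops.
sub genes_as_text {
    my ($genes) = @_;                 # array of array references
    return join '', map { join(',', @{$_}) . "\n" } @{$genes};
}

my @score = (undef, 42, 7, 19);       # hypothetical data
my @genes = ([1, 2, 3], [4, 5, 6]);

print "best index: ", best_index(\@score), "\n";
print genes_as_text(\@genes);
```

The text returned by genes_as_text() can be written to a file with a single print statement.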


(This post was edited by FishMonger on Aug 31, 2012, 7:58 AM)


Sam Kennedy
Novice

Aug 31, 2012, 4:40 PM

Post #20 of 21 (5714 views)
Re: [FishMonger] Thread or Fork?

Okay I managed to speed up the score subroutine... at the cost of dramatically slowing down another subroutine, I'll work on cleaning up my fastest version of the code.


Sam Kennedy
Novice

Aug 31, 2012, 6:33 PM

Post #21 of 21 (5708 views)
Re: [Sam Kennedy] Thread or Fork?

Here is my new code. I used strict and warnings and ran it through Perl::Critic. Hopefully it's clearer now, but let me know if I'm still using bad practices:

http://pastebin.com/LWmRFMhr

It's the nested foreach loops on lines 145/146 which I think could be sped up using threads.
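
For the fork side of the question in the thread title, here is a rough sketch of the same idea with processes instead of threads. Child processes do not share memory with the parent, so each child sends its result back through a pipe. score_one() is a hypothetical stand-in for the per-image scoring. Note that on Windows perl emulates fork() with interpreter threads, so this may not avoid thread-related problems there.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for scoring image $i.
sub score_one {
    my ($i) = @_;
    return $i * $i;
}

my (%reader, @score);
for my $i (1 .. 3) {
    pipe(my $r, my $w) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                 # child: compute, report, exit
        close $r;
        print {$w} score_one($i), "\n";
        close $w;
        exit 0;
    }

    close $w;                        # parent keeps only the read end
    $reader{$pid} = [ $i, $r ];
}

# Collect one line from each child, then reap it.
for my $pid (keys %reader) {
    my ($i, $r) = @{ $reader{$pid} };
    chomp(my $line = readline $r);
    close $r;
    waitpid($pid, 0);
    $score[$i] = $line;
}

print "score[$_] = $score[$_]\n" for 1 .. 3;
```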

 
 

