Jul 18, 2014, 10:09 AM
Post #2 of 3
Re: [preston3271] How to list into separate files and Fork() function (Updated)
[In reply to]

Am I correct to understand that what you call nodes are just files that you want to read through?
It also seems that the splitting into 25 files is aimed only at load balancing; the data is not dispatched on the basis of its content.
If you have a single process reading all the files in order to split them into other files before handing them off to child processes, you may lose the benefit of parallel processing.
A much simpler technique would be to make an array of your file names (effectively a queue of file names) and to assign one file to each of your 25 children, which will read and process the data. Once a child has completed, just assign a new file to it, and so forth until you run out of files.
In most cases, this should be more efficient, because you are reading your data only once and immediately in parallel processes. If the files have significantly different sizes, then start by sorting your file array by size and assign the largest files first to the children; this will ensure close-to-optimal load balancing in most cases (the only pathological case where this might not be optimal is when there are huge size differences between files, with a few extremely large and the others extremely small).
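To make the largest-first idea concrete, here is a small Python sketch of the scheduling logic only (essentially the classic longest-processing-time-first heuristic): each file goes to whichever worker currently has the least total work, in decreasing size order. The file names, sizes, and worker count are made-up example data, not from the original post.

```python
# Hypothetical file sizes in bytes -- example data only.
files = {"a.log": 900, "b.log": 850, "c.log": 300,
         "d.log": 250, "e.log": 200, "f.log": 100}

def assign(files, n_workers):
    """Greedy largest-first assignment: give each file (biggest first)
    to the worker with the smallest total load so far. This simulates
    handing out the large files first and refilling idle children."""
    loads = [0] * n_workers               # total bytes assigned per worker
    plan = [[] for _ in range(n_workers)] # file names per worker
    for name, size in sorted(files.items(), key=lambda kv: -kv[1]):
        w = loads.index(min(loads))       # least-loaded worker
        loads[w] += size
        plan[w].append(name)
    return plan, loads

plan, loads = assign(files, 3)
print(plan)   # which files each of the 3 workers gets
print(loads)  # resulting per-worker byte totals
```

With these example sizes the three workers end up with 900, 850, and 850 bytes of work, i.e. nearly even, which is the point of sorting before assigning.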
As for dispatching the work between the children, it depends on how you implement the launching of parallel processes or threads. Either the parent process can also act as the master, monitoring the children and handing out work to any child that runs out of work, or each child can take a filename and remove it from the queue when it completes some work and becomes idle.
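A minimal sketch of the parent-as-master variant, using Python's os.fork (Unix only) rather than the original poster's Perl: the parent keeps up to a fixed number of children busy, and whenever one exits it forks a new child on the next queued file. The function and parameter names here are my own placeholders, not from the thread.

```python
import os

def run_pool(filenames, max_children, process_file):
    """Keep up to max_children child processes busy; whenever one
    exits, fork a new child on the next queued file (Unix only)."""
    queue = list(filenames)   # pending files, ideally largest first
    active = 0
    while queue or active:
        # Top up to the concurrency limit while work remains.
        while queue and active < max_children:
            name = queue.pop(0)
            pid = os.fork()
            if pid == 0:              # child: do the work, then exit
                process_file(name)
                os._exit(0)
            active += 1               # parent: one more running child
        os.wait()                     # block until any child exits
        active -= 1
```

In the poster's setting this would be called as something like `run_pool(sorted_names, 25, process_file)`, where `process_file` is whatever per-file work each child should do.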
There are many, many implementation details missing in the above, but you are not supplying enough information on what you are really trying to do (which is more important than how you contemplate doing it; as you can see, I am suggesting a slightly modified way of doing it).