Jun 5, 2013, 5:46 AM
Post #3 of 4
I would not parallelize by subdirectory, because the number of files may vary considerably between subdirectories. Instead, I would parallelize over the files to be processed.
Re: [millan] Script is running for long time for copy,grep and find operation on a large no files
A first shot could go like this:
(1) Collect a list of files to be processed
(2) Go through the list and, for each file, spawn a child process to process the file, but ensure that you don't have more than a certain number of child processes running at a time (i.e. start a new child only when a "processing slot" becomes available).
This strategy can be improved, though:
- Since the number of files is large compared to the number of processes you will likely run in parallel, you could save the overhead of spawning many children by passing several files to each child process.
- If the setup phase (collecting the files to be processed) itself takes unreasonably long, you could start child processes while the list is still being collected.
There is yet another possibility, which is outside of Perl: The xargs command line tool should be able to do exactly what you want (at least the Linux version supports a --max-procs switch). In case you are on Windows, you could use the Cygwin version of xargs, or the one from the GnuTools for Windows.
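To make the xargs suggestion concrete, here is a minimal sketch, not a drop-in solution. It assumes GNU or BSD versions of find and xargs (-print0, -0 and -P are extensions, not plain POSIX). The scratch directory, the file names, the pattern "needle", the limit of 4 processes and the batch size of 50 are all made up for illustration, and grep -l merely stands in for whatever per-file work your script really does.

```shell
#!/bin/sh
# Hypothetical demo data: a scratch directory with a few small files.
workdir=$(mktemp -d)
mkdir -p "$workdir/a" "$workdir/b"
printf 'needle\n' > "$workdir/a/one.txt"
printf 'hay\n'    > "$workdir/a/two.txt"
printf 'needle\n' > "$workdir/b/three.txt"

# find streams file names into the pipe, so xargs can start children
# before the whole list has been collected.
#   -print0 / -0 : NUL-separated names, safe for blanks in file names
#   -P 4         : at most 4 children at a time (GNU long form: --max-procs=4)
#   -n 50        : pass up to 50 files to each child
#   grep -l      : placeholder for the real work; prints names of matching files
matches=$(find "$workdir" -type f -print0 \
    | xargs -0 -P 4 -n 50 grep -l needle \
    | wc -l | tr -d ' ')
echo "files containing the pattern: $matches"

# Clean up the demo data.
rm -rf "$workdir"
```

Note that this one pipeline covers the points made above: -P limits the number of children running at once, -n hands several files to each child, and the pipe from find lets processing begin while files are still being collected. Suitable values for -P and -n depend on your machine and on how expensive the per-file work is, so treat the numbers here as placeholders.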