Home: Perl Programming Help: Advanced:
Script runs for a long time doing copy, grep, and find operations on a large number of files

millan
New User

May 21, 2013, 10:26 AM

Post #1 of 4 (10523 views)
Script runs for a long time doing copy, grep, and find operations on a large number of files

I have around 50 sub-directories in the main directory /app/g1adm/.
In each sub-directory I have to do the operations below:

1. Exclude some predefined filenames, which are held in an array @predefined.
2. Recursively find the .rex, .fmd, .pld, .sh, .sql and other files (those not of the mentioned types), then:

a. Copy the .rex files to the /temp/Reports folder, then run a shell script named convert.sh which takes that .rex filename as an input parameter (for example convert.sh aa.rex).
b. Copy the .pld files to /temp/price, then run a shell script named price.sh which takes that .pld filename as an input parameter (for example price.sh ab.pld).
c. Copy the .sh files to /temp/script.
d. Copy the .sql files to /temp/oracle.
e. Copy the other files to /temp/others.

Then insert these filenames and filetypes into the database:

.rex file ---> the filetype will be "registration report"
.pld file ---> the filetype will be "population report"
.sql file ---> the filetype will be "SQL FILE"
.sh file ---> the filetype will be "SCRIPTS"
other file ---> the filetype will be "OTHERS"
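For reference, the routing rules above can be written as a single dispatch table, so the copy destination and the database filetype always stay in sync. This is only a sketch: the paths and filetypes are the ones named in the post, while the sub name route_for is illustrative.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Map each extension to its destination directory and database filetype,
# with 'OTHERS' as the fallback for everything else.
my %route = (
    '.rex' => { dir => '/temp/Reports', type => 'registration report' },
    '.pld' => { dir => '/temp/price',   type => 'population report'   },
    '.sql' => { dir => '/temp/oracle',  type => 'SQL FILE'            },
    '.sh'  => { dir => '/temp/script',  type => 'SCRIPTS'             },
);
my $fallback = { dir => '/temp/others', type => 'OTHERS' };

sub route_for {
    my ($file) = @_;
    # Third return value of fileparse is the matched suffix, e.g. '.rex'.
    my (undef, undef, $ext) = fileparse($file, qr/\.[^.]+$/);
    return $route{$ext} // $fallback;
}
```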


Just as a note, I have around 50,000 files, so whatever logic I have implemented takes a very long time.
Could you please suggest how to do this with parallel processing, i.e. have the script create one process per sub-directory to perform the steps above? Any other suggestion to improve the script's run time is welcome.

Thank you in advance.


Laurent_R
Veteran / Moderator

May 21, 2013, 10:56 AM

Post #2 of 4 (10520 views)
Re: [millan] Script runs for a long time doing copy, grep, and find operations on a large number of files [In reply to]

There is no reason why processing 50,000 files should take that long (unless the files themselves are very big when you copy them).

The predefined files should probably be stored in a hash rather than in an array, as lookup will be much faster.
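A minimal sketch of that hash lookup (the filenames here are made-up examples): building the hash once costs O(n), after which each membership test is O(1), instead of scanning the whole array with grep for every one of the 50,000 files.

```perl
use strict;
use warnings;

# Build a lookup hash from the array of predefined names once, up front.
my @predefined = qw(skip_me.rex ignore.sql notes.txt);   # example names
my %predefined = map { $_ => 1 } @predefined;

# Constant-time membership test, versus grep's linear scan per call.
sub is_excluded {
    my ($name) = @_;
    return exists $predefined{$name};
}
```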

Please show your code so that we can see whether there are any blatant inefficiencies.

Otherwise, you might want to profile your code with one of the various existing modules.
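For example, one widely used profiler is Devel::NYTProf from CPAN; it reports per-line and per-subroutine timings, which makes it easy to see where the script spends its time. The script name below is a placeholder.

```shell
perl -d:NYTProf your_script.pl   # run under the profiler; writes nytprof.out
nytprofhtml                      # render nytprof.out as an HTML report
```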


rovf
Veteran

Jun 5, 2013, 5:46 AM

Post #3 of 4 (10435 views)
Re: [millan] Script runs for a long time doing copy, grep, and find operations on a large number of files [In reply to]

I would not parallelize by sub-directory, because the number of files may vary considerably between sub-directories; instead, parallelize over the files to be processed.

A first shot could go like this:

(1) Collect a list of files to be processed
(2) Go through the list and, for each file, spawn a child process to process it, but ensure that no more than a certain number of child processes are running at a time (i.e. start a new child only when a "processing slot" becomes available).

This strategy can be improved, though:

- Since the number of files is large compared to the number of processes you will likely run in parallel, you could save the overhead of spawning many children by passing several files to one child process for processing.

- If you feel that the setup time (to collect the files for processing) already takes an unreasonably long time, you could start child processes while you are still collecting.
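The "processing slot" idea from step (2) can be sketched using only core Perl (fork and wait): the parent blocks on wait() whenever all slots are busy, so at most $MAX_WORKERS children ever run at once. The file list and process_file are placeholders for the real per-file work.

```perl
use strict;
use warnings;

my $MAX_WORKERS = 4;
my @files  = map { "file$_.rex" } 1 .. 20;   # placeholder file list
my $reaped = 0;                              # children reaped so far
my %running;                                 # pid => 1 for live children

sub process_file {
    my ($file) = @_;
    # Placeholder: real work would copy the file, run convert.sh, etc.
}

for my $file (@files) {
    if (keys(%running) >= $MAX_WORKERS) {    # all slots busy:
        my $pid = wait();                    # block until one child exits
        delete $running{$pid};
        $reaped++;
    }
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                         # child: do the work and exit
        process_file($file);
        exit 0;
    }
    $running{$pid} = 1;                      # parent: record the new child
}
while ((my $pid = wait()) != -1) {           # reap the remaining children
    delete $running{$pid};
    $reaped++;
}
```

Batching (several files per child, as suggested above) would simply replace the single $file with a slice of @files in each child.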

There is yet another possibility, which is outside of Perl: The xargs command line tool should be able to do exactly what you want (at least the Linux version supports a --max-procs switch). In case you are on Windows, you could use the Cygwin version of xargs, or the one from the GnuTools for Windows.
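A self-contained illustration of the xargs approach, where a temporary directory and echo stand in for the real /app/g1adm tree and convert.sh:

```shell
tmp=$(mktemp -d)
touch "$tmp/a.rex" "$tmp/b.rex" "$tmp/c.sql"

# Run at most 4 'conversions' at a time; -n 1 passes one file per call,
# and -print0 / -0 keep filenames with spaces intact.
find "$tmp" -name '*.rex' -print0 | xargs -0 --max-procs=4 -n 1 echo

rm -rf "$tmp"
```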


FishMonger
Veteran / Moderator

Jun 5, 2013, 5:58 AM

Post #4 of 4 (10433 views)
Re: [millan] Script runs for a long time doing copy, grep, and find operations on a large number of files [In reply to]


Quote
Could you please suggest how to do this in parallel processing


Parallel::ForkManager - A simple parallel processing fork manager => http://search.cpan.org/~szabgab/Parallel-ForkManager-1.03/lib/Parallel/ForkManager.pm
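The module's typical usage pattern looks like this (a sketch, assuming Parallel::ForkManager is installed from CPAN; the glob pattern and the child's work are placeholders):

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm    = Parallel::ForkManager->new(4);   # at most 4 children at once
my @files = glob('/app/g1adm/*/*');          # placeholder file list

for my $file (@files) {
    $pm->start and next;   # parent: returns child's pid, moves to next file
    # Child: copy the file, run the conversion script, update the DB, etc.
    $pm->finish;           # child exits; parent is notified
}
$pm->wait_all_children;    # block until every child has finished
```

start blocks automatically when the limit is reached, so the "processing slot" bookkeeping is handled for you.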
