Parallel Processing in perl



Jan 13, 2014, 2:29 AM

Post #1 of 6 (2788 views)
Parallel Processing in perl

My input file has 236,135 lines, and each line's ID has to be looked up in the database.
This is taking a long time,
so currently I have to split it into 10 files and run them in parallel.

Example: if X has 200,000 lines,
I create 10 files of 20,000 lines each and run the script on each one.

Can I have parallel processing in Perl, so that only a single file X is read and the script creates parallel processes to accomplish the task?

That way I could keep a single input file and run the code only once.

Below is an example of my input file and my script.

File data: I have more than 5 lakh (500,000) rows.

Script :

while (my $line = <IP>) {
    # send the esid to the query for retrieving the customer id
    my $esid = (split /,/, $line)[0];
    my $output = $session1->iterate_on_array_for_cursor('get_double_payment_data', 0, \&iterator_ppcl, $esid);
}

Thus I want to run a single script over the entire file (5 lakh rows) and make it time-efficient. Right now it takes around 24 hours for the full file; after fragmenting it into 10 files, each file takes around 3 hours.

Is a single run over a single (however huge) input file, with some parallel processing, possible?



Jan 13, 2014, 2:53 AM

Post #2 of 6 (2787 views)
Re: [Tejas] Parallel Processing in perl [In reply to]

If you look at the data, the last column's values are 3, 4, 5, 6, or 7 (these are country IDs for the US, Canada, Great Britain, Denmark, and Spain).
So can we create a process using fork for each ID, so that five processes run in parallel and accomplish the task?

Veteran / Moderator

Jan 13, 2014, 2:13 PM

Post #3 of 6 (2783 views)
Re: [Tejas] Parallel Processing in perl [In reply to]

Hmm, I am not sure I understand everything.

Processing 200k lines in Perl is really fast; I am not sure you really need parallel processing for that.

If it is slow (how slow, BTW?), then, most probably this line:

my $output = $session1->iterate_on_array_for_cursor('get_double_payment_data',0,\&iterator_ppcl,$esid);

is slow, but we don't really know what it is doing. If, as the name implies, it iterates over an array for each input line, then that may be what is slow. There may be a better way of doing that, but we don't have enough information to help you.

I would really look at that before investigating parallel processing. Now if you really want to use parallel processing, there are a number of ways, including:
- Using the shell (assuming you are on some form of Unix) to launch several background Perl processes in parallel;
- Forking several processes within your Perl program;
- Using lightweight processes or threads.
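As a minimal illustration of the forking option (a sketch; `process_chunk` and all names here are hypothetical, not from the poster's script): the parent forks N children, each does its share, and the parent waits for them all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $workers = 4;          # number of parallel child processes
my @pids;

for my $id (0 .. $workers - 1) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: do this worker's share of the work, then exit.
        process_chunk($id, $workers);
        exit 0;
    }
    push @pids, $pid;     # parent remembers each child's pid
}

# Parent: block until every child has finished.
waitpid($_, 0) for @pids;

# Placeholder for the real per-worker job.
sub process_chunk {
    my ($id, $total) = @_;
    print "worker $id of $total done\n";
}
```

Each child gets a copy of the parent's state at fork time; anything it writes to in-memory variables is lost when it exits, so results are usually passed back through per-worker output files.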

For splitting the data, I would not recommend using the last column if that column represents a country. It is quite likely that some countries have many more records than others, so you will end up with a poor split: some processes will have a lot of work while others complete much earlier, and near the end you might have only one or two processes still running, gaining little from parallel processing. I would rather use something like the line number in the file for splitting the data.

Assuming you want to run 10 processes, for example, there are two basic ways of doing it: have a preliminary process split your file into ten temporary files and then have ten processes each handle one file (or split into more files and have each process handle several); or have all your processes read the same file and process only their own lines (for example, process 0 handles only the lines whose line number ends in 0, and so on).
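The second scheme (every worker reading the same file, each taking only its own lines) can be sketched as a small round-robin filter; `filter_lines` and the demo data are illustrative, not from the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Round-robin line filter: worker $id of $total handles only the lines
# whose 1-based line number modulo $total equals $id.  Every worker can
# then read the same input file; no pre-splitting step is needed.
sub filter_lines {
    my ($id, $total, $fh, $handler) = @_;
    my $n = 0;
    while (my $line = <$fh>) {
        $n++;
        next unless $n % $total == $id;
        $handler->($n, $line);
    }
}

# Demo on an in-memory filehandle: worker 1 of 2 gets lines 1 and 3.
open my $fh, '<', \"alpha\nbeta\ngamma\ndelta\n" or die $!;
filter_lines(1, 2, $fh, sub {
    my ($lineno, $line) = @_;
    print "worker 1 handles line $lineno: $line";
});
```

In the real setup each worker would be its own process, started with its id and the worker count as arguments, and the handler would contain the per-line DB lookup.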

Without knowing what the code line I highlighted above actually does, I can't give any more advice than those general guidelines.

I hope this helps.


Jan 13, 2014, 2:52 PM

Post #4 of 6 (2779 views)
Re: [Tejas] Parallel Processing in perl [In reply to]

Before trying parallel processing, it may be beneficial for you to first profile your current script using NYTProf, as it can give you a good performance overview of your script, including bottleneck areas which you may be able to refactor for performance improvement.


Jan 27, 2014, 2:01 AM

Post #5 of 6 (2539 views)
Re: [Laurent_R] Parallel Processing in perl [In reply to]


Here is the code.
I would like to split the work into different processes in the best possible way, so that things complete quickly.

use Date::Format;
use Getopt::Std;
use Company::Admin::Finance::Utils qw (amz_info amz_fatal get_month_start_end_dates encom_date_converter);
use Company::Admin::Finance::DBSession;
use Company::Admin::Finance::LoggingDBSession;
use Company::Admin::Finance::RcslUtilities;
use Company::Admin::Finance::RcslMap;

our $opt_x;
getopts('x:');    # -x <country_code>
my $country_code = $opt_x;

#Create a DB Session
my $session;
$session = Company::Admin::Finance::DBSession::create_session("$country_code-rcsl");

#Takes the file name as argument
my $BADDEBT = $ARGV[0];
my $pwd = `pwd`;
chomp($pwd);

#List of Output Files

my $Tot_Missing_amnt = "$pwd/Amount_Summary-$BADDEBT";

#open input file and output file for reading and writing
open BADDEBT, "<", $BADDEBT or die "Could not open input file: $!\n";
open(Tot_Missing_amnt, ">", $Tot_Missing_amnt) or die "unable to open amount summary file: $!";

my %backfill_dispatch;
$backfill_dispatch{1} = create_func_backfill( "$pwd/US-BACKFILL_FILE");
$backfill_dispatch{6} = create_func_backfill( "$pwd/JP-BACKFILL_FILE");
$backfill_dispatch{7} = create_func_backfill( "$pwd/CA-BACKFILL_FILE");
$backfill_dispatch{526970}= create_func_backfill( "$pwd/BR-BACKFILL_FILE");
$backfill_dispatch{771770}= create_func_backfill( "$pwd/MX-BACKFILL_FILE");
$backfill_dispatch{111172} = create_func_backfill( "$pwd/AU-BACKFILL_FILE");
my %missing_esid_dispatch;
$missing_esid_dispatch{1} = create_func_missing_esid( "$pwd/US-MISSING_ESID");
$missing_esid_dispatch{6} = create_func_missing_esid( "$pwd/JP-MISSING_ESID");
$missing_esid_dispatch{7} = create_func_missing_esid( "$pwd/CA-MISSING_ESID");
$missing_esid_dispatch{526970}= create_func_missing_esid( "$pwd/BR-MISSING_ESID");
$missing_esid_dispatch{771770}= create_func_missing_esid( "$pwd/MX-MISSING_ESID");
$missing_esid_dispatch{111172} = create_func_missing_esid( "$pwd/AU-MISSING_ESID");

my %zero_log_dispatch;
$zero_log_dispatch{1} = create_func_zero_log( "$pwd/US-ZEROLOG");
$zero_log_dispatch{6} = create_func_zero_log( "$pwd/JP-ZEROLOG");
$zero_log_dispatch{7} = create_func_zero_log( "$pwd/CA-ZEROLOG");
$zero_log_dispatch{526970}= create_func_zero_log( "$pwd/BR-ZEROLOG");
$zero_log_dispatch{771770}= create_func_zero_log( "$pwd/MX-ZEROLOG");
$zero_log_dispatch{111172} = create_func_zero_log( "$pwd/AU-ZEROLOG");
#Create a cursor with a query to get the customer_id for each entry source id in the input file
my %cursor2 = ('get_digital_bad_debt_data' => qq{
    SELECT distinct customer_id, company_code, to_char(ENTRY_DATE,'DD-MON-YYYY')
    from prepaid_customer_entries
    -- (the rest of the query was truncated in the original post)
});
my ($custid, $date, $esid, $amount, $ppcl_id, $currency, $miss_amount,$tot_amount,$company_code);
$miss_amount = 0;
$tot_amount = 0;
#Order ID,Order Date,GL,DPIID,DPID,Txn Amount,Txn Currency,Condition,Marketplace,Txn Type,Status,Revenue,Revenue Currency,COGS,COGS Currency,Vendor Type,Revoked?

our %Total_Amnt_Hash;
#This while loop reads each line of the input file, splits it, and stores the fields in variables
#(braces below were reconstructed; the original post lost its block structure)

while (my $line = <BADDEBT>) {
    my ($date, $esid, $amount, $currency, $ppcl_id) = (split /,/, $line)[1, 4, 5, 6, 8];
    if ($amount == 0) {
        ($custid, $company_code, $date) = $session->array_for_cursor('get_digital_bad_debt_data', 0, $esid);
        if ($custid) {
            $Total_Amnt_Hash{Backfilled_ESID}{$ppcl_id} += $amount;
        } else {
            $Total_Amnt_Hash{Missing_ESID}{$ppcl_id} += $amount;
        }
    }
}

while (my ($comments, $ppcl_ids) = each %Total_Amnt_Hash) {
    while (my ($ppcl_id, $amount) = each %$ppcl_ids) {
        print Tot_Missing_amnt "Total Amount for the $comments 's for PPCL_ID $ppcl_id is $amount \n";
    }
}

sub create_func_backfill {
    my $file = shift;
    open my $FH, ">", $file or die "could not open $file: $!";
    return sub {
        my $to_be_printed = shift;
        print $FH $to_be_printed, "\n";
    };
}

sub create_func_missing_esid {
    my $file = shift;
    open my $FH, ">", $file or die "could not open $file: $!";
    return sub {
        my $to_be_printed = shift;
        print $FH $to_be_printed, "\n";
    };
}

sub create_func_zero_log {
    my $file = shift;
    open my $FH, ">", $file or die "could not open $file: $!";
    return sub {
        my $to_be_printed = shift;
        print $FH $to_be_printed, "\n";
    };
}


#Close all the files
close(BADDEBT);
close(Tot_Missing_amnt);

Format of the Input File

LMDI-3815309-PVRKVG,26-Nov-13,318,FYYMRT695443,ATRFYY999443,500,IPT,2,6,Filler,CLOSED,500,IPT,0,NO VP,Puvv,N,VP
LMDI-5240061-PVRKVG,27-Nov-13,318,FYYMRT306443,ATRFYY371443,200,IPT,2,6,Filler,CLOSED,200,IPT,0,NO VP,Puvv,N,VP
LMDI-6655822-PVRKVG,26-Nov-13,318,FYYMRT096293,ATRFYY506293,2000,IPT,2,6,Filler,CLOSED,2000,IPT,0,NO VP,Puvv,N,VP
LMDI-2685425-PVRKVG,26-Nov-13,318,FYYMRT727293,ATRFYY135293,400,IPT,2,6,Filler,CLOSED,400,IPT,0,NO VP,Puvv,N,VP

(This post was edited by FishMonger on Jan 27, 2014, 9:06 AM)


Jan 27, 2014, 2:07 AM

Post #6 of 6 (2536 views)
Re: [Laurent_R] Parallel Processing in perl [In reply to]

The line

my $output = $session1->iterate_on_array_for_cursor('get_double_payment_data',0,\&iterator_ppcl,$esid);

is not accessible to me and cannot be changed either.

So I have to find a parallel processing way around this issue.

I am really a novice as far as parallel processing is concerned.
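Given that constraint, one possible shape (a sketch, not a drop-in replacement) is to fork a few workers over the same input, each taking every N-th line, and keep the opaque DB call inside each child. `lookup_esid` below is a hypothetical stand-in for the real `iterate_on_array_for_cursor` call, and the input lines are fabricated stand-ins:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $WORKERS = 5;
# Stand-in for the real input file's lines.
my @lines = map { "LMDI-$_-X,26-Nov-13,$_\n" } 1 .. 20;

my @pids;
for my $id (0 .. $WORKERS - 1) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                       # child
        my $n = 0;
        for my $line (@lines) {
            $n++;
            next unless $n % $WORKERS == $id;   # this worker's lines only
            my $esid = (split /,/, $line)[0];
            lookup_esid($esid);            # the slow, opaque DB call goes here
        }
        exit 0;
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;
print "all workers finished\n";

sub lookup_esid { my ($esid) = @_; }       # placeholder for the real lookup
```

One important caveat: a database connection cannot safely be shared across fork(), so each child must create its own session (the `create_session` call) after forking, and should write to its own output files to avoid interleaved writes.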

