
Need to speed up Perl script

 



Gorgarian
New User

Dec 18, 2012, 1:23 PM

Post #1 of 5 (1633 views)
Need to speed up Perl script

Hi guys,

I have written a few programs, but not much in Perl. I recently wrote a script to compare two hash tables built from huge files, and could use some advice on making the code more efficient: it is taking days to run.

Data -- key value

TAAAACTGTATTAACCA 29
TAAGCGGTTGTGACAGA 63
AAGATTAGTATAGGACA 30
AATTAAATGCTCTCTCC 81
ATGTATCCAATCAGCTC 176
AGAACCCCTTCAAGGAA 34
AGCTGGCAGGCCCCCAC 32
CTCTGTGCAGCAATAAG 52
CTCCCCAAATGCCCTCC 34
AATTGCACAACCAGGGT 123

#!/usr/local/bin/perl

use strict;
use warnings;

open (F, '<', 'F.dump') or die "Cannot open F.dump: $!";
print "File F.dump opened \n";

Step through the input file, taking $blocksize chunks.

# Read in a block of records from the female dataset
# and append to the female hash table

foreach (1..$blocksize){
    my $row = <F>;
    last unless defined $row;
    chomp($row);
    my @record = split(' ', $row);
    push(@array, @record);
    last if (eof(F));
}
%tmphash2 = @array;
%fhash = (%fhash, %tmphash2);
$a = ($#array + 1) / 2;
print "$a records from file F.dump read into linear array \n";
print " added to Female hash, now ".keys(%fhash)." records \n";
undef @array;

I think the most obvious problem is that I am reading the lines from the file into an array, creating a hash from the array, and then merging that hash into the one built on the previous iteration.

There is probably a way to read the lines from the file directly into the hash table; I just cannot seem to get anything to work. Any suggestions?
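A minimal sketch of that direct-to-hash read (illustration only: the sample pairs from the top of the post stand in for F.dump via a here-doc, and the hash name follows the script):

```perl
use strict;
use warnings;

# Sketch only: the sample key/value pairs from the post stand in for F.dump.
my $sample = <<'END';
TAAAACTGTATTAACCA 29
TAAGCGGTTGTGACAGA 63
AAGATTAGTATAGGACA 30
END

my %fhash;
for my $line (split /\n/, $sample) {
    my ($kmer, $count) = split ' ', $line;
    $fhash{$kmer} = $count;    # assign straight into the hash, no temporary array
}
print scalar(keys %fhash), " records read into hash\n";
```

Against a real file, the `for` loop becomes a `while (my $line = <$fh>)` read so only one line is held in memory at a time.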

Full script attached.

First post.



(This post was edited by Gorgarian on Dec 18, 2012, 1:26 PM)


wickedxter
User

Dec 18, 2012, 6:14 PM

Post #2 of 5 (1620 views)
Re: [Gorgarian] Need to speed up Perl script [In reply to]


Code
#!/usr/local/bin/perl

use strict;
use warnings;

my %temp_hash;

open my $file, '<', 'F.dump' or die "Cannot open F.dump: $!";
while (my $line = <$file>){
    chomp $line;
    # key comes first in the data, then the count
    my ($hash_key, $data) = split ' ', $line;

    # only store the first occurrence of each key
    $temp_hash{$hash_key} = $data if !exists $temp_hash{$hash_key};
}
close $file;

print scalar(keys %temp_hash), " records from file F.dump added to Female hash \n";


Try this and let me know... It skips any key that already exists, so if you have a key of 28 early in the file and 28 again near the end, the data of the second occurrence is skipped.
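The difference is easy to see with made-up duplicate keys (an illustration, not the poster's data): plain assignment keeps the last occurrence, while the `exists` guard keeps the first.

```perl
use strict;
use warnings;

my (%first_wins, %last_wins);
for my $pair (['28', 'AAA'], ['31', 'CCC'], ['28', 'TTT']) {
    my ($key, $value) = @$pair;
    $last_wins{$key}  = $value;                               # last duplicate wins
    $first_wins{$key} = $value if !exists $first_wins{$key};  # first occurrence wins
}
print "last: $last_wins{28}, first: $first_wins{28}\n";       # last: TTT, first: AAA
```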

Is that all of the code in the script?


(This post was edited by wickedxter on Dec 18, 2012, 6:18 PM)


Gorgarian
New User

Dec 18, 2012, 10:35 PM

Post #3 of 5 (1609 views)
Re: [wickedxter] Need to speed up Perl script [In reply to]

Thanks wickedxter, that worked a treat. I do not know what I was thinking; I am a bit rough around the edges with Perl.

# Read in a block of records from the male dataset
# and append to the male hash table

$lncount = 0;
foreach (1..$blocksize){
    $line = <M>;
    last unless defined $line;
    $lncount = $lncount + 1;
    # Split the line into k-mer and count (split ' ' also strips the newline)
    ($kmer, $count) = split(' ', $line);
    # Push the record into the hash
    $mhash{$kmer} = $count;
    # Clean exit on end of file
    last if (eof(M));
}
print "$lncount records from file M.dump added to male hash \n";
print " now ".keys(%mhash)." records \n";

I did not put in your if statement, since the keys in the input files are already unique and it would just add overhead.

I have tried to attach the full script this time.

If there is any other code you see that can be made more efficient, let me know.

Thank you so much for your help.
Attachments: k-mer_subtracty.cgi (3.07 KB)


BillKSmith
Veteran

Dec 19, 2012, 8:29 AM

Post #4 of 5 (1583 views)
Re: [Gorgarian] Need to speed up Perl script [In reply to]

If execution time is still an issue, refer to Perl's own documentation (perldoc perlperf) for help with tools to identify the hot spots. It is a waste of time optimizing code that is not heavily used, and our intuition is not a good guide to where work would make a real difference.
Good Luck,
Bill


Gorgarian
New User

Dec 19, 2012, 9:16 AM

Post #5 of 5 (1579 views)
Re: [BillKSmith] Need to speed up Perl script [In reply to]

Thanks Bill. I will have a look at Devel::NYTProf, which looks like it will do the trick.

 
 

