Jun 13, 2011, 12:10 PM
Post #1 of 1
Dear friends, I have two files. One contains headers and sequences (let's call it the parent file), and the other contains substrings of some of the sequences in the first file.
Is there any modification I can make to reduce the runtime?
The parent file looks something like this:
and so on.
The second file looks like this:
and so on.
Now I have to match the substrings back to the parent sequences and print the headers along with the sequences. The parent file contains about 0.25 million sequences (average sequence length is around 20, file size 1.7 MB), and there are about 33 thousand substring sequences. I tried ordinary matching using hashes, with the parent sequences as keys and the headers as values, but it is taking forever to complete, around 2 days. Is there a way to reduce the runtime?
Any help would be appreciated. If you need more details, feel free to ask.
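
For reference, here is a rough sketch of one way the matching described above might be sped up. If every substring is tested against every parent sequence, that is roughly 33,000 × 250,000 ≈ 8.25 billion comparisons. Since the parent sequences are short (around 20 characters), an alternative is to put the substrings in a set and slide a window over each parent sequence, so each parent is processed once. The sketch is in Python; the function name `match_substrings`, the `(header, sequence)` pair layout, and the sample data are all made up for illustration, and reading the two files is left out because their exact format is not shown here.

```python
from collections import defaultdict


def match_substrings(parents, substrings):
    """Map each substring to the (header, sequence) pairs that contain it.

    parents    -- iterable of (header, sequence) pairs (assumed layout)
    substrings -- iterable of query strings
    """
    wanted = set(substrings)
    # Only window lengths that occur among the substrings need to be checked.
    lengths = sorted({len(s) for s in wanted if s})
    hits = defaultdict(list)
    for header, seq in parents:
        seen = set()  # report each substring at most once per parent
        for k in lengths:
            for i in range(len(seq) - k + 1):
                window = seq[i:i + k]
                if window in wanted and window not in seen:
                    seen.add(window)
                    hits[window].append((header, seq))
    return dict(hits)


if __name__ == "__main__":
    # Hypothetical data, not taken from the real files.
    parents = [
        ("seq1", "ACGTACGTTAGC"),
        ("seq2", "TTGACCAGTA"),
        ("seq3", "GGGCCCAAAT"),
    ]
    substrings = ["CGTT", "CCAG", "TTTT"]
    for sub, matches in match_substrings(parents, substrings).items():
        for header, seq in matches:
            print(sub, header, seq)
```

With sequences of length about 20 there are at most a couple of hundred windows per parent, and each set lookup takes roughly constant time on average, so the work grows with the number of parent sequences rather than with parents times substrings. The same idea should carry over to a hash keyed on the substrings in other languages.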