Oct 28, 2012, 11:19 AM
Post #32 of 34
Re: [Chris Charley] DNA Sequence Count Perl Program HELP!
I had never worked with DNA sequences before, and I said clearly from the beginning that my knowledge of these matters is close to zero.
Among other things, I did not know about headers, and I did not notice that there were such headers in your file. I think Kevone said that his or her file contained only a stream of ACGT letters, so I assumed it had no headers.
Of course, it would not be complicated to remove the headers, just as I removed the carriage return and newline characters, but I agree (and already wrote) that, in general, it is better to use existing modules if they are fit for the task at hand. In this particular case, the initial requirement was so simple that it could be done in a dozen lines of code, so I suggested some code without modules. (I had no idea what existing modules can or cannot do; as I said, this is really not my field.)
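To illustrate how little is needed to skip headers, here is a minimal sketch. It assumes the headers are FASTA-style lines beginning with ">", which is my assumption and not something stated in this thread; the subroutine name read_bases is also just made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: read a sequence from a file handle, skipping
# header lines and stripping line endings.
# Assumption: a header is any line starting with '>' (FASTA convention).
sub read_bases {
    my ($fh) = @_;
    my $seq = '';
    while (my $line = <$fh>) {
        next if $line =~ /^>/;     # skip header lines
        $line =~ s/[\r\n]+//g;     # remove carriage return and newline
        $seq .= $line;
    }
    return $seq;
}

# Usage with an in-memory file handle standing in for the real file
my $data = ">seq1 sample header\r\nACGT\r\nGGCC\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $seq = read_bases($fh);
close $fh;
print "$seq\n";    # prints ACGTGGCC
```

For real-world files with multiple records or unusual formatting, an existing module is probably still the safer choice.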
As for the sliding window, I thought of at least three different methods to implement it: the one I used; something very similar to the one you used; and, since the window size is a multiple of the slide factor, a third one that would cut the file into segments of 200 bases, store the numbers of C and G for each segment in an array (or two arrays, an array of hashes, or whatever), and then do all the calculations on this array. This last method would probably be the best in terms of performance, but performance is not an issue with a mere 5 million nucleobases, and the program might be a bit more complex. I chose the approach I used simply because it seemed to be the one that required the fewest changes to the code I had already written.
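For what it is worth, the third method could look roughly like the sketch below. It is not the code I posted earlier; the name gc_windows and its parameters are invented for the illustration, and I am assuming that the quantity wanted is the combined G+C fraction per window (keeping C and G separately would just need a second array). The step would be 200 in the real case; the usage example uses tiny values so the result can be checked by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: G+C fraction of every window, computed from
# per-segment counts. $step is the slide factor; $window must be a
# multiple of $step. A trailing partial segment is ignored.
sub gc_windows {
    my ($seq, $window, $step) = @_;
    die "window must be a multiple of step\n" if $window % $step;
    my $per_window = $window / $step;    # segments per window

    # Pass 1: count C and G once for each complete segment.
    my @gc;
    for (my $pos = 0; $pos + $step <= length $seq; $pos += $step) {
        my $segment = substr $seq, $pos, $step;
        my $count   = ($segment =~ tr/CG//);   # tr returns the match count
        push @gc, $count;
    }

    # Pass 2: slide over the segment counts, keeping a running sum.
    my @fractions;
    return @fractions if @gc < $per_window;
    my $sum = 0;
    $sum += $gc[$_] for 0 .. $per_window - 1;
    push @fractions, $sum / $window;
    for my $i ($per_window .. $#gc) {
        $sum += $gc[$i] - $gc[$i - $per_window];   # add new segment, drop oldest
        push @fractions, $sum / $window;
    }
    return @fractions;
}

# Usage: four segments of 4 bases, window of 8 bases
my @result = gc_windows('CCGGAATTCGATGGGG', 8, 4);
print join(' ', @result), "\n";    # prints 0.5 0.25 0.75
```

Each base is examined only once, and each window then costs one addition and one subtraction, which is where the performance gain would come from.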