Extracting DNA sequences from GenBank files using Perl


Jun 23, 2009, 10:34 PM

Post #1 of 2 (761 views)

Hi all,

Using Perl, I need to extract DNA bases from a GenBank file for a given plant species. A sample GenBank file is here...


The GenBank file is saved on my computer, and I also have a text file saved on my computer as NC_001666.txt. This text file lists all the genes and their positions for the species corresponding to NC_001666 (corn). Here is a sample of how the text file is formatted...

rbcL (56874..58304)
atpB -(54618..56114)
atpE -(54208..54621)
trnM (54020..54092)
trnV -(53158..53834)
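
A minimal sketch of how lines in this format could be parsed, assuming every line looks like the samples above (gene name, an optional minus sign, then start..end in parentheses). The name parse_gene_line is made up for illustration.

```perl
use strict;
use warnings;

# Parse one gene-list line such as "atpB -(54618..56114)".
# Returns (name, strand, start, end), or nothing if the line
# does not match the assumed format.
sub parse_gene_line {
    my ($line) = @_;
    my ($name, $minus, $start, $end) =
        $line =~ /^\s*(\S+)\s+(-?)\((\d+)\.\.(\d+)\)\s*$/
        or return;
    my $strand = $minus eq '-' ? '-' : '+';
    return ($name, $strand, $start, $end);
}

for my $line ('rbcL (56874..58304)', 'atpB -(54618..56114)') {
    my ($name, $strand, $start, $end) = parse_gene_line($line);
    print "$name $strand $start $end\n";
}
```

Keeping the strand as a separate '+' or '-' value makes it easy to decide later whether the extracted bases need to be reversed and complemented.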

For example, at the command prompt I would give the program name, the species number I want, and the specific gene from that species whose DNA sequence I want:

perl NC_001666 trnM

The program would go into NC_001666.txt, find trnM, and see that it has a range from 54020 to 54092 and is on the positive strand (no negative sign). The program then goes into the GenBank file, goes to the long list of DNA bases at the bottom, starts at position 54020, and returns all bases through 54092 (inclusive). So for this specific trnM, the output would be:


If a gene has a negative sign next to the position range (meaning it's on the negative strand of DNA), the output should be reversed, starting from the higher position and going to the lower. Also, when a negative sign is there, all A's in that output should be switched to T's and all G's to C's, and vice versa (in other words, the output is the reverse complement).
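
The extraction and strand handling described above might be sketched like this. It assumes the bases have already been read from the ORIGIN section of the GenBank file (numbered lines with the bases in blocks of ten); the helper names origin_to_sequence and extract_gene and the short toy sequence are made up for illustration, not taken from the real NC_001666 record.

```perl
use strict;
use warnings;

# Turn ORIGIN-style lines ("        1 aaccggttac gt") into one
# string of bases by dropping the position numbers and whitespace.
sub origin_to_sequence {
    my ($text) = @_;
    $text =~ s/[\d\s]//g;
    return $text;
}

# Return the bases from $start through $end (1-based, inclusive).
# For the negative strand, return the reverse complement instead.
sub extract_gene {
    my ($seq, $start, $end, $strand) = @_;
    my $bases = substr($seq, $start - 1, $end - $start + 1);
    if ($strand eq '-') {
        $bases = scalar reverse $bases;     # higher position first
        $bases =~ tr/ACGTacgt/TGCAtgca/;    # swap A<->T and G<->C
    }
    return $bases;
}

my $seq = origin_to_sequence("        1 aaccggttac gt\n");   # toy data
print extract_gene($seq, 2, 5, '+'), "\n";   # prints accg
print extract_gene($seq, 2, 5, '-'), "\n";   # prints cggt
```

The start position is reduced by one because substr counts from 0 while the gene positions count from 1.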

Also, if a gene appears more than once in the text file, the program should print an error message saying so and exit.
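
One possible way to do the duplicate check, as a rough sketch: collect every line whose first field is the requested gene, and die if there is not exactly one. The name find_unique_line is made up, and the second trnM line in the sample data is invented to show the error case.

```perl
use strict;
use warnings;

# Return the single gene-list line for $gene.
# Dies if the gene is missing or listed more than once.
sub find_unique_line {
    my ($gene, @lines) = @_;
    my @hits = grep { /^\s*\Q$gene\E\s/ } @lines;
    die "Error: gene '$gene' not found\n" if @hits == 0;
    die "Error: gene '$gene' appears " . scalar(@hits) . " times\n"
        if @hits > 1;
    return $hits[0];
}

my @lines = (
    'rbcL (56874..58304)',
    'trnM (54020..54092)',
    'trnM -(100..172)',      # invented duplicate, for illustration only
);

print find_unique_line('rbcL', @lines), "\n";

# eval is used here only so the demo can show the error and keep going;
# a real script would just let die end the program.
my $ok = eval { find_unique_line('trnM', @lines); 1 };
print "caught: $@" unless $ok;
```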

If I could get a Perl script to return this information for any species (NC_number) I want, and any gene from that species that I want, it would be a great help in the research I am conducting. Thank you all for your time, and any help on how to write this script would be appreciated.



Jun 23, 2009, 11:26 PM

Post #2 of 2 (760 views)
Re: [akreibich07] Extracting DNA sequences from GenBank files using Perl

Ahh... I see, you're going to shop your question around the various Perl forums. Best of luck to you.

