Extracting DNA sequences from GenBank files using Perl

 



akreibich07
New User

Jun 23, 2009, 10:34 PM


Hi all,

Using Perl, I need to extract DNA bases from a GenBank file for a given plant species. A sample GenBank file is here...

Nucleotide

This is saved on my computer as NC_001666.gb. I also have a file that is saved on my computer as NC_001666.txt. This text file has a list of all the genes and their positions in the species corresponding to NC_001666 (corn). Here is a sample of how the text file is formatted...

rbcL (56874..58304)
atpB -(54618..56114)
atpE -(54208..54621)
trnM (54020..54092)
trnV -(53158..53834)
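Each line in the list above has a gene name, an optional minus sign, and a start..end range in parentheses. A minimal sketch of parsing one such line with a regular expression follows; the helper name `parse_gene_line` is hypothetical, and it assumes every line follows exactly the format shown in the sample.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one line such as "atpB -(54618..56114)".
# Returns (gene, strand, start, end), or an empty list if the line
# does not match the assumed format.
sub parse_gene_line {
    my ($line) = @_;
    if ($line =~ /^\s*(\S+)\s+(-?)\((\d+)\.\.(\d+)\)\s*$/) {
        my $strand = $2 eq '-' ? '-' : '+';
        return ($1, $strand, $3, $4);
    }
    return;
}

for my $line ('rbcL (56874..58304)', 'atpB -(54618..56114)') {
    my ($gene, $strand, $start, $end) = parse_gene_line($line);
    print "$gene $strand $start $end\n";
}
```

A line that does not fit the pattern returns an empty list, so the caller can skip or report it.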

For example, at the command prompt I would give the program name, the species number that I want, and the specific gene from that species whose DNA sequence I want:

perl nucleotide_bases.pl NC_001666 trnM

The program would open NC_001666.txt, find trnM, and see that it spans positions 54020 to 54092 and is on the positive strand (no negative sign). It would then open NC_001666.gb, go to the long list of DNA bases at the bottom, and return every base from position 54020 through 54092, inclusive. So for trnM, the output would be:

gcctacttaactcagtggttagagtattgctttcatacggcgggagtcattggttcaaatccaatagtaggta
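The extraction step could be sketched as below. In a GenBank flat file the bases follow a line starting with ORIGIN, each sequence line begins with a position number, and the record ends with `//`. The helper names `read_origin_sequence` and `extract_range` are hypothetical, and the demo record is made up for illustration (it is not real NC_001666 data). The whole sequence is read into memory, which is an assumption that should be fine for a chloroplast genome.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the bases that follow the ORIGIN line.
sub read_origin_sequence {
    my ($fh) = @_;
    my ($in_origin, $seq) = (0, '');
    while (my $line = <$fh>) {
        if ($line =~ /^ORIGIN/) { $in_origin = 1; next; }
        next unless $in_origin;
        last if $line =~ m{^//};       # end-of-record marker
        $line =~ s/[^A-Za-z]//g;       # drop position numbers and whitespace
        $seq .= lc $line;
    }
    return $seq;
}

# Hypothetical helper: positions are 1-based and inclusive.
sub extract_range {
    my ($seq, $start, $end) = @_;
    return substr($seq, $start - 1, $end - $start + 1);
}

# Tiny made-up record, read through an in-memory filehandle.
my $demo = <<'END';
LOCUS       DEMO
ORIGIN
        1 acgtacgtac ggttaaccgg
       21 ttttcccc
//
END
open my $fh, '<', \$demo or die "Cannot open in-memory file: $!";
my $seq = read_origin_sequence($fh);
close $fh;
print extract_range($seq, 11, 14), "\n";   # prints "ggtt"
```

For a real file, the in-memory open would be replaced with something like `open my $fh, '<', 'NC_001666.gb'`.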

If a gene has a negative sign next to its position range (meaning it's on the negative strand of DNA), the output should be reversed, running from the higher position down to the lower. The reversed output should also be complemented: every A becomes a T and every G becomes a C, and vice versa. In other words, the output should be the reverse complement of that range.
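The negative-strand rule is what is usually called a reverse complement. A minimal sketch using Perl's built-in `reverse` and `tr` operators follows; the function name `reverse_complement` is hypothetical, and it assumes the sequence contains only the letters A, C, G, and T (other characters are left unchanged).

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the string, then swap A<->T and C<->G.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;           # scalar context reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;    # complement each base, preserving case
    return $rc;
}

print reverse_complement('aacg'), "\n";   # prints "cgtt"
```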

Also, if a gene appears more than once in the text file, the program should print an error message saying so and exit.
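One way to sketch the duplicate check is to collect every line whose first field is the requested gene and `die` if there is more than one. The helper name `unique_gene_line` is hypothetical, and the third demo line is made up to trigger the error. Treating a missing gene as an error as well is an added assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: return the single line for $gene, or die.
sub unique_gene_line {
    my ($gene, @lines) = @_;
    # \Q...\E quotes the name so it is matched literally.
    my @hits = grep { /^\s*\Q$gene\E\s/ } @lines;
    die "Gene '$gene' not found\n" unless @hits;
    die "Gene '$gene' appears more than once (" . scalar(@hits) . " times)\n"
        if @hits > 1;
    return $hits[0];
}

# Demo data; the duplicate trnM entry is invented for illustration.
my @lines = (
    'trnM (54020..54092)',
    'trnV -(53158..53834)',
    'trnM (100..172)',
);

my $line = eval { unique_gene_line('trnM', @lines) };
print "Error: $@" if $@;
```

In a real script the `eval` would be dropped, so the `die` prints the message and ends the program.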

If I could get a Perl script to return this information for any species (NC_number) I want, and any gene from that species that I want, it would be a great help in the research I am conducting. Thank you all for your time, and any help on how to write this script would be appreciated.

-akreibich07


KevinR
Veteran


Jun 23, 2009, 11:26 PM


Ahh... I see, you're going to shop your question around the various Perl forums. Best of luck to you.