desire code for DNA formatting: long description inside

 



bigmac3000lbs
New User

Mar 21, 2011, 11:41 PM


I have data stored in two files.

The first is a tab-delimited text file with the following format:

EST_name min_coverage max_coverage num_sites num_SNPs SNP_Density
BF01025B1D11.f1 30 100 273 5 0.0183150183150183
BF01025B1F08.f1 30 100 22 8 0.363636363636364
BF01025B2F06.f1 30 100 366 1 0.00273224043715847
BF01025B1F01.f1 30 100 520 19 0.0365384615384615
BF01025B2H10.f1 30 100 156 1 0.00641025641025641
BF01025B1E02.f1 30 100 450 2 0.00444444444444444
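
As a starting point for the first file, the EST names in column one could be collected into a hash so that later lookups are cheap. This is only a minimal sketch: `read_est_names` is a hypothetical helper name, the header row is assumed to begin with `EST_name`, and the demo reads an in-memory copy of two sample rows rather than a real file.

```perl
use strict;
use warnings;

# Collect the EST names (first column) from a tab-delimited filehandle.
# Returns a hash reference whose keys are the names.
# read_est_names is a hypothetical helper name, not part of any library.
sub read_est_names {
    my ($fh) = @_;
    my %wanted;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;            # strip LF or CRLF line endings
        next if $line =~ /^\s*$/;        # skip blank lines
        my ($name) = split /\t/, $line;  # first tab-separated field
        next if $name eq 'EST_name';     # assumption: the header row starts with EST_name
        $wanted{$name} = 1;
    }
    return \%wanted;
}

# Demo on an in-memory copy of two rows from the sample table.
my $table = "EST_name\tmin_coverage\tmax_coverage\tnum_sites\tnum_SNPs\tSNP_Density\n"
          . "BF01025B1D11.f1\t30\t100\t273\t5\t0.0183150183150183\n"
          . "BF01025B1F08.f1\t30\t100\t22\t8\t0.363636363636364\n";
open my $fh, '<', \$table or die "Cannot open in-memory table: $!";
my $wanted = read_est_names($fh);
close $fh;
print "$_\n" for sort keys %$wanted;
```

For a real file, the in-memory string would be replaced by an ordinary `open my $fh, '<', $path` on the table's file name.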

The second is a FASTA file with the following format:

> BF01010B1A11.f1 782 1 782 ABI

AACGGACNANNCGGCAACCAGGAGGCCTTCCAAGCTGAACTGGGAGAGTGGATCAAGAAGAAACAAGCCGGCAGAGAGAA

AGAACGATTTGTGTCCGAGAGTGAAACCGACGTCACCCAGGAAGGTACAAACGATCAGCAATCGTTGACGAATAACATAC

CCAAGAACAGAAACAACGCTTTGCGCAAACTGAACAAAGCCTTCGATTCATTAAAGAAAGATCTGGGAGATCCGAAGACC

ATTTATAAGGAGCTGACGTATTTGCAAAATAAACATCAGCAACTTATAAACCAGAAAGTCATAAGTCCCAAAAAACTGTC

CCACAACAGGGACAATATTAATCTGATGCTGAGAAAACTTAACATGGTACTTCTCGGGCAAGCTGGTCTAGCGGACGGCT

CCACACACTTAAAGGAACTGTACTCCTTACAAGAGAAACTAAGCAATTTCAGACAGAAAAATATACCGACGTCGCTGCGC

GAGGAAATCGCTGAGCACTTTCATTGCATCTTCGCTGCGATACCCAGGGATGATTACATAGAACTCTTGAGTAAATTTTA

CAATAAGCCGGTCATAACGTTCAAGAAGAAAAATGATAGGTCCTTCAAAGTCAGTCCGAAGCCAAACCAGAAGACGTTGA

ATCCGATCCAGAACATACAACGCAACGTGAGCGGCGTGAAGGATGACACGAAGGAAAACAACGTGGTCAACGACACGTCC

CAACTGACCGCGGCGACCAAGAACAAGCTGGTGTTCTATCACAGGCGGTTGCTGCGCTCGCG

The ESTs listed in the tab-delimited text file are a subset of the ESTs in the FASTA file. I am trying to write a Perl script that compares each EST name in the first column of the tab-delimited file against the header lines of the FASTA file. Whenever a name matches a name in a FASTA header, I want the entire header, along with its DNA sequence, written to a separate FASTA file.

Note: the output FASTA file should have the same format as the original FASTA file. I'm not sure how to go about this. Any assistance would be greatly appreciated. Thanks!
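
One way the matching step described above might be approached is to stream the FASTA file line by line, switch a "keep" flag on or off at each header, and copy lines through unchanged while the flag is on. The sketch below assumes the EST name is the first whitespace-delimited token after `>` (with or without a space, as in the sample header) and that the wanted names are already in a hash. `filter_fasta` is a hypothetical helper name, and the two records in the demo are made-up toy data, not real ESTs.

```perl
use strict;
use warnings;

# Copy every FASTA record whose header names a wanted EST from $in to $out.
# $wanted is a hash reference with EST names as keys.
# Returns the number of records copied.
# filter_fasta is a hypothetical helper name, not part of any library.
sub filter_fasta {
    my ($in, $out, $wanted) = @_;
    my $keep = 0;    # true while we are inside a wanted record
    my $kept = 0;    # count of records written
    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            # Header line: the name is assumed to be the first token after ">".
            my ($name) = $line =~ /^>\s*(\S+)/;
            $keep = (defined $name && exists $wanted->{$name}) ? 1 : 0;
            $kept++ if $keep;
        }
        # Headers, sequence lines, and blank lines pass through unchanged,
        # so the output keeps the layout of the input.
        print {$out} $line if $keep;
    }
    return $kept;
}

# Demo with made-up records held in memory (toy data, for illustration only).
my $fasta = "> EST_A.f1 12 1 12 ABI\nAACGGACNANNC\n\nGGCAACCAGGAG\n"
          . "> EST_B.f1 8 1 8 ABI\nTTTTGGGG\n";
my %wanted = ('EST_A.f1' => 1);
my $result = '';
open my $in,  '<', \$fasta  or die "Cannot open in-memory FASTA: $!";
open my $out, '>', \$result or die "Cannot open in-memory output: $!";
my $count = filter_fasta($in, $out, \%wanted);
close $in;
close $out;
print $result;
```

Because each line is written exactly as it was read, the output file should keep the original header text and line wrapping. With real data, the two in-memory handles would be replaced by handles opened on the input and output file names.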

 
 

