Mar 21, 2011, 11:41 PM
desire code for DNA formatting: long description inside
I have data stored in two files.
The first is a tab-delimited text file with the following format:
EST_name min_coverage max_coverage num_sites num_SNPs SNP_Density
BF01025B1D11.f1 30 100 273 5 0.0183150183150183
BF01025B1F08.f1 30 100 22 8 0.363636363636364
BF01025B2F06.f1 30 100 366 1 0.00273224043715847
BF01025B1F01.f1 30 100 520 19 0.0365384615384615
BF01025B2H10.f1 30 100 156 1 0.00641025641025641
BF01025B1E02.f1 30 100 450 2 0.00444444444444444
The second is a FASTA file with the following format.
> BF01010B1A11.f1 782 1 782 ABI
The ESTs listed in the tab-delimited text file are a subset of the ESTs in the FASTA file. I am trying to write a Perl script that compares each EST name in the first column of the tab-delimited file against the header lines of the FASTA file. Whenever an EST name matches a name in a FASTA header, I want the entire header line, along with the DNA sequence that follows it, to be written to a separate FASTA file. Note: the output FASTA file should have the same format as the original FASTA file. I'm not sure how to go about doing this. Any assistance would be greatly appreciated. Thanks!
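To make the goal concrete, here is a minimal sketch of one way the comparison described above might be done in Perl. It is not a definitive solution and rests on several assumptions: the first line of the tab-delimited file is a column header, the EST name is the first whitespace-separated token after the ">" in each FASTA header, and the sequence lines follow their header directly. The sub names read_wanted_names and filter_fasta and the three-argument command line are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the EST names from column 1 of the tab-delimited file.
# Assumption: the first line is a column header and is skipped.
sub read_wanted_names {
    my ($fh) = @_;
    my %wanted;
    my $header = <$fh>;                 # discard the "EST_name ..." line
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;      # skip blank lines
        my ($name) = split ' ', $line;  # first field; splits on tabs or spaces
        $wanted{$name} = 1;
    }
    return \%wanted;
}

# Copy every FASTA record whose header names a wanted EST.
# Lines are printed unchanged, so the output keeps the input's format.
# Returns the number of records written.
sub filter_fasta {
    my ($in_fh, $out_fh, $wanted) = @_;
    my $keep  = 0;
    my $count = 0;
    while (my $line = <$in_fh>) {
        # Assumption: the name is the first token after ">",
        # with optional whitespace in between.
        if ($line =~ /^>\s*(\S+)/) {
            $keep = exists $wanted->{$1} ? 1 : 0;
            $count++ if $keep;
        }
        print {$out_fh} $line if $keep;
    }
    return $count;
}

# Run only when called with three file names:
#   tab-delimited file, input FASTA, output FASTA
if (@ARGV == 3) {
    my ($tab_file, $fasta_in, $fasta_out) = @ARGV;

    open my $tab_fh, '<', $tab_file or die "Cannot open $tab_file: $!";
    my $wanted = read_wanted_names($tab_fh);
    close $tab_fh;

    open my $in_fh,  '<', $fasta_in  or die "Cannot open $fasta_in: $!";
    open my $out_fh, '>', $fasta_out or die "Cannot open $fasta_out: $!";
    my $written = filter_fasta($in_fh, $out_fh, $wanted);
    close $in_fh;
    close $out_fh or die "Cannot close $fasta_out: $!";

    print STDERR "Wrote $written record(s) for ",
        scalar(keys %$wanted), " requested EST name(s)\n";
}
```

If this were saved as filter_est.pl (a hypothetical name), it could be run as "perl filter_est.pl snp_density.txt all_ests.fasta subset.fasta", where the three file names are placeholders. The names are stored in a hash so each FASTA header is checked with a single lookup and the FASTA file is read only once, which should matter when the FASTA file is much larger than the list of names.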