Search a file for a "sentence" and print this "sentence" to a file

 



yesitsjess
New User

Nov 3, 2011, 5:22 PM

Post #1 of 3 (647 views)

Hello! Basically, I need to search a DNA file for a start codon (ATG) and then return the ATG and all other letters until it finds a TAA, TAG or TGA. I've had probably about 5 gos, each COMPLETELY different to the last, but I'm getting really stressy over it now.

I would prefer to return this "sentence" as a string, since I have a specific printing subroutine, but I'm struggling enough as it is. Any help is MASSIVELY appreciated.


Code
#define mRNA file, create array of nucleotides 
@filedata = get_data('htra1rna.fasta');
$rna = extract_data(@filedata);

#print only characters between the first ATG and either TAA, TAG or TGA
$rna = @rna;
@trans=grep(/"ATG.*TAA|TAG|TGA"/, @rna);
if (@trans) {
foreach $match (@trans) {
print OUTFILE $match."\n";
}
}


EDIT: I have also tried /"ATG.{n,}(TAA|TAG|TGA)"/


(This post was edited by yesitsjess on Nov 3, 2011, 5:25 PM)


Chris Charley
User

Nov 4, 2011, 8:47 AM

Post #2 of 3 (559 views)
Re: [yesitsjess] Search a file for a "sentence" and print this "sentence" to a file

I'm not clear about whether you want to print the whole string (if it matches) or only the matching part. If it is just the matching part, then this code may be a solution.

Code
for my $string (@rna) {
    if ($string =~ /(ATG.*?(?:TAA|TAG|TGA))/) {
        print OUTFILE $1, "\n";
    }
}
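
Since the original question asks for the match as a string rather than printing it straight away, the same pattern could be wrapped in a subroutine. This is only a sketch: the name `first_orf` and the test sequence are made up for illustration, and it assumes the sequence is already one uppercase string with no newlines in it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the text from the first
# ATG up to and including the nearest TAA, TAG or TGA, or undef if none.
# Assumes $seq is a single uppercase string without newlines.
sub first_orf {
    my ($seq) = @_;
    # .*? is non-greedy, so the match ends at the first stop codon after ATG
    return $seq =~ /(ATG.*?(?:TAA|TAG|TGA))/ ? $1 : undef;
}

# Invented test sequence
my $found = first_orf('CCATGAAATAGGGTGA');
print defined $found ? "$found\n" : "no match\n";   # prints ATGAAATAG
```

The returned string can then be handed to whatever printing subroutine you already have.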



rovf
Veteran

Nov 4, 2011, 9:11 AM

Post #3 of 3 (553 views)
Re: [yesitsjess] Search a file for a "sentence" and print this "sentence" to a file


Quote
grep(/"ATG.*TAA|TAG|TGA"/, @rna)


There are two problems with your code:

First, you are not searching for text that starts with ATG. Instead, your regexp matches if any one of the following is true:

- The string contains a quote followed by ATG and, somewhere after that, TAA (note that a quote is just like any other "normal" character in a regexp); or

- The string contains somewhere the substring TAG; or

- The string contains somewhere the substring TGA, followed by a quote.

Hence, the sequence ABCDTAGFGH would match, but the sequence ATGCCCTGA would not, because it contains neither a quote nor the substring TAG.
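
A quick way to check this is to run the pattern from the question next to a grouped version on a couple of test strings. The grouped pattern and the second test string below are illustrative assumptions, not something from the original post:

```perl
use strict;
use warnings;

# The pattern from the question: the quotes are literal characters, and the
# alternation splits it into three branches: "ATG.*TAA, TAG, and TGA"
my $original = qr/"ATG.*TAA|TAG|TGA"/;

# Illustrative fix: no quotes, and the stop codons grouped together
my $grouped = qr/ATG.*?(?:TAA|TAG|TGA)/;

for my $seq ('ABCDTAGFGH', 'ATGCCCTGA') {
    printf "%-12s original: %-3s grouped: %s\n",
        $seq,
        ($seq =~ $original ? 'yes' : 'no'),
        ($seq =~ $grouped  ? 'yes' : 'no');
}
```

The first string matches only the original pattern (through its bare TAG branch), while the second matches only the grouped one.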

The second problem is that grep returns not the matching parts, but the whole strings in which the match occurred.
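
The difference is easy to see on a small made-up array (the sequences below are invented for the example and only stand in for your @rna):

```perl
use strict;
use warnings;

# Made-up input, standing in for the @rna array from the question
my @rna = ('CCATGAAATAGGG', 'CCCCCC');

# grep returns the whole elements for which the pattern matched
my @whole = grep { /ATG.*?(?:TAA|TAG|TGA)/ } @rna;

# A capture group plus $1 gives only the matching part of each element
my @parts;
for my $seq (@rna) {
    push @parts, $1 if $seq =~ /(ATG.*?(?:TAA|TAG|TGA))/;
}

print "grep:    @whole\n";   # grep:    CCATGAAATAGGG
print "capture: @parts\n";   # capture: ATGAAATAG
```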

In your case, I would simply loop over @rna, check which line matches, and then output the matching part. If you really want to be fancy, you can also do it like this:


Code
# $1 is the whole match from ATG to the stop codon; .*? stops at the first stop codon
print OUTFILE (map {/(ATG.*?T(?:AA|AG|GA))/ ? "$1\n" : ""} @rna);


 
 

