Extracting FASTA sequences?

 



lily3344
New User

Dec 15, 2008, 10:07 PM

Post #1 of 4 (3072 views)

OK, I need to extract some sequences from a FASTA-format file that looks like this:

>FBpp0100000
MKATSCRPIPFINFRIVRYFYRPRRLNKYYDIYRIAALVRDRACLFTCQQ
ASQNLQQQQPRFYSAPGRRAGFFSQFFDNMKAEMDKNKEIKDNIRKFREE
AQKLEESDALKSARQKFNIVESEAQKSSSMLKEQLGAIKERVGDVLEDAS
KSHLAKKVTEELSKKARGVSDTISDTSGKLGQTSAFQAISNTTTTIKKEM
DSASIENRVYRAPAKLRKRVQLVMSDSDRVVEPNTEATGMELHKDSKFYE
SWENFKNNNTYVNKVLDWKVKYDESENPVIRASRLLTDKVSDVMGGLFSK
TELSETMTELVKIDPSFDQKDFLRDCETDIIPNILESIVRGDLEILKDWC
FESTFNIIANPIKEAKKAGVYLDSKILDIENIELAMGKVMEQGPVLIITF
QAQQIMCVRDQKSQVVEGDPEKVMRVHYVWVLCRDRNELNPKAAWRLMEL
SANSSEQFV

>FBpp0100001
MRRVPPTDAEMQPNRARFKKYNVWASALQEDALSENMRGCDVTRSGRDRN
VENYDFSLRYRLNGENTLKRRLSNSSEDGGECSHPAHKRGRPSSRPITGN
QQRGLVKSRTGHRSRRGTSSASGSSDFCEPRHILDLNEVGERDPSDVATE
MASKLYEEKDELLVRVVEVLGIDVCLELYKETQRIEADGGMMIKNGIRRR
TPGGVFLFLIKHHDNITQEQQKRIFSEDRQSLSKSRKQIETLMRDRKVEE
LKKCLSKQVTELPTLNQRKEYYMQGDEQSEDKQPGSLSNPPPSPVGAEQE
HDSPEYRTHEININLVDNAELPSTSKAAAAAQGAPLKDLISYDHDFLDVN
CGDMDFF






Suppose I want just one of the sequences. I tried the following:

sub readlines {
    my @line;
    my @results = ();
    my $proteins = @proteins;

    while (my $line = readline ($FILE)) {

        if ($line =~ /$proteins/) {
            next until $line =~ /^>/;

        } elsif ($line =~ tr/$proteins//cd) {

            my $lines .= $line;
        }

    } return ($lines);
}
}

This doesn't work, but it gives me no feedback or error. Is it possible that it actually works and my computer is just slow?

The FASTA file is about 11 MB. Can someone help me out?


KevinR
Veteran


Dec 15, 2008, 10:14 PM

Post #2 of 4 (3070 views)
Re: [lily3344] Extracting FASTA sequences? [In reply to]

What's this line supposed to do?


Code
my $proteins = @proteins;


$proteins will be an integer equal to the length of the @proteins array.

So this line


Code
if ($line =~ /$proteins/) {


will be evaluated something like this:


Code
if ($line =~ /2/) {


where 2 would be the actual number of elements in the @proteins array.
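
To make the scalar-context point concrete, here is a minimal sketch. The two IDs are only placeholders standing in for whatever @proteins really holds, and the joined pattern at the end is one possible way to match any of the IDs, not something taken from the original code.

```perl
use strict;
use warnings;

# Hypothetical ID list standing in for the poster's @proteins.
my @proteins = ('FBpp0100000', 'FBpp0100001');

# Assigning an array to a scalar yields the element count, not the contents.
my $count = @proteins;

# A list assignment (note the parentheses) copies the first element instead.
my ($first) = @proteins;

# Joining the escaped IDs with | builds an alternation that matches any of them.
my $pattern = join '|', map { quotemeta } @proteins;

print "count:   $count\n";
print "first:   $first\n";
print "pattern: $pattern\n";
```

With the count in $proteins, the regex only tests whether the line contains that number, which is why no header is ever matched reliably.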
-------------------------------------------------


lily3344
New User

Dec 15, 2008, 10:17 PM

Post #3 of 4 (3068 views)
Re: [KevinR] Extracting FASTA sequences? [In reply to]

Sorry, @proteins actually comes from another module. It contains the key IDs to search for, like the >FBpp.. in the header lines.

What I am trying to do is take out only the records that match those IDs (including the sequences below each header).


KevinR
Veteran


Dec 15, 2008, 10:28 PM

Post #4 of 4 (3065 views)
Re: [lily3344] Extracting FASTA sequences? [In reply to]

Assuming $FILE points to an open filehandle, here is one possibility:


Code
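my $protein = 'FBpp0100000';   # ID to extract, without the leading '>'
my $match   = '';              # collected sequence lines
my $flag    = 0;               # true while inside the wanted record
while (my $line = readline ($FILE)) {
    if ($line =~ /^\s*$/) {    # a blank line ends the current record
        $flag = 0;
        next;
    }
    if ($line =~ /^>\Q$protein\E\b/) {   # header of the wanted record
        $flag = 1;
        next;
    }
    $match .= $line if $flag;
}
print $match;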
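
Since the IDs come from a list, the same idea could be extended to several IDs with a hash lookup. The following is only a sketch under assumptions: extract_fasta is a hypothetical helper name, the demo reads shortened and partly made-up sequences from an in-memory filehandle, and a record is ended by the next header line rather than by a blank line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash of ID => sequence for the wanted IDs.
sub extract_fasta {
    my ($fh, @wanted) = @_;
    my %want = map { $_ => 1 } @wanted;
    my (%seq, $current);
    while (my $line = readline($fh)) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            # Header line: keep the record only if its ID is wanted.
            $current = $want{$1} ? $1 : undef;
            $seq{$current} = '' if defined $current;
            next;
        }
        next unless defined $current;
        $line =~ s/\s+//g;    # strip whitespace, so blank lines add nothing
        $seq{$current} .= $line;
    }
    return %seq;
}

# Demo on an in-memory file with shortened, partly made-up sequences.
my $fasta = <<'END';
>FBpp0100000
MKATSCRPIP
FINFRIVRYF

>FBpp0100001
MRRVPPTDAE
>FBpp0100002
MQPNRARFKK
END

open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my %seq = extract_fasta($fh, 'FBpp0100000', 'FBpp0100002');
close $fh;

print ">$_\n$seq{$_}\n" for sort keys %seq;
```

The file is read once, line by line, so an 11 MB input should not need to be held in memory as a whole; only the matching sequences are kept.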

-------------------------------------------------

 
 

