fasta mRNA selector

muluwork
Mar 15, 2012, 11:28 AM

Post #1 of 6
fasta mRNA selector

Hello everybody,
I am looking for a Perl script that can select nucleotide sequences containing multiple A's at the end of the sequence.

e.g.
Input
>seq 1
GCGCTTCAAAGAGACAGCGGGACGCGCGGAGCTCCGGCCCATGTAGAGCGGCCAGCGGCCATGGCCACGCTGGAGGGCTGCCGCTGCCGGGGTGCCAGCGGCCGGAACAACAACAGCATCCTCTACAGCATTTTGAAGAG

>seq 2
AGTGGACTGAGATGAATTCAGCAGGTTCAGCTGTCAGATGGACTGAAATGCTCCAAGATTGTATTGGTTGGCAGGTGAGGACGATGTGTGGGACTCAATTAATTAAAGAGAAATGTATTTGGGAAATCTTCAGACTGTGAAATGACCAACAAAAGCAAGTGTGAGTGTGCATGTGAGAATGAGCGGCACAGAGAAAGAAAGAGTGTGCTGAGGGAAGAATATGTCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>seq 3
GAGAGGCGCACCAGGCTCTAAACGAGCACGTCAGGCTGATCCATCGCGAGGACACGACGCGGTTCGCCAAGCTGCTCATAGCTCTGTCCATGCTGAGGGCCATCAGCCCGCCAGTGGTCGCTCAGCTCTTCTTCAGACCC

I need a script that extracts only the sequences ending in AAAAAAAAAAAA, like seq 2.

The output should look like this
>seq 2
AGTGGACTGAGATGAATTCAGCAGGTTCAGCTGTCAGATGGACTGAAATGCTCCAAGATTGTATTGGTTGGCAGGTGAGGACGATGTGTGGGACTCAATTAATTAAAGAGAAATGTATTTGGGAAATCTTCAGACTGTGAAATGACCAACAAAAGCAAGTGTGAGTGTGCATGTGAGAATGAGCGGCACAGAGAAAGAAAGAGTGTGCTGAGGGAAGAATATGTCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Thanks a lot!
By the way, this is not homework; I am just a biologist who knows a few things about scripting. I only know how to run scripts :)


naven8
Mar 15, 2012, 1:47 PM

Post #2 of 6
Re: [muluwork] fasta mRNA selector

How will you get this input? Is it from a file?

If yes:

Code
#!/usr/bin/perl
use strict;
use warnings;

my $filename = "File.txt";   # input file
my $opfile   = "op.txt";     # output file

open my $fh, '<', $filename or die "Unable to open the file $filename: $!";
open my $OP, '>', $opfile   or die "Unable to write $opfile: $!";

# Read the input line by line and copy every line that ends in
# 50 or more consecutive A's to the output file.
while (<$fh>) {
    if (/A{50,}$/) {
        print $OP $_;
    }
}

close($fh);
close($OP);


Change "A{50,}$" to suit your requirement: A{50,} means 50 or more A's in a row, and $ anchors the match to the end of the line.
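To see what the quantifier and anchor do, here is a minimal sketch (not from the thread) that tests a few made-up sequences. The helper name has_polya_tail and the threshold of 12 are illustrative assumptions. It uses \z, which anchors at the very end of the string; the thread's $ additionally allows one trailing newline, which is why it works on lines read with <$fh>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $seq ends with at least $min A's.
# A{$min,} means "A repeated $min or more times"; \z anchors the match
# at the very end of the string.
sub has_polya_tail {
    my ($seq, $min) = @_;
    return $seq =~ /A{$min,}\z/ ? 1 : 0;
}

# Made-up examples; the threshold of 12 is an assumed value.
my %examples = (
    'ends in 15 As' => 'GCATG' . ('A' x 15),
    'ends in 5 As'  => 'GCATG' . ('A' x 5),
    'As not at end' => ('A' x 20) . 'GCATG',
);

for my $name (sort keys %examples) {
    my $verdict = has_polya_tail($examples{$name}, 12) ? 'match' : 'no match';
    print "$name: $verdict\n";
}
```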




muluwork
Mar 16, 2012, 6:29 AM

Post #3 of 6
Re: [naven8] fasta mRNA selector

Thanks, naven8! It only outputs the sequence, but I need the sequence name (the header line) as well.
In addition, some of the sequences span several lines instead of a single line.
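One way to meet both requirements (keep the header line, handle sequences that span several lines) is sketched below, assuming standard FASTA where every record starts with a '>' header line. Instead of relying on blank lines, it collects each record's lines, joins them to test the tail, and prints the record unchanged. The sub name filter_polya, the threshold of 12 and the sample records are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy every FASTA record whose sequence ends in at
# least $min_a A's from $in to $out. Header lines are kept, multi-line
# sequences are joined before testing, and records are printed unchanged.
sub filter_polya {
    my ($in, $out, $min_a) = @_;
    my ($header, @lines);

    # Test the record collected so far and print it if it qualifies.
    my $flush = sub {
        return unless defined $header;
        my $seq = join '', @lines;            # sequence without line breaks
        if ($seq =~ /A{$min_a,}\z/) {
            print {$out} join("\n", $header, @lines), "\n\n";
        }
    };

    while (my $line = <$in>) {
        $line =~ s/\s+\z//;                   # strip newline and trailing spaces
        next if $line eq '';                  # blank lines between records are optional
        if ($line =~ /^>/) {                  # a header starts a new record
            $flush->();
            ($header, @lines) = ($line);
        }
        else {
            push @lines, $line;
        }
    }
    $flush->();                               # don't forget the last record
}

# Demo on made-up, shortened records (assumed threshold: 12 A's).
my $data = <<'FASTA';
>rec1
GCACGAGG
TTGCAAAAAAAAAAAAAAA
>rec2
GCACGAGG
TTGCAT

>rec3
GCAAAAAAA
AAAAAAA
FASTA

open my $in, '<', \$data or die "Cannot open in-memory data: $!";
filter_polya($in, \*STDOUT, 12);
close $in;
```

To run it on a real file, open the file instead of the in-memory string (for example open my $in, '<', 'File.txt' or die $!;) and pass that handle. Joining the lines first matters because a poly-A tail can wrap across a line break, as in rec3, where neither line alone has 12 A's.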

naven8
Mar 16, 2012, 7:37 AM

Post #4 of 6
Re: [muluwork] fasta mRNA selector

What does your input look like?
What pattern do you need to check?
What output do you need?
Can you give an example?


muluwork
Mar 16, 2012, 7:50 AM

Post #5 of 6
Re: [naven8] fasta mRNA selector

The input file looks like this
>gi|91063272|
GCACGAGGCGTCATACGTGCACACAAGAGCATCATGGAGTGTGCCGTCTTCCCCTGGACCAACCTGCTGG
TGGTGGTGGTGGAGCTGGAGGGCTCCGAGCAGGAAGCCCTGGACCTGGTTCCCATGGTGACCAAGGCGGT
GCTGGAGGAGCACTACCTGATTGTGGGCGTGGTCGTGGTGACGGACATCGGCGTCATCCCTATCAACTCT
CGCGGCGAGAAACAGCGCATGCACCTCCGCGACGGCTTCTTACAAGACCAGCTCGACCCCATCTACGTGG
CCTACAACATGTAGCCTGAGTGTGTGTGTGTGTGAGAGCGAGCATGCGTTTGTATGCCGTGCATATCGCC
GTTGTTGTTCATTCATTTCCCAAAATCAAGATGTACTGTATTTTTGAACCTGCATATGGTGAAGTCTT

>gi|91063271
GCACGAGGGATGGTTCCTCCAGTCGCAGTTTCTCCGCTCATCAAGACGGCCAGGTGGTCTGCCCTGCTGC
TCGGCGTCTTCTATGGCAAACAGAGGTTTGACTACCTAAAGCCCATTGCTGAGGAGGAGAGGAAGGTTGA
GGAGGCGGAGAAAATGGCCAGAGAGGAGCAGGAACGCATCTACAAACAGCTGTCAGAAGCAAATTCTGAA
ACCATCCTCAAGTGATGTATCTGTGGACCTGTGTCCGTGTGTTTCAAACCCAATAAAACTTATTTTTCAT
GT

>gi|91063270
GCACGAGGGAAGGCTCTCTCTAACATGACCCAGCAGTACAGTGCTGGACAAAAGACTGACAGCAGAAAAG
GAGGCAAGAAGCAAACCGAGAGGGAGAAGAAGAAGAAGATCCTGGCTGAACGCAGGAAGGCTTTCAACAT
TGATCATCTGAACGAAGATAAACTCAAGGAGAAGGCCACGGAGCTGTGGCAGTGGCTGATGGGTCTGGAA
GCTGAGAAGTTTGACCTCAGTGAGAAACTCAAAAAACAGAAATATGATATTAACCAGCTTCTTGCTCGAG
TCAAGGATCACCAGAGTGCCAAAGGTCGTGGCAAGGGTAAGCTGGGCGGCCGGCTGAGGTAGAGCAGCTT
CAGGACGAGGACGAGGCGGACCGAGACACAAGGCGGTCGGCCCATGTGTCTGATGTATATTCATTGTTCT
GTTGTGTTGGATTACGTGTCACTAAACATGATATCATAACATTAAATAACTCATTACATCCNC

>gi|152212425
ACGCGGGCAAAGGGAGAAGGTTGACCTATAGCTGCAACCCATGGCACCGCAGCAGACTTTAAGTAGCCTT
GTCTTCAGGGACGAGCACGAACTCCGTGGCTGAAGTCCCGAGACTCCCACAGACAAGGACCATGAACAAG
AACAAGCAGCGCCCTGACTACACTGGACCACAGTCCCCATCCAAAGGCCGAAGACCACCCAGGACGCCCA
AGTGCTCCCGGTGTAGGAATCACGGCTTCGTGTCTCCGTTGAAGGGCCACAAACGCTACTGTGACTGGAG
GGAGTGTCGCTGTGACAAGTGTAACCTCATAGCGGAGAGACAGCGAATCATGGCGGCGCAGGTTGCCCTG
AGGAGGCAGCAGGCCCAGGAGGAAGAACTTGGGATTTGTACTCCAGTTGCTGTCAATGGGCCTGAAGTGA
TGGTCAAGAGTGAGTCTAGAGCGGACTGCCTGCTCCCTGTGGAAGGGAGATCCATGCCCTCTTCCATCAG
CACCTCCACTTATGTGCATGCTGGCCAAGGGAGCAGCAGGGCTCATCATGAGGGATCGTCTGACCTTCAG
ATGGAAACCCCCTATTACAACATCTACCAACCATCTCGTTACCTGTACAACTATCAGCAATACCAGATGT
CTCATGGTGATGGCTGCCTGCCGAGCCACAACATGCCCTCTCAGTACTGCATGCATTCTTACTACCCAGC
AACCTCCTACCTGACCCAAGGCCGCAGCTCTGCCACCTACGTTCCTTCCATCTGCAACCTGGAGGACGGC
AACTACGGCAGTAACAACAACTACGCCGAGACCACGGCAGCCTCCGCCTCGTCCAGCGTCGGCCTCACCG
CCGCTCCTGACTTTGCCCTGAACTACACCGTCACCTCCATCGTTTACGGTGAAACAAACAAATAAAGAAA
CCTCACATATAGTCGGATTAAAAAATATATATATATTAGAGAGTTATGCACTAAATTGTTCACTACAAAT
GTTTTAGTTAACTGACGTTTCCGATTACACTCTTCTTTTGCACCTTGTTGCTACTTCACTAGTCTGGATG
TTGTTCATAGTCAATATTCTCACCCCAGCCATATTTAGAAGGTTTTTTGTGCACCATTTTGACTCGACAG
ATTTACAGAGTAAGGGATTGTTTTTTTTTTTTTTATTGTCAGGTTATATTTTCCTGTGCCTTTTAAAAAC
ATTAGACAGACCCGAACAAAAAAAAAAAAAAAAAAAAAAAAA

The output should look like this:
>gi|152212425
ACGCGGGCAAAGGGAGAAGGTTGACCTATAGCTGCAACCCATGGCACCGCAGCAGACTTTAAGTAGCCTT
GTCTTCAGGGACGAGCACGAACTCCGTGGCTGAAGTCCCGAGACTCCCACAGACAAGGACCATGAACAAG
AACAAGCAGCGCCCTGACTACACTGGACCACAGTCCCCATCCAAAGGCCGAAGACCACCCAGGACGCCCA
AGTGCTCCCGGTGTAGGAATCACGGCTTCGTGTCTCCGTTGAAGGGCCACAAACGCTACTGTGACTGGAG
GGAGTGTCGCTGTGACAAGTGTAACCTCATAGCGGAGAGACAGCGAATCATGGCGGCGCAGGTTGCCCTG
AGGAGGCAGCAGGCCCAGGAGGAAGAACTTGGGATTTGTACTCCAGTTGCTGTCAATGGGCCTGAAGTGA
TGGTCAAGAGTGAGTCTAGAGCGGACTGCCTGCTCCCTGTGGAAGGGAGATCCATGCCCTCTTCCATCAG
CACCTCCACTTATGTGCATGCTGGCCAAGGGAGCAGCAGGGCTCATCATGAGGGATCGTCTGACCTTCAG
ATGGAAACCCCCTATTACAACATCTACCAACCATCTCGTTACCTGTACAACTATCAGCAATACCAGATGT
CTCATGGTGATGGCTGCCTGCCGAGCCACAACATGCCCTCTCAGTACTGCATGCATTCTTACTACCCAGC
AACCTCCTACCTGACCCAAGGCCGCAGCTCTGCCACCTACGTTCCTTCCATCTGCAACCTGGAGGACGGC
AACTACGGCAGTAACAACAACTACGCCGAGACCACGGCAGCCTCCGCCTCGTCCAGCGTCGGCCTCACCG
CCGCTCCTGACTTTGCCCTGAACTACACCGTCACCTCCATCGTTTACGGTGAAACAAACAAATAAAGAAA
CCTCACATATAGTCGGATTAAAAAATATATATATATTAGAGAGTTATGCACTAAATTGTTCACTACAAAT
GTTTTAGTTAACTGACGTTTCCGATTACACTCTTCTTTTGCACCTTGTTGCTACTTCACTAGTCTGGATG
TTGTTCATAGTCAATATTCTCACCCCAGCCATATTTAGAAGGTTTTTTGTGCACCATTTTGACTCGACAG
ATTTACAGAGTAAGGGATTGTTTTTTTTTTTTTTATTGTCAGGTTATATTTTCCTGTGCCTTTTAAAAAC
ATTAGACAGACCCGAACAAAAAAAAAAAAAAAAAAAAAAAAA


naven8
Mar 16, 2012, 9:13 AM

Post #6 of 6
Re: [muluwork] fasta mRNA selector

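Then add the following line after use warnings; (it switches Perl to paragraph mode, so each read returns a whole blank-line-separated record, header included):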


Code
local $/ = "";


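Please check that the blank lines between records contain no spaces or tabs: in this mode a record ends only at a truly empty line.
Also note that each record read this way keeps its trailing newlines, so "$" no longer sits right after the A's; change the pattern to "A{50,}\s*\z" so that whitespace is allowed between the A's and the end of the record.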
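A minimal sketch of the paragraph-mode approach, using a hypothetical polya_records sub and made-up in-memory data. With $/ set to "" each record returned by <$fh> still ends in its blank-line newlines, so the pattern allows trailing whitespace with \s*\z; a plain /A{12,}$/ would not match such a record. It assumes records are separated by truly empty lines and that the whole A run sits on the record's last line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read blank-line-separated FASTA records from $fh in
# paragraph mode and return those whose sequence ends in at least $min_a A's.
sub polya_records {
    my ($fh, $min_a) = @_;
    local $/ = "";        # paragraph mode: a record ends at one or more empty lines
    my @hits;
    while (my $rec = <$fh>) {
        # The record still ends with its "\n\n", so allow trailing
        # whitespace between the A's and the end of the string.
        push @hits, $rec if $rec =~ /A{$min_a,}\s*\z/;
    }
    return @hits;
}

# Made-up sample: two records separated by an empty line.
my $data = ">r1\nGCAT\nGCAAAAAAAAAAAAAA\n\n>r2\nGCATGCAT\n\n";

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
print for polya_records($fh, 12);
close $fh;
```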