Problem with printing continuous minimum and maximum polypurine stretches using sliding window

 



vasumathi
New User

Apr 14, 2017, 6:06 PM

Post #1 of 6 (2087 views)

Hi All,
I need to extract polypurine stretches containing at most one pyrimidine residue.
My Perl code:

for(my $i=0; $i <=$len; $i++) {

    for(my $j=0; $j <=length($data[$i]) - $window; $j += $step){
        my $nucltde = substr($data[$i], $j, $window);
        while($nucltde =~ /G/g)
        {
            $countG++
        }
        #print $nucltde,"\n";
        my $Gper=($countG/$window)*100;
        #print "$countG\t$Gper\n";
        if($Gper >= 50){
            while($nucltde =~ /[AG]/g)
            {
                $countAG++
            }
            while($nucltde =~ /[CT]/g)
            {
                $countCT++
            }
            my $AGper=($countAG/$window)*100;
            if(($countAG >=15) && ($AGper >= 46) && ($countCT <=1)){

After counting the residues, I applied the if condition. I am able to print only 30-nt stretches. I cannot print stretches with a minimum length of 15 nt, or maximal continuous stretches of any length. I tried using join and push for that, but it either prints everything or joins the same sequence repeatedly. Could you please guide me in solving this problem?


BillKSmith
Veteran

Apr 15, 2017, 6:44 AM

Post #2 of 6 (2080 views)
Re: [vasumathi] Problem with printing continuous minimum and maximum polypurine stretches using sliding window

Welcome to perlguru.

Most of us are not biologists, so we have no idea what you are asking. You will receive far more answers if you rephrase your question without the jargon of biology. (Where that is not possible, define each term in a way that we can understand.) It would also help if you provided several small samples of data and the output you expect from each.

I can suggest several improvements to the code you have shown us:
  • Avoid c-style for loops.

  • A match (without /g) in scalar context returns the number of matches. (This turned out to be incorrect; see post #5.)

  • Use 'next' to simplify your loops.




    Code
    use strict;
    use warnings;
    # post=8395l0
    # Updated since original posting
    # $step was not handled correctly
    my @data = ([], []);    # placeholders: no sample data was available
    my $window = undef;     # placeholder: window length
    my $step = undef;       # placeholder: step between windows

    foreach my $string (@data) {
        my $line = 'x' x $step . $string; # Simplifies loop
        SUBSTRING:
        # Drop $step characters from the front on each pass
        while (length($line = substr $line, $step) >= $window) {
            my $nucltde = $_ = substr $line, 0, $window;
            # The three matches below yield only true/false, not a count.
            # Refer Post #5 for the corrected counting.
            my $countG = /G/;
            my $countCT = /[CT]/;
            my $countAG = /[AG]/;
            my $Gper = ( $countG / $window ) * 100;
            my $AGper= ( $countAG / $window ) * 100;
            unless ( $countAG >=15 and $AGper >= 46 and $countCT <= 1 ) {
                next SUBSTRING;
            }
            ...; # process the step
        }
    }

    Note: The code is untested because I lack data. (It does compile without errors.)


    4/16/2017 UPDATE: Replaced the code to correct the use of $step in the inner loop.
    Good Luck,
    Bill

    (This post was edited by BillKSmith on Apr 17, 2017, 9:25 AM)
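The first and third suggestions in the post above (no C-style loops, and `next` to skip early) could also be sketched without the padding trick. This is only an illustration under assumptions, not Bill's code: the helper name `sliding_windows` and the short sample string are made up here.

```perl
use strict;
use warnings;

# Hypothetical helper: return every window of $window characters,
# taken every $step positions along $seq.
sub sliding_windows {
    my ($seq, $window, $step) = @_;
    my $last_start = length($seq) - $window;
    return () if $last_start < 0;    # sequence shorter than one window
    return map { substr($seq, $_ * $step, $window) }
           0 .. int($last_start / $step);
}

# Made-up sample: 10 characters, window of 4, step of 3
for my $win (sliding_windows('GATGGAAGGG', 4, 3)) {
    next if $win =~ /[CT]/;          # skip windows containing a pyrimidine
    print "$win\n";
}
```

Because the helper returns a plain list, the filtering conditions can sit in the calling loop as a series of `next` statements.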


    vasumathi
    New User

    Apr 16, 2017, 8:57 AM

    Post #3 of 6 (2071 views)
    Re: [BillKSmith] Problem with printing continuous minimum and maximum polypurine stretches using sliding window

    Thanks a lot.


    vasumathi
    New User

    Apr 16, 2017, 10:45 PM

    Post #4 of 6 (2054 views)
    Re: [vasumathi] Problem with printing continuous minimum and maximum polypurine stretches using sliding window

    Hi,
    Sorry for the late explanation.
    Input data (filename.txt):
    1) ATGCGATAGAAGCGTAGACGATGGAAGGGAAGGAAGGAGGGAGGAAGCTATT
    2) CGTAGATGATTGATAGAGGGAAGAGGAGAGAGGAAGGGAAGGGAAGGGAGGA
    Here, polypurine means a continuous run of the letters A and G.
    Pyrimidine means the letters C and T.

    Output format: I want polypurine tracts (with G residues making up >= 50% of the tract length) containing at most one pyrimidine residue.
    For example: GATGGAAGGGAAGGAAGGAGGGAGGAAG from sequence 1


    BillKSmith
    Veteran

    Apr 17, 2017, 9:17 AM

    Post #5 of 6 (2044 views)
    Re: [vasumathi] Problem with printing continuous minimum and maximum polypurine stretches using sliding window

    Your data revealed that my counting was not correct. The match operator does not do exactly what I thought. With that corrected, I can duplicate your expected output by setting $window = 28 and $step = 1. (It also finds several other matches.) I realize that this is just a shorter version of what you already had last week. I still do not understand your original question.
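For readers following the counting correction, here is a minimal sketch of the difference. In scalar context without /g, the match operator only reports whether the pattern matched. Assigning a /g match through an empty list, or using tr///, gives the number of occurrences. The helper name `count_matches` and the sample window are made up for illustration and are not part of the code in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times $pattern matches in $string.
sub count_matches {
    my ($string, $pattern) = @_;
    # The empty list forces list context; assigning that list to a
    # scalar yields the number of matches found with /g.
    my $count = () = $string =~ /$pattern/g;
    return $count;
}

my $window  = 'GATGGAAG';                  # made-up sample window
my $matched = ($window =~ /G/);            # scalar context: success flag only
my $countG  = count_matches($window, 'G'); # actual number of G residues
my $countCT = ($window =~ tr/CT//);        # tr/// counts characters, no regex needed

print "matched=$matched G=$countG CT=$countCT\n";   # prints: matched=1 G=4 CT=1
```

For single-character classes such as G or [CT], tr/// is usually the simpler choice; the list-context form is the one that generalises to arbitrary patterns.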

    Based on your new comments, I suspect that you should be testing $Gper rather than $AGper.




    Code
    use strict; 
    use warnings;
    # post=8395l0
    # Updated since posting
    # $step was not handled correctly
    my @data = (
    'ATGCGATAGAAGCGTAGACGATGGAAGGGAAGGAAGGAGGGAGGAAGCTATT',
    'CGTAGATGATTGATAGAGGGAAGAGGAGAGAGGAAGGGAAGGGAAGGGAGGA',
    );
    my $window = 28;
    my $step = 1;

    foreach my $string (@data) {
        my $line = 'x' x $step . $string; # Simplifies loop
        SUBSTRING:
        while (length($line = substr $line, $step) >= $window) {
            my $nucltde = $_ = substr $line, 0, $window;
            my $countG = length join '', /(G)/g;
            my $countCT = length join '', /([CT])/g;
            my $countAG = length join '', /([AG])/g;
            my $Gper = ( $countG / $window ) * 100;
            my $AGper = ( $countAG / $window ) * 100;
            unless ( $countAG >=15 and $AGper >= 46 and $countCT <= 1 ) {
                next SUBSTRING;
            }
            printf "%25s: %2d %3d%1s %2d\n",
                $nucltde, $countAG, $AGper, '%', $countCT;
        }
    }
    OUTPUT:
    ATAGAGGGAAGAGGAGAGAGGAAGGGAA: 27 96% 1
    AGAGGGAAGAGGAGAGAGGAAGGGAAGG: 28 100% 0
    AGGGAAGAGGAGAGAGGAAGGGAAGGGA: 28 100% 0
    GGAAGAGGAGAGAGGAAGGGAAGGGAAG: 28 100% 0
    AAGAGGAGAGAGGAAGGGAAGGGAAGGG: 28 100% 0
    GAGGAGAGAGGAAGGGAAGGGAAGGGAG: 28 100% 0
    GGAGAGAGGAAGGGAAGGGAAGGGAGGA: 28 100% 0

    Good Luck,
    Bill

    (This post was edited by BillKSmith on Apr 17, 2017, 9:33 AM)


    Chris Charley
    User

    Apr 17, 2017, 9:38 AM

    Post #6 of 6 (2040 views)
    Re: [vasumathi] Problem with printing continuous minimum and maximum polypurine stretches using sliding window

    Here is a solution using regular expressions. It gets the subseq you want.


    Code
    #!/usr/bin/perl 
    use strict;
    use warnings;

    my @input = qw/
    ATGCGATAGAAGCGTAGACGATGGAAGGGAAGGAAGGAGGGAGGAAGCTATT
    CGTAGATGATTGATAGAGGGAAGAGGAGAGAGGAAGGGAAGGGAAGGGAGGA
    /;

    for my $seq (@input) {
        my $length = length $seq;
        my $max_len = 0;
        my $max_seq = 'NONE';
        while ($seq =~ /(?=([AG]*[CT]?[AG]*))/g) {
            my $len = length $1;
            if ($len > $max_len && $len >= int($length / 2)) {
                $max_len = $len;
                $max_seq = $1;
            }
        }
        print $max_seq, "\n";
    }


    Output is:

    Code
    GATGGAAGGGAAGGAAGGAGGGAGGAAG 
    GATAGAGGGAAGAGGAGAGAGGAAGGGAAGGGAAGGGAGGA



    (This post was edited by Chris Charley on Apr 17, 2017, 11:09 AM)
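The regex approach above selects by overall length. The original question also asked for a minimum tract length of 15 and a G content of at least 50%, so one possible variation is sketched here. It is an illustration under assumptions, not Chris's code: the name `find_tracts` and its parameters are hypothetical, and the pattern is changed so that the single pyrimidine must sit between purines (an assumption about what counts as a tract). Only the longest candidate from each start position is tested.

```perl
use strict;
use warnings;

# Hypothetical helper: report purine tracts with at most one internal
# pyrimidine, at least $min_len long, and at least $min_g_pct percent G.
sub find_tracts {
    my ($seq, $min_len, $min_g_pct) = @_;
    my @tracts;
    my $last_end = 0;    # end offset of the most recently reported tract
    # The lookahead lets candidates overlap: one is tried at every position.
    while ($seq =~ /(?=([AG]+(?:[CT][AG]+)?))/g) {
        my $tract = $1;
        my $end   = $+[1];                 # end offset of capture group 1
        next if $end <= $last_end;         # lies inside a reported tract
        next if length($tract) < $min_len;
        my $g_count = ($tract =~ tr/G//);  # count G residues
        next if 100 * $g_count / length($tract) < $min_g_pct;
        push @tracts, $tract;
        $last_end = $end;
    }
    return @tracts;
}

my @input = qw/
    ATGCGATAGAAGCGTAGACGATGGAAGGGAAGGAAGGAGGGAGGAAGCTATT
    CGTAGATGATTGATAGAGGGAAGAGGAGAGAGGAAGGGAAGGGAAGGGAGGA
/;

for my $seq (@input) {
    print "$_\n" for find_tracts($seq, 15, 50);
}
```

With these thresholds, the two sequences from this thread yield the same two stretches as the output shown above. A sequence holding several separate tracts would have each one reported, rather than only the longest.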

     
     

