need a hand with this one



bingo11
New User

Mar 17, 2005, 1:45 PM


Views: 12484
need a hand with this one

I am trying to find all occurrences of A's with a maximum of 2 non-A characters (either G, C or T) in a string, and I am particularly interested in the longest occurrence.
However, when I try this on, let's say, the following:

my $string = "CGATAAAAAAGAAAAAAGAAAAAAAAAAAAAAAGTCGACCAAAAAAAAGAGATCGATAAGATGAAATAAA";


while ($string=~ /A+([CGT]A+){0,2}/g){
print " Found $& at pos ".(pos($string)-length($&))."\n";
}
It misses the longest occurrence, which starts at position 9.

Any suggestions or hints would be greatly appreciated.

thx
Mahdi


MrPJ
User

Mar 17, 2005, 5:11 PM


Views: 12481
Re: [bingo11] need a hand with this one

The regex won't work because a /g match resumes where the previous one ended: your first match takes you all the way up to the third G, so the next match will be just the run of A's after it. I would think that to get it to work you'd need to go back to the last non-A after each match by resetting pos().
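
A minimal sketch of the same idea, assuming a zero-width lookahead is acceptable in place of resetting pos() by hand: wrapping the pattern in (?=(...)) means a match consumes nothing, so the next attempt can start inside the previous hit. The (?<!A) guard and the names $best and $best_pos are illustrative assumptions, not part of the original code.

```perl
use strict;
use warnings;

my $string = "CGATAAAAAAGAAAAAAGAAAAAAAAAAAAAAAGTCGACCAAAAAAAAGAGATCGATAAGATGAAATAAA";

my ($best, $best_pos) = ('', -1);

# (?<!A) lets a hit start only at the beginning of a run of A's.
# (?=(...)) captures the candidate without consuming it, so
# overlapping candidates are all reported.
while ($string =~ /(?<!A)(?=(A+(?:[CGT]A+){0,2}))/g) {
    my ($hit, $start) = ($1, $-[1]);   # $-[1] is the start offset of group 1
    print " Found $hit at pos $start\n";
    ($best, $best_pos) = ($hit, $start) if length($hit) > length($best);
}

print "Longest: $best at pos $best_pos\n";
```

On the sample string the longest hit should be the 29-character stretch starting at pos 4.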

I'll try a few ideas and get back to you.


(This post was edited by MrPJ on Mar 17, 2005, 5:19 PM)


MrPJ
User

Mar 17, 2005, 5:30 PM


Views: 12475
Re: [bingo11] need a hand with this one

I have to get to bed now, but with a few minutes playing I ended up with:


Code
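while ($string =~ /(A+(([CGT]A+){0,2}))/g) {
    # $1 is the whole hit; $2 is everything after the leading run of A's
    print " Found $1 at pos ".(pos($string)-length($1))."\n";
    # rewind to just after the leading A's so the next hit can overlap this one
    pos($string) = pos($string)-length($2);
}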


...which gives:

Found ATAAAAAAGAAAAAA at pos 2
Found AAAAAAGAAAAAAGAAAAAAAAAAAAAAA at pos 4
Found AAAAAAGAAAAAAAAAAAAAAA at pos 11
Found AAAAAAAAAAAAAAA at pos 18
Found A at pos 37
Found AAAAAAAAGAGA at pos 40
Found AGA at pos 49
Found A at pos 51
Found ATAAGA at pos 55
Found AAGA at pos 57
Found A at pos 60
Found AAATAAA at pos 63
Found AAA at pos 67

...that's not perfect, as it misses out ACCAAAAAAAA, but it's the best I can do for now :)


KevinR
Veteran


Mar 17, 2005, 10:45 PM


Views: 12473
Re: [MrPJ] need a hand with this one

Post removed due to extreme stupidity by the author (moi).
-------------------------------------------------


(This post was edited by KevinR on Mar 17, 2005, 10:49 PM)


bingo11
New User

Mar 18, 2005, 8:17 AM


Views: 12467
Re: [MrPJ] need a hand with this one

Thanks for the tip. As for getting ACCAAAAAAAA, I don't think that would be possible with the condition that I set ([CGT]A+), since there is no A between the two C's in ACCAAAAAAAA.
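
A quick way to check this, as a sketch: anchor the pattern with ^ and $ so that it has to account for the whole candidate. The helper name fits_strict and the second sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if the whole candidate fits the original
# pattern, in which every non-A character must be followed by at least one A.
sub fits_strict {
    my ($candidate) = @_;
    return $candidate =~ /^A+(?:[CGT]A+){0,2}$/ ? 1 : 0;
}

for my $candidate ("ACCAAAAAAAA", "ACACAAAAAAAA") {
    printf "%-14s %s\n", $candidate,
        fits_strict($candidate) ? "matches" : "does not match";
}
```

The first candidate fails because the C right after the leading A is followed by another C rather than an A; the second has an A after each C, so it fits.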

Mahdi


KevinR
Veteran


Mar 18, 2005, 5:56 PM


Views: 12462
Re: [bingo11] need a hand with this one

Building on what MrPJ already posted, this seems to work well:


Code
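my $string = "CGATAAAAAAGAAAAAAGAAAAAAAAAAAAAAAGTCGACCAAAAAAAAGAGATCGATAAGATGAAATAAA";
# $2 = leading A's, $3 and $5 = the two non-A characters, $4 and $6 = the A's after each
while ($string =~ /((A+)([CGT])(A*)([CGT])(A+))/g) {
    print " Found $1 at pos ".(pos($string)-length($1))."\n";
    # rewind to just after the leading A's so the next hit can overlap this one
    pos($string) = pos($string)-length("$3$4$5$6");
}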


prints:

Found ATAAAAAAGAAAAAA at pos 2
Found AAAAAAGAAAAAAGAAAAAAAAAAAAAAA at pos 4
Found ACCAAAAAAAA at pos 37
Found AAAAAAAAGAGA at pos 40
Found ATAAGA at pos 55
Found ATGAAA at pos 60

That said, I am not all that clear on what exactly the pattern is supposed to be.
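
One possible reading, sketched below, is "a stretch that starts and ends with A and contains at most two non-A characters, adjacent or not". That reading is an assumption and is not confirmed anywhere in this thread. The names $stretch and @hits are illustrative, and the lookahead is used here instead of resetting pos() so that overlapping hits are still reported.

```perl
use strict;
use warnings;

my $string = "CGATAAAAAAGAAAAAAGAAAAAAAAAAAAAAAGTCGACCAAAAAAAAGAGATCGATAAGATGAAATAAA";

# Assumed reading: A+ alone, A+ X A+, or A+ X A* Y A+ (X and Y are non-A).
my $stretch = qr/A+(?:(?:[CGT]A*)?[CGT]A+)?/;

my @hits;
# (?<!A) starts hits only at the beginning of a run of A's;
# (?=(...)) captures without consuming, so hits may overlap.
while ($string =~ /(?<!A)(?=($stretch))/g) {
    push @hits, [ $-[1], $1 ];   # [ start offset, matched text ]
}

# Longest first; equal lengths are ordered by start position.
for my $hit (sort { length($b->[1]) <=> length($a->[1])
                    or $a->[0] <=> $b->[0] } @hits) {
    print " Found $hit->[1] at pos $hit->[0]\n";
}
```

Under this reading ACCAAAAAAAA is reported at pos 37, and stretches with fewer than two non-A characters are no longer skipped.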
-------------------------------------------------


kencl
User

Mar 26, 2005, 11:28 PM


Views: 12332
Re: [KevinR] need a hand with this one

Looks like a strand of DNA

>> If you can't control it, improve it, correlate it or disseminate it with PERL, it doesn't exist!