need a hand with this one

 



bingo11
New User

Mar 17, 2005, 1:45 PM

Post #1 of 7 (5430 views)
need a hand with this one

I am trying to find all occurrences of A's with a maximum of 2 non-A characters (either G, C or T) in a string, and I am particularly interested in the longest occurrence.
However, when I try this on, let's say, the following:

my $string = "CGATAAAAAAGAAAAAAGAAAAAAAAAAAAAAAGTCGACCAAAAAAAAGAGATCGATAAGATGAAATAAA";


while ($string =~ /A+([CGT]A+){0,2}/g) {
    print " Found $& at pos ".(pos($string)-length($&))."\n";
}
It misses the longest occurrence, which starts at position 9.

Any suggestions or hints would be greatly appreciated.

thx
Mahdi


MrPJ
User

Mar 17, 2005, 5:11 PM

Post #2 of 7 (5427 views)
Re: [bingo11] need a hand with this one

The regex won't work because your first match runs all the way up to the third G, so the next search starts there and picks up only the run of A's that follows. To get it to work, I think you'd need to go back to the last non-A after each match by resetting pos().

I'll try a few ideas and get back to you.
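
A minimal sketch of the behaviour described above, reusing `$string` and the pattern from the first post. Under `/g`, each search resumes where the previous match ended, so a candidate that begins inside an earlier match is never tried. The non-capturing group `(?:...)` and the `@spans` array are illustrative choices here, not part of the original code.

```perl
use strict;
use warnings;

my $string = "CGATAAAAAAGAAAAAAGAAAAAAAAAAAAAAAGTCGACCAAAAAAAAGAGATCGATAAGATGAAATAAA";

# Record the [start, end) offsets of every match of the original pattern.
my @spans;
while ($string =~ /A+(?:[CGT]A+){0,2}/g) {
    push @spans, [ $-[0], $+[0] ];   # @- and @+ hold match start/end offsets
}

# Each match begins at or after the end of the previous one,
# so overlapping candidates are skipped.
for my $i (1 .. $#spans) {
    printf "match %d starts at %d, previous match ended at %d\n",
        $i, $spans[$i][0], $spans[$i - 1][1];
}
```

The first match covers offsets 2 to 16, so the next search only begins at offset 17 and never considers a match starting at offset 4.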


(This post was edited by MrPJ on Mar 17, 2005, 5:19 PM)


MrPJ
User

Mar 17, 2005, 5:30 PM

Post #3 of 7 (5421 views)
Re: [bingo11] need a hand with this one

I have to get to bed now, but with a few minutes playing I ended up with:


Code
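while ($string =~ /(A+(([CGT]A+){0,2}))/g) {
    print " Found $1 at pos ".(pos($string)-length($1))."\n";
    # Rewind by the length of $2 (everything after the first run of A's)
    # so the next search resumes right after that first run.
    pos($string) = pos($string) - length($2);
}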


...which gives:

Found ATAAAAAAGAAAAAA at pos 2
Found AAAAAAGAAAAAAGAAAAAAAAAAAAAAA at pos 4
Found AAAAAAGAAAAAAAAAAAAAAA at pos 11
Found AAAAAAAAAAAAAAA at pos 18
Found A at pos 37
Found AAAAAAAAGAGA at pos 40
Found AGA at pos 49
Found A at pos 51
Found ATAAGA at pos 55
Found AAGA at pos 57
Found A at pos 60
Found AAATAAA at pos 63
Found AAA at pos 67

...that's not perfect, as it misses out ACCAAAAAAAA, but it's the best I can do for now :)


KevinR
Veteran


Mar 17, 2005, 10:45 PM

Post #4 of 7 (5419 views)
Re: [MrPJ] need a hand with this one

post removed due to extreme stupidity by the author (moi)
-------------------------------------------------


(This post was edited by KevinR on Mar 17, 2005, 10:49 PM)


bingo11
New User

Mar 18, 2005, 8:17 AM

Post #5 of 7 (5413 views)
Re: [MrPJ] need a hand with this one

Thanks for the tip. As for getting ACCAAAAAAAA, I don't think that would be possible with the condition that I set, ([CGT]A+), since there is no A between the two C's in ACCAAAAAAAA.
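
A quick check of this point, as a sketch: the first pattern is the one from the original post, anchored so that it has to cover the whole fragment. The second, `$relaxed`, is a hypothetical variant (not something proposed in this thread) that swaps `A+` for `A*` inside the group so two non-A characters may sit side by side.

```perl
use strict;
use warnings;

my $fragment = "ACCAAAAAAAA";

# Pattern from the first post, anchored to the whole fragment.
# Every non-A character must be followed by at least one A.
my $strict = qr/^A+(?:[CGT]A+){0,2}$/;

# Hypothetical relaxed variant: A* allows adjacent non-A characters;
# the lookbehind keeps the match ending on an A.
my $relaxed = qr/^A+(?:[CGT]A*){0,2}(?<=A)$/;

my $strict_ok  = $fragment =~ $strict  ? 1 : 0;
my $relaxed_ok = $fragment =~ $relaxed ? 1 : 0;

print "strict:  ", ($strict_ok  ? "matches" : "no match"), "\n";
print "relaxed: ", ($relaxed_ok ? "matches" : "no match"), "\n";
```

The strict pattern fails on this fragment because the first C is followed by another C rather than an A, which is the condition described above.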

Mahdi


KevinR
Veteran


Mar 18, 2005, 5:56 PM

Post #6 of 7 (5408 views)
Re: [bingo11] need a hand with this one

Building on what MrPJ already posted, this seems to work well:


Code
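my $string = "CGATAAAAAAGAAAAAAGAAAAAAAAAAAAAAAGTCGACCAAAAAAAAGAGATCGATAAGATGAAATAAA";
# $2 = leading A's, $3 and $5 = the two non-A characters,
# $4 = A's between them (may be empty), $6 = trailing A's
while ($string =~ /((A+)([CGT])(A*)([CGT])(A+))/g) {
    print " Found $1 at pos ".(pos($string)-length($1))."\n";
    # Rewind to just after the leading A's so overlapping matches are found too.
    pos($string) = pos($string) - length("$3$4$5$6");
}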


prints:

Found ATAAAAAAGAAAAAA at pos 2
Found AAAAAAGAAAAAAGAAAAAAAAAAAAAAA at pos 4
Found ACCAAAAAAAA at pos 37
Found AAAAAAAAGAGA at pos 40
Found ATAAGA at pos 55
Found ATGAAA at pos 60

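Although I am not all that clear on what exactly the pattern is supposed to be. Note that this version requires exactly two non-A characters, so runs with none or only one are not reported.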
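
Since the original question is mainly about the longest occurrence, here is a sketch of one alternative that avoids resetting pos() by hand. It wraps the pattern from the first post in a zero-width lookahead, so the engine tries every start offset and the capture can overlap earlier ones. The variable names `$best` and `$best_pos` are illustrative, and this is only one possible approach, not the method used in the posts above.

```perl
use strict;
use warnings;

my $string = "CGATAAAAAAGAAAAAAGAAAAAAAAAAAAAAAGTCGACCAAAAAAAAGAGATCGATAAGATGAAATAAA";

my ($best, $best_pos) = ('', -1);

# The lookahead itself consumes nothing, so under /g the engine moves on
# one character at a time and the captured text may overlap earlier captures.
while ($string =~ /(?=(A+(?:[CGT]A+){0,2}))/g) {
    if (length($1) > length($best)) {
        ($best, $best_pos) = ($1, $-[1]);   # $-[1] is the start offset of group 1
    }
}

print "Longest: $best at pos $best_pos (length ", length($best), ")\n";
```

For the sample string this reports the 29-character run starting at offset 4, the same one that appears in MrPJ's output earlier in the thread.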
-------------------------------------------------


kencl
User

Mar 26, 2005, 11:28 PM

Post #7 of 7 (5278 views)
Re: [KevinR] need a hand with this one

Looks like a strand of DNA

>> If you can't control it, improve it, correlate it or disseminate it with PERL, it doesn't exist!

 
 

