Search pattern and query formation doubt

 



newtoperlprog
Novice

Jul 24, 2014, 1:21 PM

Post #1 of 12 (805 views)

Dear All,
I am designing a filter for a sequence string (19 letters long) inside a while loop, and I have come up with this filter:

Code
if (   ( $seq =~ /\w{2}A\w{6}T\w{2}[ACT]\w[GCA{4}T{4}]{4}[AT]/ )
    && ( $gcper >= 30 && $gcper <= 52 ) )
{
    print "$seq\t$seqpos\t$gcper\n";
}


So basically, I want to match 'A' at position 3, 'T' at position 10, [ACT] at position 13, [AT] at position 19, and at least 3 A's or 3 T's in positions 15-19; $gcper should be between 30 and 52.

I checked the result and it seemed to work. I need help with whether this code is OK as written or whether I can improve it.

Another thing I want to check is that there is no GC stretch more than 9 letters long, but I don't know how to add that check to the above code.

Code
Datafile: 
GCAGGTGGATCTATTTCAT 3201-3220 42.11
TAAGAGGTGTTATTTGGAA 3268-3287 31.58
ATACGATGCTTCAAGAGAA 3346-3365 36.84
CAAGCTCATCATACTGGCT 1201-1220 47.37
GGTACTGACTTTGCTTGCT 2923-2942 47.37
CGTAGTGTTAAGTTATAGT 3003-3022 31.58
GTATGGGTAGGGTAAATCA 3248-3267 42.11
CCTGCTGTGATACGATGCT 3337-3356 52.63
CCTGCGCGCGCGCGATGCT 3300-3318 50.63

Thank you for your help.


(This post was edited by Laurent_R on Jul 25, 2014, 9:54 AM)


BillKSmith
Veteran

Jul 25, 2014, 6:58 AM

Post #2 of 12 (785 views)
Re: [newtoperlprog] Search pattern and query formation doubt

Your code and text agree for positions 1 through 14 and 19. The remaining field would be difficult to do with a regular expression. It definitely cannot be done with a single character class. There are twenty different patterns which meet your descriptions of positions 15-19. I cannot think of an implementation that does not involve explicitly testing for each one. I can help you with this if no one else has a better idea.

I do not understand your new requirement. I have no idea what you mean by "GC stretch".

Note: Unfortunately the [perl][/perl] tags do not work as advertised on this site. Please use 'code' instead of 'perl'.
Good Luck,
Bill


newtoperlprog
Novice

Jul 25, 2014, 7:11 AM

Post #3 of 12 (781 views)
Re: [BillKSmith] Search pattern and query formation doubt

Dear BillKSmith,

Thank you for your reply and thoughts.

Sorry for not explaining the 'GC' stretch properly in my question. It means that, in the 19-nucleotide string, 'GC' repeats, taken together, should not run continuously for more than 9 nucleotides. For example, GCGCGCGCGC is a 'GC' stretch of 10 nucleotides, and it should be filtered out by the program.

Hope this helps in explaining the 'GC' stretch in my question.

Thank you for your help.


Laurent_R
Veteran / Moderator

Jul 25, 2014, 10:00 AM

Post #4 of 12 (771 views)
Re: [newtoperlprog] Search pattern and query formation doubt

Please note that I edited your post to use "code" tags instead of "perl" tags to make it more readable.

To detect a repetition of 9 or more GC, try something like this:


Code
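my $c = "GC";
# (?:...) groups the pair before repeating it; a bare $c{9} would be
# parsed as element 9 of a hash %c.
if (m/(?:$c){9}/) { # do something
}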



(This post was edited by Laurent_R on Jul 25, 2014, 10:26 AM)


newtoperlprog
Novice

Jul 25, 2014, 10:23 AM

Post #5 of 12 (768 views)
Re: [Laurent_R] Search pattern and query formation doubt

Thank you Laurent_R for your suggestion.
I modified the code accordingly:


Code
	 

my $gcstretch = "GC";

if (   ( $seq =~ /\w{2}A\w{6}T\w{2}[ACT]\w[GCA{4}T{4}]{4}[AT]/ )
    && ( $gcper >= 30 && $gcper <= 52 )
    && ( $gcstretch !~ /$gcstretch{10}/ ) )
{
    print "$seq\t$seqpos\t$gcper\n";
}
else { next; }


One last query: how do I match at least 3 A's or 3 T's in positions 15 to 19? Could you please tell me whether the above regular expression is correct or not?

Thanks


Laurent_R
Veteran / Moderator

Jul 25, 2014, 10:34 AM

Post #6 of 12 (765 views)
Re: [newtoperlprog] Search pattern and query formation doubt


In Reply To
One last query: how do I match at least 3 A's or 3 T's in positions 15 to 19? Could you please tell me whether the above regular expression is correct or not?


Try this:


Code
my $substring = substr $seq, 15, 19;
if ($substring =~ /A.*?A.*?A/ or
    $substring =~ /T.*?T.*?T/ ) { # do something...
}


I think it should work, but I haven't tested it thoroughly.


Chris Charley
User

Jul 25, 2014, 12:00 PM

Post #7 of 12 (757 views)
Re: [Laurent_R] Search pattern and query formation doubt

Hi Laurent

Since he is counting beginning with 1 (instead of 0 for substr), I think that would be

my $substring = substr $seq, 14, 5;

(substr takes the beginning index and the number of elements, not the last position) :-)
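To make the offset arithmetic concrete, here is a minimal sketch using the first sequence from the data file in post #1. The variable names are illustrative only and do not come from the thread.

```perl
use strict;
use warnings;

# First sequence from the data file in post #1 (19 letters).
my $seq = 'GCAGGTGGATCTATTTCAT';

# substr takes a 0-based offset and a length, so 1-based
# positions 15..19 are offset 14, length 5.
my $tail = substr $seq, 14, 5;

# Offset 15 with length 19 starts one letter too late; the
# over-long length is simply cut off at the end of the string.
my $off_by_one = substr $seq, 15, 19;

print "$tail\n";        # prints TTCAT
print "$off_by_one\n";  # prints TCAT
```

The second call silently returns only four letters, so a count of "3 A's or 3 T's" could come out wrong without any error being raised.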


newtoperlprog
Novice

Jul 25, 2014, 12:22 PM

Post #8 of 12 (753 views)
Re: [Chris Charley] Search pattern and query formation doubt

Dear All,
Thank you for all the suggestions.

So I guess, this would be the way to combine the result from $seq and $substring and filter the results?

Code
 
if ( ( $seq =~ /\w{2}A\w{6}T\w{2}[ACT]\w{5}[AT]/ ) && ( $gcper >= 30 && $gcper <= 52 ) )
{
    my $substring = substr ($seq, 14, 5);
    if ($substring =~ /A.*?A.*?A/ || $substring =~ /T.*?T.*?T/)
    {
        print "$seq\t$seqpos\t$gcper\n";
    }
}
else { next; }


The $substring is the part of $seq covering positions 15-19, which is checked for the criterion of at least 3 A's or 3 T's.
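As a cross-check on the combined filter, here is a rough sketch that wraps the criteria in one subroutine and counts letters with tr/// instead of the non-greedy patterns. The helper name passes_filter and the anchored pattern are assumptions for illustration, not code from the thread; the rows are taken from the data file in post #1.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns 1 when a 19-letter
# sequence and its GC percentage satisfy all the stated criteria.
sub passes_filter {
    my ($seq, $gcper) = @_;

    # Fixed positions: A at 3, T at 10, [ACT] at 13, [AT] at 19.
    return 0 unless $seq =~ /^\w{2}A\w{6}T\w{2}[ACT]\w{5}[AT]$/;

    # GC percentage must lie between 30 and 52 inclusive.
    return 0 unless $gcper >= 30 && $gcper <= 52;

    # Positions 15..19 (offset 14, length 5): at least three A's
    # or at least three T's. tr/A// counts without modifying $tail.
    my $tail    = substr $seq, 14, 5;
    my $a_count = ($tail =~ tr/A//);
    my $t_count = ($tail =~ tr/T//);
    return ($a_count >= 3 || $t_count >= 3) ? 1 : 0;
}

# A few rows from the data file: sequence, position, GC percentage.
my @rows = (
    [ 'GCAGGTGGATCTATTTCAT', '3201-3220', 42.11 ],
    [ 'TAAGAGGTGTTATTTGGAA', '3268-3287', 31.58 ],
    [ 'ATACGATGCTTCAAGAGAA', '3346-3365', 36.84 ],
    [ 'CAAGCTCATCATACTGGCT', '1201-1220', 47.37 ],
);

for my $row (@rows) {
    my ($seq, $seqpos, $gcper) = @$row;
    print "$seq\t$seqpos\t$gcper\n" if passes_filter($seq, $gcper);
}
```

Anchoring the pattern with ^ and $ ties the positions to the start of the string, which the unanchored version only does by accident when every sequence is exactly 19 letters long.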


BillKSmith
Veteran

Jul 25, 2014, 1:11 PM

Post #9 of 12 (745 views)
Re: [newtoperlprog] Search pattern and query formation doubt

Laurent,

I think you misunderstood the requirement here. There cannot possibly be ten 'GC' pairs in a 19-character string. I think the requirement is 10 characters (i.e. 5 pairs).

I like your solution to the three 't' problem. My approach was getting much too complicated.
Good Luck,
Bill


Laurent_R
Veteran / Moderator

Jul 26, 2014, 3:56 AM

Post #10 of 12 (693 views)
Re: [Chris Charley] Search pattern and query formation doubt


In Reply To
Hi Laurent

Since he is counting beginning with 1 (instead of 0 for substr), I think that would be

my $substring = substr $seq, 14, 5;

(substr takes the beginning index and the number of elements, not the last position) :-)


Yes, you're right, Chris, it should start at position 14. Just about each time I use substr I need to make a quick test under the debugger to check whether I should start counting at 0 or 1. Here I did not test it and I predictably got it wrong.


Laurent_R
Veteran / Moderator

Jul 26, 2014, 4:03 AM

Post #11 of 12 (691 views)
Re: [BillKSmith] Search pattern and query formation doubt


In Reply To
Laurent,

I think you misunderstood the requirement here. There cannot possibly be ten 'GC' pairs in a 19-character string. I think the requirement is 10 characters (i.e. 5 pairs).


Yes, Bill, you're obviously right: there cannot be 10 pairs of letters in a 19-character string. Also, looking back at what the OP said, it was clearly 10 nucleotides, not 10 pairs, so it should be 5 pairs. The general idea of the solution is still basically right.
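With the count corrected to 5 pairs, the check might look something like the sketch below. It assumes that a "GC stretch" means alternating GC repeats, as in the OP's GCGCGCGCGC example; if any run of G's and C's is meant instead, a character class such as [GC]{10} would be the pattern to try. The helper name has_long_gc_stretch is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when $seq contains five or more
# consecutive "GC" pairs, i.e. an alternating stretch of at least
# 10 nucleotides.
sub has_long_gc_stretch {
    my ($seq) = @_;

    # (?:GC){5} groups the pair before repeating it; a bare GC{5}
    # would mean "G followed by five C's".
    return $seq =~ /(?:GC){5}/ ? 1 : 0;
}

# Two sequences from the data file in post #1.
for my $seq ('CCTGCGCGCGCGCGATGCT', 'GCAGGTGGATCTATTTCAT') {
    print "$seq\t", (has_long_gc_stretch($seq) ? "filtered" : "kept"), "\n";
}
```

In the OP's filter this would be used as an extra condition of the form `&& !has_long_gc_stretch($seq)`, tested against $seq rather than against the pattern string itself.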


newtoperlprog
Novice

Jul 29, 2014, 12:50 PM

Post #12 of 12 (573 views)
Re: [Laurent_R] Search pattern and query formation doubt

Dear All,

Thank you very much for your suggestions and guidance on writing efficient code in Perl.

Regards :-)

 
 

