extraction of data from a particular column

 



nandini_bn
Novice

Jul 8, 2011, 12:51 AM

Post #1 of 4 (3530 views)

Hello,
I need some help with a regular expression. I have a file with a huge amount of data in many columns, which looks like this:
Ref Context Base
A CA[A]TG AA
T GA[T]CC AT
G CC[G]GC GC
C AA[C]AC CC

So now I need to extract the rows where the two letters in Base are not the same. That is, I need every row like the AT and GC rows in this sample to be extracted and stored in another file, along with all the other columns.
Any suggestions?


BillKSmith
Veteran

Jul 8, 2011, 6:59 AM

Post #2 of 4 (3520 views)
Re: [nandini_bn] extraction of data from a particular column


Code
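#!perl -p
# -p wraps the script in a read loop and prints $_ after each pass.
# Field [2] is the third tab-separated column (Base).
if ((split /\t/, $_)[2] =~ /(?:AA|GG|CC|TT)/){
    # Same-letter pair: replace the line with the next one and re-test it.
    $_ = <>;
    redo;
}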



Note: If the last line of the input is one that gets skipped (as in the sample case), the script goes on to read from standard input, so when it is run from a terminal you must type an end-of-file to terminate the program.
Good Luck,
Bill
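
For readers who want to avoid the end-of-file quirk noted above, here is a minimal sketch of the same filter written as an explicit read loop that skips unwanted lines instead of replacing them. The helper name has_same_bases and the in-memory sample are assumptions made for illustration, and the columns are assumed to be tab-separated as in Bill's script.

```perl
use strict;
use warnings;

# Sample rows from the question, assumed to be tab-separated.
my $sample = join '',
    "Ref\tContext\tBase\n",
    "A\tCA[A]TG\tAA\n",
    "T\tGA[T]CC\tAT\n",
    "G\tCC[G]GC\tGC\n",
    "C\tAA[C]AC\tCC\n";

# Hypothetical helper: returns 1 when the third field is a same-letter pair.
sub has_same_bases {
    my ($line) = @_;
    my $base = (split /\t/, $line)[2];
    # $ also matches just before a trailing newline, so no chomp is needed.
    return defined($base) && $base =~ /^(?:AA|CC|GG|TT)$/ ? 1 : 0;
}

# Read from an in-memory filehandle so the sketch runs without a data file.
open my $in, '<', \$sample or die "Cannot open sample: $!";
my @kept;
while (my $line = <$in>) {
    next if has_same_bases($line);    # skip the AA, CC, GG and TT rows
    push @kept, $line;
}
close $in;

# Prints the header line followed by the AT and GC rows.
print @kept;
```

Because the loop only reads in its own condition, it ends when the input ends. The same idea should fit on the command line as perl -ne 'print unless (split /\t/)[2] =~ /^(?:AA|CC|GG|TT)$/' input.txt > output.txt, where both file names are placeholders.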


nandini_bn
Novice

Jul 8, 2011, 11:25 AM

Post #3 of 4 (3517 views)
Re: [BillKSmith] extraction of data from a particular column

Thank you, Bill. I just have one question: Base is the 3rd column, so what does [2] signify in the script?


BillKSmith
Veteran

Jul 8, 2011, 12:58 PM

Post #4 of 4 (3513 views)
Re: [nandini_bn] extraction of data from a particular column

Split returns a list of fields. The '2' is a subscript into that list. (By default, subscripts start at zero, so the '2' refers to the third field.) The whole test could be done with a single regular expression, but extracting the required field with split makes it much easier. If your files are so huge that processing speed is important, you should probably implement it both ways to find out which is faster.


Code
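#!perl -p
# Match two identical bases at the end of the line; \1 repeats whatever
# letter the first group captured. This assumes Base is the last column.
if (/([CAGT])\1\s*$/){
    # Same-letter pair: replace the line with the next one and re-test it.
    $_ = <>;
    redo;
}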
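
The suggestion to try it both ways and see which is faster could be sketched with the core Benchmark module. The test data and the sub names count_by_split and count_by_regex are made up for illustration; timings on real files will depend on line length and column count, and the regex version only applies when Base is the last column.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 10,000 tab-separated lines shaped like the sample.
my @lines = map { ("A\tCA[A]TG\tAA\n", "T\tGA[T]CC\tAT\n") } 1 .. 5_000;

# Approach 1: isolate the third field with split, then test it.
sub count_by_split {
    my $kept = 0;
    for my $line (@lines) {
        $kept++ unless (split /\t/, $line)[2] =~ /^(?:AA|CC|GG|TT)$/;
    }
    return $kept;
}

# Approach 2: one regex with a backreference, anchored at end of line.
sub count_by_regex {
    my $kept = 0;
    for my $line (@lines) {
        $kept++ unless $line =~ /([CAGT])\1\s*$/;
    }
    return $kept;
}

# The two approaches must agree before a timing comparison means anything.
die "approaches disagree" unless count_by_split() == count_by_regex();

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { 'split' => \&count_by_split, 'regex' => \&count_by_regex });
```

Running the comparison against a slice of the real file rather than synthetic lines would give a more trustworthy answer.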



Using perl -p is not a good idea for production software, but it is an easy way for me to show you a complete program that does the processing you asked for.
Good Luck,
Bill
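
As a rough idea of what a version without -p might look like, the sketch below opens the input and output files explicitly and writes the kept rows to a second file, as the original question asked. The sub name filter_mixed_bases, the command-line layout and the file names are assumptions for illustration, not something from the thread.

```perl
use strict;
use warnings;

# Copy lines from $in_path to $out_path, dropping rows whose column $col
# (zero-based) holds two identical bases. Returns the number of lines kept.
sub filter_mixed_bases {
    my ($in_path, $out_path, $col) = @_;
    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    my $kept = 0;
    while (my $line = <$in>) {
        (my $copy = $line) =~ s/\r?\n\z//;    # strip the line ending
        my $base = (split /\t/, $copy)[$col];
        # Skip same-letter pairs such as AA or CC.
        next if defined($base) && $base =~ /^([ACGT])\1$/;
        print {$out} $line;
        $kept++;
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
    return $kept;
}

# Hypothetical invocation: perl filter.pl input.txt output.txt
if (@ARGV == 2) {
    my $kept = filter_mixed_bases(@ARGV, 2);    # 2 = third column (Base)
    print "Kept $kept lines\n";
}
```

Passing the column index as a parameter keeps the split approach usable when Base is not the last of the many columns in the real file.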

 
 

