extraction of data from a particular column



nandini_bn
Novice

Jul 8, 2011, 12:51 AM


extraction of data from a particular column

Hello,
I need some help with a regular expression. I have a huge file with many columns that looks like this:
Ref Context Base
A CA[A]TG AA
T GA[T]CC AT
G CC[G]GC GC
C AA[C]AC CC

I need to extract the rows where the two bases in the Base column are not the same. In other words, every row with AT or GC in Base should be extracted and stored in another file, along with all of its other columns.
Any suggestions?


BillKSmith
Veteran

Jul 8, 2011, 6:59 AM


Re: [nandini_bn] extraction of data from a particular column


Code
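#!perl -p
# Skip rows whose Base column (third tab-separated field) repeats one base.
if ((split /\t/, $_)[2] =~ /(?:AA|GG|CC|TT)/){
    defined($_ = <>) or last;   # read the next line; stop cleanly at end-of-file
    redo;                       # re-test the new line before it is printed
}



Note: The check on the result of $_ = <> ends the program cleanly when the last line of the file is one that gets skipped (as in the sample case). Without that check, the program waits for more input and it is necessary to type an end-of-file to terminate it.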
Good Luck,
Bill


nandini_bn
Novice

Jul 8, 2011, 11:25 AM


Re: [BillKSmith] extraction of data from a particular column

Thank you, Bill. I have one question: Base is the 3rd column, so what does [2] signify in the script?


BillKSmith
Veteran

Jul 8, 2011, 12:58 PM


Re: [nandini_bn] extraction of data from a particular column

split returns a list of fields, and the '2' is a subscript into that list. (By default, subscripts start at zero, so the '2' refers to the third field.) The whole test could be done with a regular expression, but extracting the required field with split makes it much easier. If your files are so huge that processing speed is important, you should probably implement it both ways to find out which is faster.
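To make the zero-based subscript concrete, here is a minimal sketch. It assumes the columns are separated by tabs, and the variable names ($row, @fields, $base) are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical sample row; fields are assumed to be tab-separated.
my $row = "T\tGA[T]CC\tAT\n";

# split returns a list of fields; subscripts count from zero.
my @fields = split /\t/, $row;
chomp @fields;    # the newline is still attached to the last field

print "field 0 (Ref):     $fields[0]\n";
print "field 1 (Context): $fields[1]\n";
print "field 2 (Base):    $fields[2]\n";

# The same subscript applied directly to the list that split returns.
my $base = (split /\t/, $row)[2];
chomp $base;
print "Base via list slice: $base\n";
```

The last form, (split ...)[2], is the one used in the one-liner earlier in this thread; it just skips the intermediate array.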


Code
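#!perl -p
# Skip rows that end in a repeated base (AA, CC, GG or TT).
if (/([CAGT])\1\s*$/){
    defined($_ = <>) or last;   # read the next line; stop cleanly at end-of-file
    redo;                       # re-test the new line before it is printed
}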
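
One possible way to compare the two tests for speed is the core Benchmark module. This is only a sketch: the rows are made-up tab-separated samples modelled on the data in this thread, the sub names are hypothetical, and real timings depend on your data and your Perl build.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical tab-separated rows modelled on the sample data.
my @rows = (
    "A\tCA[A]TG\tAA\n",
    "T\tGA[T]CC\tAT\n",
    "G\tCC[G]GC\tGC\n",
    "C\tAA[C]AC\tCC\n",
);

# Test 1: pull out the third field with split, then match it.
sub same_base_split {
    return ((split /\t/, $_[0])[2] =~ /(?:AA|GG|CC|TT)/) ? 1 : 0;
}

# Test 2: one regular expression anchored at the end of the line.
sub same_base_regex {
    return ($_[0] =~ /([CAGT])\1\s*$/) ? 1 : 0;
}

# Run each test for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'split_field' => sub { same_base_split($_) for @rows },
    'regex_only'  => sub { same_base_regex($_) for @rows },
});
```

Four rows are far too few to say anything about a huge file, so it would be better to load a representative slice of the real data into @rows before trusting the numbers.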



The -p switch is not a good idea for production software, but it is an easy way for me to show you a complete program that does the processing you asked for.
Good Luck,
Bill
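
For anyone who wants to avoid -p, here is a rough sketch of the same filter as an ordinary script with explicit filehandles. It is not from the posts above: the sub name filter_rows and the command-line usage are hypothetical, the input is assumed to be tab-separated, and the pattern is anchored to the whole Base field, which is slightly stricter than the one-liners.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy every row whose Base column (third tab-separated field) does not
# repeat a single base. Returns the number of rows written.
sub filter_rows {
    my ($in, $out) = @_;
    my $kept = 0;
    while (my $line = <$in>) {
        my $base = (split /\t/, $line)[2];
        # Skip AA, CC, GG and TT; rows with fewer than three fields are kept.
        next if defined $base && $base =~ /^([CAGT])\1\s*$/;
        print {$out} $line;
        $kept++;
    }
    return $kept;
}

# Hypothetical usage: perl filter_bases.pl input.txt output.txt
if (@ARGV == 2) {
    my ($in_name, $out_name) = @ARGV;
    open my $in,  '<', $in_name  or die "Cannot read $in_name: $!";
    open my $out, '>', $out_name or die "Cannot write $out_name: $!";
    filter_rows($in, $out);
    close $out or die "Cannot close $out_name: $!";
}
```

The header row passes through unchanged because "Base" is not a repeated base, and the loop ends normally at end-of-file because the while condition does the reading.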