CGI/Perl Guide

Working with a table

 



jhli515
Novice

May 20, 2005, 5:46 AM

Post #1 of 5
Working with a table



I have just changed the input file for the script to make it simpler. I have put "NA" into every table cell that the script does not manipulate.

Flow chart:
  • Input file for the script: 1.txt
  • The Perl script
  • Output file 1 from the script: GQfile.txt
  • Output file 2 from the script: CheckDuplicateResult.txt
  • Output file 3 from the script: Format.txt


Details:
  • The input file 1.txt is attached.
  • The script I have written so far is below.
  • I have done output file 1 from the script, GQfile.txt (see the attachment). The purpose of GQfile.txt is: for any row where GQ is not 1, change all the figures in the Allele 1 and Allele 2 columns to "" (nil).
  • CheckDuplicateResult.txt should check, for the same sample (UD1) and marker name, whether the duplicates have the same Allele 1 and Allele 2 or not. (I think I should use a hash, but I cannot get it to work.)


Marker Allele 1 Allele 2 UD1 UD2 UD3 CV

BM4045.3FR 112 112 C3 Dup1 NA
BM4045.3FR 112 112 C3 Dup2 NA
BM4045.3FR 112 112 C3 Dup3 NA
BMS527.3FR 177 177 C3 Dup1 NA
BMS527.3FR 177 177 C3 Dup2 NA
BMS527.3FR 177 177 C3 Dup3 NA



CheckDuplicateResult.txt should give output like this:

Marker Allele 1 Allele 2 UD1 UD2 UD3 NewCol
BM4045.3FR 112 112 C3 Dup2 & Dup3 NA Dup1 is different: Allele2 is 112
BMS527.3FR 177 177 C3 Dup1 & Dup2 & Dup3 NA
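One way to get output close to the requested CheckDuplicateResult.txt layout is a hash that groups rows by marker and sample (UD1), and within each group by allele pair. The sketch below is only an illustration under several assumptions: the relevant columns are assumed to be already extracted from each line (in the real 1.txt they would be columns 5, 8, 9, 42, 43 and 44), the rows are made up (Dup2 and Dup3 are given Allele 2 = 114 so that one duplicate differs), and the allele pair shared by the most duplicates is assumed to be the one to report, with the other duplicates described in NewCol. The names %groups, %first_ud3 and @report are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up rows, already reduced to: Marker, Allele 1, Allele 2, UD1, UD2, UD3.
my @rows = (
    [ 'BM4045.3FR', '112', '112', 'C3', 'Dup1', 'NA' ],
    [ 'BM4045.3FR', '112', '114', 'C3', 'Dup2', 'NA' ],
    [ 'BM4045.3FR', '112', '114', 'C3', 'Dup3', 'NA' ],
    [ 'BMS527.3FR', '177', '177', 'C3', 'Dup1', 'NA' ],
    [ 'BMS527.3FR', '177', '177', 'C3', 'Dup2', 'NA' ],
    [ 'BMS527.3FR', '177', '177', 'C3', 'Dup3', 'NA' ],
);

# %groups: "marker<TAB>sample" => { "allele1<TAB>allele2" => [ UD2 values ] }
my (%groups, %first_ud3);
for my $r (@rows) {
    my ($marker, $a1, $a2, $ud1, $ud2, $ud3) = @$r;
    my $key = "$marker\t$ud1";
    push @{ $groups{$key}{"$a1\t$a2"} }, $ud2;
    $first_ud3{$key} = $ud3 unless exists $first_ud3{$key};   # keep the first UD3 seen
}

my @report = ("Marker\tAllele 1\tAllele 2\tUD1\tUD2\tUD3\tNewCol");
for my $key (sort keys %groups) {
    my ($marker, $ud1) = split /\t/, $key, -1;
    my $pairs = $groups{$key};
    # Assumption: the allele pair shared by the most duplicates is the reference;
    # ties are broken alphabetically so the output is repeatable.
    my ($ref, @others) = sort {
        @{ $pairs->{$b} } <=> @{ $pairs->{$a} } or $a cmp $b
    } keys %$pairs;
    my ($a1, $a2) = split /\t/, $ref, -1;
    my @notes;
    for my $pair (@others) {
        my ($o1, $o2) = split /\t/, $pair, -1;
        my @diff;
        push @diff, "Allele1 is $o1" if $o1 ne $a1;
        push @diff, "Allele2 is $o2" if $o2 ne $a2;
        push @notes, join(' & ', @{ $pairs->{$pair} }) . ' is different: ' . join(', ', @diff);
    }
    push @report, join("\t", $marker, $a1, $a2, $ud1,
        join(' & ', @{ $pairs->{$ref} }), $first_ud3{$key}, join('; ', @notes));
}
print "$_\n" for @report;
```

Keying the hash on marker plus sample is what makes the duplicate check cheap: every Dup row for the same marker and sample lands in the same bucket, and a bucket with more than one allele pair is exactly the case the OP wants flagged.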







3. Format.txt: We'll discuss it later.

Thank you

Jin



==========Script =====================

#!perl
use strict;
use warnings;

my $GenemapperFile = "c:/1.txt";

# One array per column; "my" declarations are required under "use strict"
my (@SampleFile, @SampleName, @SampleID, @RunName, @Panel, @Marker, @Dye, @SNP,
    @Allele1, @Allele2, @Size1, @Size2, @Height1, @Height2, @PeakArea1, @PeakArea2,
    @DataPoint1, @DataPoint2, @Mutation1, @Mutation2, @AEComment1, @AEComment2,
    @ADO, @AE, @OMIT, @OS, @SHP, @OBA, @SPA, @SP, @BIN, @PHR, @LPH, @SPU, @AN,
    @BD, @DP, @NB, @CC, @OVL, @XTLK, @GQ, @UD1, @UD2, @UD3, @CV, @UD1Marker);

open (FILE, "<$GenemapperFile") or die "Unable to open the file $GenemapperFile: $!";

my $tableheader = readline(FILE); # get rid of headers

while (<FILE>)
{
    chomp;                          # strip the newline, otherwise it ends up in the last column (CV)
    my $size = length $_;           # the current line is in $_ ($line was never set)
    #print "The size of each line is $size\n"; # output size of line
    my @x = split(/\t/, $_, -1);    # -1 keeps trailing empty columns
    push @SampleFile, $x[0];
    push @SampleName, $x[1];
    push @SampleID, $x[2];
    push @RunName, $x[3];
    push @Panel, $x[4];
    push @Marker, $x[5];
    push @Dye, $x[6];
    push @SNP, $x[7];
    push @Allele1, $x[8];
    push @Allele2, $x[9];
    push @Size1, $x[10];
    push @Size2, $x[11];
    push @Height1, $x[12];
    push @Height2, $x[13];
    push @PeakArea1, $x[14];
    push @PeakArea2, $x[15];
    push @DataPoint1, $x[16];
    push @DataPoint2, $x[17];
    push @Mutation1, $x[18];
    push @Mutation2, $x[19];
    push @AEComment1, $x[20];
    push @AEComment2, $x[21];
    push @ADO, $x[22];
    push @AE, $x[23];
    push @OMIT, $x[24];
    push @OS, $x[25];
    push @SHP, $x[26];
    push @OBA, $x[27];
    push @SPA, $x[28];
    push @SP, $x[29];
    push @BIN, $x[30];
    push @PHR, $x[31];
    push @LPH, $x[32];
    push @SPU, $x[33];
    push @AN, $x[34];
    push @BD, $x[35];
    push @DP, $x[36];
    push @NB, $x[37];
    push @CC, $x[38];
    push @OVL, $x[39];
    push @XTLK, $x[40];
    push @GQ, $x[41];
    push @UD1, $x[42];
    push @UD2, $x[43];
    push @UD3, $x[44];
    push @CV, $x[45];
}

close FILE;

#@Matrix=(\@SampleFile,\@SampleName,\@SampleID,\@RunName,\@Panel,\@Marker,\@Dye,\@SNP,\@Allele1,\@Allele2,\@Size1,\@Size2,\@Height1,\@Height2,\@PeakArea1,\@PeakArea2,\@DataPoint1,\@DataPoint2,\@Mutation1,\@Mutation2,\@AEComment1,\@AEComment2,\@ADO,\@AE,\@OMIT,\@OS,\@SHP,\@OBA,\@SPA,\@SP,\@BIN,\@PHR,\@LPH,\@SPU,\@AN,\@BD,\@DP,\@NB,\@CC,\@OVL,\@XTLK,\@GQ,\@UD1,\@UD2,\@UD3,\@CV);
#print "test4 @Matrix\n";
#$MatrixRef=\@Matrix;
#print $MatrixRef;

# Open the output file ONCE, before the loop. Opening it with ">" inside the loop
# emptied it on every pass, so only the last row survived.
# Use forward slashes: in "c:\GQfile.txt" Perl treats "\G" as an escape and drops the backslash.
my $OutputFile1 = "c:/GQfile.txt";
open (GQOUT, ">$OutputFile1") or die "Unable to open $OutputFile1: $!";

for (my $i = 0; $i < @GQ; ++$i)
{
    # Test: if GQ is not 1, the genotype should be nil
    if ($GQ[$i] != 1)
    {
        $Allele1[$i] = "";
        $Allele2[$i] = "";
    }
    print GQOUT "$UD1[$i]\t$Allele1[$i]\t$Allele2[$i]\n";

    # create an array holding sample name and marker together
    my $UD1marker = $UD1[$i]."-".$Marker[$i];
    push @UD1Marker, $UD1marker;

    # create a hash: key: sample name and marker ; value: allele1 and allele2
    #$UD1MarkerAllele1{$UD1Marker[$i]}=$Allele1[$i];
    #$UD1MarkerAllele2{$UD1Marker[$i]}=$Allele2[$i];
}

close GQOUT;
Attachments: 1.txt (29.1 KB)
  GQfile.txt (7 B)
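Instead of 46 parallel arrays, each row can be stored as a hash keyed by the header names, so columns are looked up by name rather than by position. A minimal sketch, assuming the header row names the columns Marker, Allele 1, Allele 2, GQ and UD1 (the real GeneMapper header names may differ), and using a short made-up in-memory table in place of c:/1.txt and an in-memory string in place of c:/GQfile.txt:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up, shortened table; the real 1.txt has 46 tab-separated columns.
my $data = "Marker\tAllele 1\tAllele 2\tGQ\tUD1\tUD2\n"
         . "BM4045.3FR\t112\t112\t1\tC3\tDup1\n"
         . "BM4045.3FR\t112\t114\t0.5\tC3\tDup2\n";

open my $in, '<', \$data or die "Cannot read table: $!";   # in-memory file for the demo
chomp(my $header = <$in>);
my @cols = split /\t/, $header, -1;

my @rows;
while (my $line = <$in>) {
    chomp $line;
    my %row;
    @row{@cols} = split /\t/, $line, -1;    # hash slice: column name => value
    push @rows, \%row;
}
close $in;

# GQ rule from the thread: if GQ is not 1, blank both alleles.
for my $row (@rows) {
    if ($row->{GQ} != 1) {
        $row->{'Allele 1'} = '';
        $row->{'Allele 2'} = '';
    }
}

my $out = '';
open my $gq, '>', \$out or die "Cannot write output: $!";   # a real file would use 'c:/GQfile.txt'
print {$gq} join("\t", $_->{UD1}, $_->{'Allele 1'}, $_->{'Allele 2'}), "\n" for @rows;
close $gq;
print $out;
```

To run it against the real files, the first open would take the path to 1.txt and the second the path to GQfile.txt; the output file is opened once, outside the row loop.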


KevinR
Veteran


May 21, 2005, 3:09 PM

Post #2 of 5
Re: [jhli515] Working with a table

Have you made any progress? Your example was confusing because all the lines are the same:

BM4045.3FR 112 112 C3 Dup1 NA
BM4045.3FR 112 112 C3 Dup2 NA
BM4045.3FR 112 112 C3 Dup3 NA

but in your file one of the lines is different (Dup3/Allele 2 is 114)
-------------------------------------------------


KevinR
Veteran


May 22, 2005, 2:20 PM

Post #3 of 5
Re: [jhli515] Working with a table


Quote


I have just changed the input file for the script to make it simpler. Anything in the cell of the table is not manipulated I put in as "NA".


Not sure what any of the above means.



Quote
I have done the Output file 1 from the script: GQfile.txt (see the attachment). GQfile.txt is to achieve any GQ is not 1, change all the figures in the Allele1 and Allele 2 columns into "" nil. See the attachment.


Once again I have no idea what that means.


Quote
CheckDuplicateResult.txt is to check the same sample & marker name in UD1 whether have same Allele1 and Allele 2 or not.( I think I should use hash but cannot do it)



I think I understand that part. This is what I came up with for CheckDuplicateResult.txt, which is not exactly what you requested but is very close:


Code
#!perl
use strict;
use warnings;
#use Data::Dump qw(dump);
my $GenemapperFile = 'c:/1.txt';
my $CheckDuplicateResult = 'c:/CheckDuplicateResult.txt';
my %markers = ();
# note: OUT is the handle for the input file, IN the handle for the output file
open (OUT, "<$GenemapperFile") or die "Unable to open the file $GenemapperFile: $!";
readline(OUT);    # skip the header line
while (<OUT>) {
    chomp;
    # keep Marker, Allele 1, Allele 2, UD1, UD2, UD3; blank fields become 'NA'
    my @m = map {$_ eq "" ? 'NA' : $_} (split(/\t/))[5,8,9,42,43,44];
    # marker => UD1 => allele 1 => allele 2 => list of UD2 values
    push @{$markers{$m[0]}{$m[3]}{$m[1]}{$m[2]}},$m[4];
}
close(OUT);
#print dump(%markers);
open (IN, ">$CheckDuplicateResult") or die "Unable to open the file $CheckDuplicateResult: $!";
foreach my $Markers (sort keys %markers) {
    foreach my $UD1 (sort keys %{$markers{$Markers}}) {
        foreach my $a1 (sort keys %{$markers{$Markers}{$UD1}}) {
            foreach my $a2 (sort keys %{$markers{$Markers}{$UD1}{$a1}}) {
                print IN "$Markers\t$a1\t$a2\t$UD1\t", join("\t",@{$markers{$Markers}{$UD1}{$a1}{$a2}}),"\n";
            }
        }
    }
}
close(IN);


which produces a file like this:


Code
BM4045.3FR 112 112 C3 Dup1 
BM4045.3FR 112 114 C3 Dup2 Dup3
BM4045.3FR 112 or 113 112 or 113 C4 Dup2 Dup3
BM4045.3FR 115 118 C4 Dup1
BM4045.3FR 112 112 C5 Dup1 Dup2
BM4045.3FR 119 119 C6 Dup1 Dup2 Dup3

-------------------------------------------------


jhli515
Novice

May 26, 2005, 4:45 AM

Post #4 of 5
Re: [KevinR] Working with a table

Hi Kevin,

Thank you so much for your help. I have tried your script, and I am trying to understand how it works.



Jin


KevinR
Veteran


May 26, 2005, 11:11 AM

Post #5 of 5
Re: [jhli515] Working with a table

The part of the script that does the work is these few lines:


Code
while (<OUT>) {  
chomp;
my @m = map {$_ eq "" ? 'NA' : $_} (split(/\t/))[5,8,9,42,43,44];
push @{$markers{$m[0]}{$m[3]}{$m[1]}{$m[2]}},$m[4];
}


This part:


Code
while (<OUT>) {  
chomp;
my @m = map {$_ eq "" ? 'NA' : $_} (split(/\t/))[5,8,9,42,43,44];


reads each line of the file and removes the newline from the end (chomp).

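(split(/\t/))[5,8,9,42,43,44] creates a list by splitting each line on the tabs and then keeps just the parts of the list you seem to want. The [...] after the parentheses is a list slice: it picks out fields 5, 8, 9, 42, 43 and 44, which in your script are Marker, Allele1, Allele2, UD1, UD2 and UD3.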
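A small demonstration of the list slice, using a made-up 46-column line whose fields are named after their positions. It also shows a related detail of split: without a limit argument it drops trailing empty fields, so an empty last column (such as CV) comes back as undef rather than an empty string. In the map below, undef eq "" is still true, so such fields still become NA, but use warnings reports an uninitialized value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up 46-column line: field N contains the text "colN".
my $line = join "\t", map { 'col' . $_ } 0 .. 45;

# (LIST)[INDEXES] is a list slice: split returns all 46 fields,
# and the slice keeps only the ones at the given positions.
my @picked = (split /\t/, $line)[5, 8, 9, 42, 43, 44];
print "@picked\n";    # col5 col8 col9 col42 col43 col44

# split drops trailing empty fields unless it is given a negative limit.
my @short = split /\t/, "a\tb\t\t";        # ('a', 'b')
my @full  = split /\t/, "a\tb\t\t", -1;    # ('a', 'b', '', '')
print scalar(@short), " ", scalar(@full), "\n";    # 2 4
```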

map {$_ eq "" ? 'NA' : $_} processes each element of the list returned by the split: it checks whether the value is blank ($_ eq "") and, if it is, replaces it with NA; otherwise it keeps the value it has. The results are stored in the array @m. It uses the ternary operator to do that:

condition ? true part : false part ;

It's like:


Code
if (condition true) { 
do this;
}
else {
do this;
}
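The map/ternary line can be compared directly with the equivalent explicit loop. A short sketch with made-up field values; both versions produce the same list:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @fields = ('BM4045.3FR', '112', '', 'C3');    # made-up values; the third one is blank

# map runs the block once per element, with $_ set to that element,
# and collects whatever the block returns.
my @m = map { $_ eq "" ? 'NA' : $_ } @fields;

# The same thing written with if/else inside a foreach loop.
my @m2;
foreach my $value (@fields) {
    if ($value eq "") {
        push @m2, 'NA';
    }
    else {
        push @m2, $value;
    }
}
print "@m\n";    # BM4045.3FR 112 NA C3
```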


The last line:


Code
push @{$markers{$m[0]}{$m[3]}{$m[1]}{$m[2]}},$m[4];    
}


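creates a multi-dimensional hash (a hash of hashes) keyed by marker, then UD1, then Allele 1, then Allele 2. At the bottom of each branch is an array holding the UD2 values (Dup1, Dup2, Dup3) of every row in the table that shares that marker, sample and allele pair. Any level that does not exist yet is created automatically the first time push uses it.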
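To see the structure this builds, the same push can be run on a few made-up rows and printed with Data::Dumper. Data::Dumper is a core module; it is swapped in here for the commented-out Data::Dump, which is a separate CPAN module that does much the same. The rows are assumed to be already reduced to Marker, Allele 1, Allele 2, UD1, UD2, UD3, i.e. what @m holds after the map line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Made-up rows, already in the order of @m: Marker, Allele 1, Allele 2, UD1, UD2, UD3
my @rows = (
    [ 'BM4045.3FR', '112', '112', 'C3', 'Dup1', 'NA' ],
    [ 'BM4045.3FR', '112', '114', 'C3', 'Dup2', 'NA' ],
    [ 'BM4045.3FR', '112', '114', 'C3', 'Dup3', 'NA' ],
);

my %markers;
for my $row (@rows) {
    my @m = @$row;
    # Each missing level ({marker}, {UD1}, {allele 1}, {allele 2}) and the final
    # array are created automatically the first time they are used (autovivification).
    push @{ $markers{$m[0]}{$m[3]}{$m[1]}{$m[2]} }, $m[4];
}

$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted order
print Dumper(\%markers);

# One leaf: the UD2 values sharing marker BM4045.3FR, sample C3 and alleles 112/114
print join(' ', @{ $markers{'BM4045.3FR'}{'C3'}{'112'}{'114'} }), "\n";    # Dup2 Dup3
```

The nested foreach loops in the full script walk these same levels in sorted order, which is why each output line lists one marker, one allele pair, one sample and then the Dup names collected in that leaf array.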
-------------------------------------------------

 
 

