Problem adding specific columns into array

 



hellohello1
Novice

Feb 19, 2014, 6:24 PM

Post #1 of 6 (1743 views)

I have files with columns named dataSXXR(X) (e.g. dataS01R01, dataS01R02, dataS02R01, dataS02R02, etc.).

There are other column headers, which are the same in all files.

So right now, my problem is finding the headers that match dataS0XRx so that I can grab those columns and perform some calculations:


Code
 
e.g.
first file.txt:

ID dataS01R1 dataS01R2 dataS02R1 dataS02R2 Links
M45 345.2 536 876.12 873 http://..
M34 836 893 829 83.234
M72 873 123 342.36 837
M98 452 934 1237 938 http://..

=======================================
Calculation:
row2/row2, row3/row2, row4/row2...row3400/row2
row2/row3, row3/row3, row4/row3 ... row3400/row3
row2/row4, row3/row4 ...row3400/row4

E.g dataS01R1
become:
ID dataS01R1 ..dataS01R02... Links
M45 1 (345.2/345.2) http://..
M34 2.42 (836/345.2)
M72 2.52 (873/345.2)
M98 1.309 (452/345.2) http://..
M45 0.41 (345.2/836) http://..
M34 1 (836/836)
M72 1.04 (873/836)
M98 0.54 (452/836) http://..
.
. (loop through rows as denominator)
.


and then loop through the columns, print the results, and filter out unwanted rows based on the average coefficient of variation across all dataSXR0X columns (which I will work out later, once I have figured out the first part).

So my problem here:
How do I find the column headers matching dataS0XR0X so that I can put those columns into arrays for manipulation?

Here is the code I have written so far:


Code
if($first)
{
    #if this is the first file, find the column locations
    my $firstline = <CURINFILE>; #read in the header line
    chomp $firstline;
    my @columns = split(/\t/, $firstline);

    my $columncount = 0;

    while($columncount <= $#columns && !($columns[$columncount] =~ /ID/))
    {
        $columncount++;
    }
    $ID= $columncount;

    ########## Having error here ########
    while($columncount <= $#columns && !(($columns[$columncount] =~ /_data/) ))
    {

        $columncount++;
    }

    $columns[$columncount] =~ /_dataS(\d+)R/;
    my $currentReplicateID = $1;
    my $currentReplicateCount = 1;
    $ctrlStartCol = $columncount++;


    while($columncount <= $#columns)
    {
        $columns[$columncount] =~ /_dataS(\d+)R/;
        my $newReplicateID = $1;
        if($newReplicateID ne $currentReplicateID)
        {
            push(@replicateCount, $currentReplicateCount);
            $currentReplicateID = $newReplicateID;
            $currentReplicateCount = 1;
        }
        else
        {
            $currentReplicateCount++;
        }
        $columncount++;
    }
    #add the last replicate in
    push(@replicateCount, $currentReplicateCount);

    ##################################
    #read in the remainder of the file
    while(<CURINFILE>)
    {
        #add the id, intensity values to an array
        chomp $_;
        my @templine = split(/\t/,$_);
        my @tempratio = ();
        push(@tempratio, $templine[$ID]);

        ##### Error Here ##################
        #add intensities from the samples
        my $columnIndex = $ctrlStartCol;
        for(my $k = 0; $k <= $i; $k++)
        {
            $columnIndex += $replicateCount[$k];
        }
        for(my $j = 0; $j < $replicateCount[$i+1]; $j++)
        {
            push(@tempratio, $templine[$columnIndex+$j]);
        }

This code only prints the first value of each $tempratio[x], and it fails with an error when I add the code for the dataSXXRXX columns and intensities.

I am working with large datasets. Initially I worked in Excel, but it is too slow and makes my whole computer lag when performing calculations, so I decided to try Perl instead, as I read that it is good for manipulating large datasets. However, I am quite new to Perl and only started two months ago, so I am not sure whether what I am doing is okay. If you have other suggestions, let me know too.


I hope my explanation is not confusing. :)


(This post was edited by hellohello1 on Feb 19, 2014, 9:56 PM)


Laurent_R
Veteran / Moderator

Feb 22, 2014, 4:24 AM

Post #2 of 6 (1701 views)
Re: [hellohello1] Problem adding specific columns into array


Quote
I hope my explanation is not confusing. :)


Well, it seems that nobody dared to answer (including myself when I first saw your post) after two days, so maybe it is a bit confusing after all. ;)

Your explanations are actually pretty clear, but your code seems to be very complicated for what seems to be a quite simple problem.

I think you have an error there:


Code
while($columncount <= $#columns && !(($columns[$columncount] =~ /_data/) ))


The $columncount variable has not been reset and therefore still has the value with which it left the preceding loop, which is probably not what you want. Assuming I understand correctly what you are trying to do, $columncount should probably be reset to 0 before entering this loop (or, probably even better, be made a variable lexically scoped to the loop). Another point about this line is that, IMHO, it would be clearer to use the !~ operator rather than negating the =~ operator:


Code
while($columncount <= $#columns and  $columns[$columncount] !~ /_data/ )
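
As a minimal sketch of the "lexically scoped" idea, each search below scans the header list on its own, so no counter carries over from one search to the next. It uses first from the core List::Util module. The header line is an assumption copied from the sample file in the first post, and the pattern /^data/ (without a leading underscore) is likewise an assumption based on those sample headers.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Assumed header line, copied from the sample file in the first post.
my @columns = split /\t/, "ID\tdataS01R1\tdataS01R2\tdataS02R1\tdataS02R2\tLinks";

# Each search walks all indices from 0, so nothing is shared between them.
my $id_col         = first { $columns[$_] eq 'ID' }    0 .. $#columns;
my $first_data_col = first { $columns[$_] =~ /^data/ } 0 .. $#columns;

# first returns undef when nothing matches, so check before using the index.
die "No ID column found\n"   unless defined $id_col;
die "No data column found\n" unless defined $first_data_col;

print "ID column: $id_col, first data column: $first_data_col\n";
```

The defined checks matter because a header in position 0 is a valid match that would otherwise be mistaken for "not found".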


But, as I said, your code seems to be far too complicated and I do not understand some of the things that you are doing or why you are doing them. Therefore, I do not wish to go too much into your code, but I will try to write something afresh from your initial explanations if I have time later today (and don't forget about it).


Laurent_R
Veteran / Moderator

Feb 22, 2014, 3:51 PM

Post #3 of 6 (1690 views)
Re: [hellohello1] Problem adding specific columns into array

Hi,

OK, I took some time to look at your problem, but it turns out that there is not enough information on what you really need.

I can only give you some clues. What you probably need to do first is to load your file into a data structure in memory. I decided to use an array of arrays (AoA), although, depending on your exact needs, an array of hashes (AoH), a hash of arrays (HoA) or a hash of hashes (HoH) might turn out to be more practical. But any of these two-dimensional structures can do what you need.

This is the code:


Code
use strict;
use warnings;
use Data::Dumper;

my @full_data;
my $header_line = <DATA>;
my @headers = split /\s+/, $header_line;
while (<DATA>) {
    my @temp_array = split /\s+/, $_;
    push @full_data, \@temp_array;
}

for my $arr_ref (@full_data) {
    print "@$arr_ref \n";
}

__DATA__
ID dataS01R1 dataS01R2 dataS02R1 dataS02R2 Links
M45 345.2 536 876.12 873 http://..
M34 836 893 829 83.234
M72 873 123 342.36 837
M98 452 934 1237 938 http://..


Storing the file in a data structure is done in the while loop, so in just four lines of code. I did it in several steps to make the process clear, but it could actually be done in a single line of code:


Code
push @full_data, [split /\s+/, $_] while (<DATA>);


Next, the for loop is just there to print the data structure and check that everything is OK. This prints out this:


Code
M45 345.2 536 876.12 873 http://.. 
M34 836 893 829 83.234
M72 873 123 342.36 837
M98 452 934 1237 938 http://..


So, this looks OK. If you want a better idea of what the data structure looks like, you can use the Dumper function of the Data::Dumper module that has been loaded at the beginning. You might get a printout similar to this:

Code
0  ARRAY(0x80070498)
   0  ARRAY(0x801f9938)
      0  'M45'
      1  345.2
      2  536
      3  876.12
      4  873
      5  'http://..'
   1  ARRAY(0x8035c788)
      0  'M34'
      1  836
      2  893
      3  829
      4  83.234
   2  ARRAY(0x8006c0e0)
      0  'M72'
      1  873
      2  123
      3  342.36
      4  837
   3  ARRAY(0x8036a168)
      0  'M98'
      1  452
      2  934
      3  1237
      4  938
      5  'http://..'
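
For reference, a call to Dumper on a small array of arrays might look like the sketch below. The two rows are copied from the sample data, and the variable $dump is introduced only for illustration. The layout that Dumper prints will differ in detail from the listing above, but it shows the same nesting.

```perl
use strict;
use warnings;
use Data::Dumper;

# Two rows from the sample data, stored as an array of arrays.
my @full_data = (
    [ 'M45', 345.2, 536, 876.12, 873, 'http://..' ],
    [ 'M34', 836,   893, 829,    83.234 ],
);

$Data::Dumper::Indent = 1;    # use a more compact indentation style

# Pass a reference so the whole structure is dumped as a single value.
my $dump = Dumper(\@full_data);
print $dump;

# Individual cells are reached with two subscripts: row, then column.
print "Row 1, column 4: $full_data[1][4]\n";
```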


From now on, I'll have to make some guesses about what you need, because your description of the requirement is incomplete and the example is at variance with the description. First, replicating your examples:


Code
for my $arr_ref1 (@full_data) {
    for my $arr_ref2 (@full_data) {
        print $$arr_ref2[0], "\t", $$arr_ref2[1]/$$arr_ref1[1], "\n";
    }
}


This prints the following result:

Code
M45     1 
M34 2.42178447276941
M72 2.52896871378911
M98 1.30938586326767
M45 0.412918660287081
M34 1
M72 1.04425837320574
M98 0.54066985645933
M45 0.395418098510882
M34 0.957617411225659
M72 1
M98 0.517754868270332
M45 0.763716814159292
M34 1.84955752212389
M72 1.93141592920354
M98 1


which is what you describe in your example, except for the rounding of the values, which is quite easy to fix (I'll do that below).

Again a mere four lines of code.

Now for my wild guess at what you probably want, i.e. the same thing as above, but for each numerical column. Here is the full program again:

Code
use strict;
use warnings;
use Data::Dumper;

my @full_data;
my $header_line = <DATA>;
my @headers = split /\s+/, $header_line;

push @full_data, [split /\s+/, $_] while (<DATA>);

for my $arr_ref1 (@full_data) {
    for my $arr_ref2 (@full_data) {
        print $$arr_ref2[0], "\t";
        for my $index (1..4) {
            printf "%.2f%s", $$arr_ref2[$index]/$$arr_ref1[$index], "\t";
        }
        print "$$arr_ref2[5]\n";
    }
}

__DATA__
ID dataS01R1 dataS01R2 dataS02R1 dataS02R2 Links
M45 345.2 536 876.12 873 http://..
M34 836 893 829 83.234 http://..
M72 873 123 342.36 837 http://..
M98 452 934 1237 938 http://..

If you discard the boilerplate code and the data section, the real code takes only 11 lines.
This is the output:

Code
M45     1.00    1.00    1.00    1.00    http://.. 
M34 2.42 1.67 0.95 0.10 http://..
M72 2.53 0.23 0.39 0.96 http://..
M98 1.31 1.74 1.41 1.07 http://..
M45 0.41 0.60 1.06 10.49 http://..
M34 1.00 1.00 1.00 1.00 http://..
M72 1.04 0.14 0.41 10.06 http://..
M98 0.54 1.05 1.49 11.27 http://..
M45 0.40 4.36 2.56 1.04 http://..
M34 0.96 7.26 2.42 0.10 http://..
M72 1.00 1.00 1.00 1.00 http://..
M98 0.52 7.59 3.61 1.12 http://..
M45 0.76 0.57 0.71 0.93 http://..
M34 1.85 0.96 0.67 0.09 http://..
M72 1.93 0.13 0.28 0.89 http://..
M98 1.00 1.00 1.00 1.00 http://..
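
The program above hard-codes the column range 1..4. To come back to the original question, the data columns can instead be located from the header line. The sketch below is one possible way to do that, assuming the headers look exactly like those in the sample file (dataS01R1 and so on). The names @data_cols and %cols_by_sample are made up for illustration.

```perl
use strict;
use warnings;

# Assumed header line, as in the sample file.
my $header_line = "ID dataS01R1 dataS01R2 dataS02R1 dataS02R2 Links\n";
my @headers = split /\s+/, $header_line;

# Indices of every column whose header looks like dataS<digits>R<digits>.
my @data_cols = grep { $headers[$_] =~ /^dataS\d+R\d+$/ } 0 .. $#headers;

# Group the same indices by sample number (the digits after "S"),
# which also gives the number of replicates per sample.
my %cols_by_sample;
for my $i (@data_cols) {
    my ($sample) = $headers[$i] =~ /^dataS(\d+)R\d+$/;
    push @{ $cols_by_sample{$sample} }, $i;
}

print "Data columns: @data_cols\n";
for my $sample (sort keys %cols_by_sample) {
    my @cols = @{ $cols_by_sample{$sample} };
    print "Sample $sample: columns @cols (", scalar @cols, " replicates)\n";
}
```

With @data_cols available, the inner loop `for my $index (1..4)` could become `for my $index (@data_cols)`, so the program would no longer depend on where the data columns sit in the file.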



hellohello1
Novice

Feb 23, 2014, 7:18 PM

Post #4 of 6 (1667 views)
Re: [Laurent_R] Problem adding specific columns into array

My apologies for the unclear explanation, and wow! The output you came up with is what I want! :)

Let me read through and digest the information, as I am quite new to Perl and unfamiliar with some of the terms. I will reply here again once I have understood your reply and tried out the code.

May I ask whether there is any difference between using AoA, HoA and HoH, or do they serve the same purpose?


(This post was edited by hellohello1 on Feb 23, 2014, 7:20 PM)


Laurent_R
Veteran / Moderator

Feb 23, 2014, 11:22 PM

Post #5 of 6 (1650 views)
Re: [hellohello1] Problem adding specific columns into array

You would choose between AoA, HoA, AoH and HoH on essentially the same criteria you would use for choosing between an array and a hash: basically, what kind of data access do you need, which type of key do you have, do you need to keep a particular order, do you have duplicate keys, etc.

Otherwise all four enable you to manage two-dimensional data.
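
To make the difference concrete, here is a small sketch that stores the same two sample rows first as an array of arrays and then as a hash of hashes keyed by ID and column name. Treating the ID column as a unique key is an assumption: if IDs can repeat, the hash version would silently overwrite earlier rows.

```perl
use strict;
use warnings;

my @headers = qw(ID dataS01R1 dataS01R2);
my @rows    = ( [ 'M45', 345.2, 536 ], [ 'M34', 836, 893 ] );

# Array of arrays: keeps file order, cells addressed by numeric position.
my @aoa = @rows;
print "AoA: $aoa[1][2]\n";    # second row, third column

# Hash of hashes: direct lookup by ID and column name, but no inherent order.
my %hoh;
for my $row (@rows) {
    my $id = $row->[0];
    for my $col (1 .. $#headers) {
        $hoh{$id}{ $headers[$col] } = $row->[$col];
    }
}
print "HoH: $hoh{M34}{dataS01R2}\n";
```

Both lookups reach the same cell. The array form suits the row-by-row ratio loops used earlier in the thread, while the hash form is handier when rows are looked up by ID.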


hellohello1
Novice

Feb 24, 2014, 1:40 AM

Post #6 of 6 (1646 views)
Re: [Laurent_R] Problem adding specific columns into array

I see. Thanks for your guidance! I will go back and digest everything to be clear before coming back again to clarify if I have any doubts. Thanks a lot ! :)
