[SOLVED] Merge data in columns B that have same entry in column A

 



Thalakos
Novice

Apr 4, 2013, 11:22 AM

Post #1 of 8 (708 views)

Hi all,

I have this text file with two columns A (ID) and B (Gene):


Code
ID             Gene 

hsa-let-7a KRAS
hsa-let-7a HMGA2
hsa-let-7a integrin beta(3)
hsa-let-7a caspase-3
hsa-let-7a PRDM1/Blimp-1
hsa-let-7a HMGA2
hsa-let-7a IGF-II
hsa-let-7a HMGA2
hsa-let-7a HMGA2
hsa-let-7a RAS
hsa-let-7a BCL2
hsa-let-7a RAS
hsa-let-7a MYC
hsa-let-7a CDC25A
hsa-let-7a CDK6
hsa-let-7a NF2
hsa-let-7a c-myc
hsa-let-7a RAS
hsa-let-7a RAS
hsa-let-7a NIRF
hsa-let-7b Cdc34
hsa-let-7b Dicer
hsa-let-7b KRAS
hsa-let-7b CCND1
hsa-let-7b CDC25A
hsa-let-7b CDK6
hsa-let-7b HMGA2
hsa-let-7c HMGA2
hsa-let-7c HMGA2
hsa-let-7c HMGA2
hsa-let-7c BCL2
hsa-let-7c RAS
hsa-let-7c CDC25A
hsa-let-7c CDK6
hsa-let-7c RAS
hsa-let-7d KRAS
hsa-let-7d HMGA2
hsa-let-7d BCL2
hsa-let-7d RAS
hsa-let-7d CDC25A
hsa-let-7d CDK6
hsa-let-7d BDNF
hsa-let-7d D3R
hsa-let-7e HMGA2
hsa-let-7g KRAS
hsa-let-7g HMGA2
hsa-let-7g Ras
hsa-let-7g HMGA2
hsa-let-7g CDC25A
hsa-let-7g CDK6
hsa-miR-1 c-Met
hsa-miR-1 calmodulin
hsa-miR-1 Gata4
hsa-miR-1 Mef2a
hsa-miR-1 BCL2
hsa-miR-1 Gata4
hsa-miR-1 calmodulin
hsa-miR-1 Mef2a
hsa-miR-1 C/EBPa
hsa-miR-1 FoxP1
hsa-miR-1 HDAC4
hsa-miR-1 MET
hsa-miR-1 HCN4
hsa-miR-1 FoxP1
hsa-miR-1 HDAC4
hsa-miR-1 MET
hsa-miR-1 Cdk9
hsa-miR-1 fibronectin
hsa-miR-1 RasGAP
hsa-miR-1 Rheb
hsa-miR-1 MEF-2
hsa-miR-1 nAChR
hsa-miR-1 GAJ1
hsa-miR-1 KCNJ2
hsa-miR-1 HSP60
hsa-miR-1 HSP70
hsa-miR-1 Hand2
hsa-miR-1 Kir2.1
hsa-miR-100 Plk1
......
(line cut)


I would like a single entry per ID in column A, with all of its associated gene names in column B, comma-separated, like this:

Code
ID                     Gene 

hsa-let-7a KRAS,HMGA2,integrin beta(3),caspase-3,PRDM1/Blimp-1,HMGA2,IGF-II,HMGA2,HMGA2,RAS,BCL2,RAS,MYC,CDC25A,CDK6,NF2,c-myc,RAS,RAS,NIRF
hsa-let-7b Cdc34,Dicer,KRAS,CCND1,CDC25A,CDK6,HMGA2
hsa-let-7c HMGA2,HMGA2,HMGA2,BCL2,RAS,CDC25A,CDK6,RAS
.........


Do you know any way to do that automatically?

Thanks in advance,
Giorgio


(This post was edited by Thalakos on Apr 5, 2013, 9:42 AM)


FishMonger
Veteran / Moderator

Apr 4, 2013, 12:10 PM

Post #2 of 8 (702 views)
Re: [Thalakos] Merge data in columns B that have same entry in column A

Build a HoA (Hash of Arrays).

Loop over the file and split each line on whitespace. Use the value of the first column as the hash key and use the push function to add the value of the second column to the array. After the hash is built, you can loop over it and use the join function to generate the csv list.
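A minimal sketch of the steps described above, assuming the columns are separated by whitespace and that a gene name may itself contain a space (as in `integrin beta(3)`). The sub name `merge_genes` and the sample lines are made up for illustration, and the limit of 2 passed to `split` is an addition so that everything after the ID stays in one field.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines and returns a hash of arrays
# mapping each ID to the genes seen for it, in input order.
sub merge_genes {
    my @lines = @_;
    my %genes_for;
    for my $line (@lines) {
        # Split on whitespace into at most 2 fields so that a gene name
        # containing a space is kept whole.
        my ( $id, $gene ) = split ' ', $line, 2;
        next unless defined $gene;    # skip blank or one-column lines
        $gene =~ s/\s+\z//;           # drop the newline and trailing blanks
        push @{ $genes_for{$id} }, $gene;
    }
    return %genes_for;
}

# Sample input (illustrative only).
my %genes_for = merge_genes(
    "hsa-let-7a KRAS\n",
    "hsa-let-7a integrin beta(3)\n",
    "hsa-let-7b Dicer\n",
);

# Join each array with commas to produce the merged column.
for my $id ( sort keys %genes_for ) {
    print "$id\t", join( ',', @{ $genes_for{$id} } ), "\n";
}
```

With a plain `split` on whitespace and no limit, `integrin beta(3)` would be cut into two fields and only `integrin` would land in the second column.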


(This post was edited by FishMonger on Apr 4, 2013, 12:14 PM)


Thalakos
Novice

Apr 4, 2013, 12:19 PM

Post #3 of 8 (696 views)
Re: [FishMonger] Merge data in columns B that have same entry in column A


In Reply To
Build a HoA (Hash of Arrays).

Loop over the file and split each line on whitespace. Use the value of the first column as the hash key and use the push function to add the value of the second column to the array. After the hash is built, you can loop over it and use the join function to generate the csv list.


Thanks a lot!


Laurent_R
Veteran / Moderator

Apr 4, 2013, 3:59 PM

Post #4 of 8 (688 views)
Re: [Thalakos] Merge data in columns B that have same entry in column A

Or you could use a simple hash.

Loop over the file and split each line on whitespace. Use the value of the first column as the hash key and concatenate the value of the second column to the existing value in the hash element.


Rahul6990
Novice

Apr 5, 2013, 12:41 AM

Post #5 of 8 (680 views)
Re: [Thalakos] Merge data in columns B that have same entry in column A

This should be helpful:


Code
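use strict;
use warnings;

# Read every line of the file(s) named on the command line.
chomp( my @array = <> );

my %hash;
foreach my $var (@array) {
    # Split on the first tab only; skip lines without a second column.
    my ( $key, $val ) = split /\t/, $var, 2;
    next unless defined $val;
    if ( defined $hash{$key} ) {
        $hash{$key} .= ",$val";    # append to the existing list
    }
    else {
        $hash{$key} = $val;        # first gene seen for this ID
    }
}
for my $key ( keys %hash ) {
    print "\nkey:$key---Val:$hash{$key}\n";
}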



FishMonger
Veteran / Moderator

Apr 5, 2013, 6:47 AM

Post #6 of 8 (675 views)
Re: [Thalakos] Merge data in columns B that have same entry in column A


Code
#!/usr/bin/perl

use 5.10.1;
use strict;
use warnings;

my %gene;

open my $fh, '<', 'genes.txt' or die "failed to open genes.txt $!";

while ( my $line = <$fh> ) {
    chomp $line;
    my($id, $gene) = split /\t/, $line;
    push @{$gene{$id}}, $gene;
}
close $fh;

for my $id (keys %gene) {
    say "$id\t", join ',', @{$gene{$id}};
}
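One thing to be aware of with the script above: `keys` returns hash keys in no particular order, so the IDs may not come out in the order they appear in the file. A rough sketch of one way to keep first-seen order, assuming tab-separated input; the sub name `group_in_order`, the `@order` array, and the sample lines are illustrative and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: groups tab-separated "ID<TAB>gene" lines and
# returns output lines with IDs in the order they first appeared.
sub group_in_order {
    my @lines = @_;
    my ( %gene, @order );
    for my $line (@lines) {
        chomp $line;
        my ( $id, $gene ) = split /\t/, $line, 2;
        next unless defined $gene;    # skip blank or one-column lines
        # Remember each ID the first time it is seen.
        push @order, $id unless exists $gene{$id};
        push @{ $gene{$id} }, $gene;
    }
    return map { "$_\t" . join( ',', @{ $gene{$_} } ) } @order;
}

# Sample input (illustrative only).
print "$_\n" for group_in_order(
    "hsa-miR-1\tc-Met\n",
    "hsa-let-7a\tKRAS\n",
    "hsa-miR-1\tGata4\n",
);
```

If alphabetical order is enough, replacing `keys %gene` with `sort keys %gene` in the original loop is the simpler change.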



Kenosis
User

Apr 5, 2013, 8:25 AM

Post #7 of 8 (670 views)
Re: [FishMonger] Merge data in columns B that have same entry in column A

Here's another option:


Code
use strict; 
use warnings;

my %hash;
while (<>) {
push @{ $hash{$1} }, $2 if /(\S+)\s+(\S+)/;
}

local $" = ',';
print "$_\t@{ $hash{$_} }\n" for sort keys %hash;


Usage: perl script.pl file.txt [>results.txt]

The optional >results.txt at the end is a shell redirection that sends the output to a file instead of the screen.

Output on your dataset:


Code
hsa-let-7a	KRAS,HMGA2,integrin,caspase-3,PRDM1/Blimp-1,HMGA2,IGF-II,HMGA2,HMGA2,RAS,BCL2,RAS,MYC,CDC25A,CDK6,NF2,c-myc,RAS,RAS,NIRF 
hsa-let-7b Cdc34,Dicer,KRAS,CCND1,CDC25A,CDK6,HMGA2
hsa-let-7c HMGA2,HMGA2,HMGA2,BCL2,RAS,CDC25A,CDK6,RAS
hsa-let-7d KRAS,HMGA2,BCL2,RAS,CDC25A,CDK6,BDNF,D3R
hsa-let-7e HMGA2
hsa-let-7g KRAS,HMGA2,Ras,HMGA2,CDC25A,CDK6
hsa-miR-1 c-Met,calmodulin,Gata4,Mef2a,BCL2,Gata4,calmodulin,Mef2a,C/EBPa,FoxP1,HDAC4,MET,HCN4,FoxP1,HDAC4,MET,Cdk9,fibronectin,RasGAP,Rheb,MEF-2,nAChR,GAJ1,KCNJ2,HSP60,HSP70,Hand2,Kir2.1
hsa-miR-100 Plk1


The script captures the two columns and builds a hash of arrays (HoA). A local copy of Perl's $" variable is set to a comma, so the elements of the array are printed as comma-separated when the array is interpolated (printed within double-quotes).
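To illustrate the `$"` behaviour just described, here is a small standalone sketch; the array contents and variable names are made up for the example. It also shows `join`, which gives the same string without changing a global variable.

```perl
use strict;
use warnings;

my @genes = ( 'KRAS', 'HMGA2', 'BCL2' );

# By default, an array interpolated in double quotes is joined with a space.
my $with_default = "@genes";

my $with_comma;
{
    # local changes the list separator only until this block exits.
    local $" = ',';
    $with_comma = "@genes";
}

# join builds the same string explicitly.
my $with_join = join ',', @genes;

print "$with_default\n";
print "$with_comma\n";
print "$with_join\n";
```

The first line printed is space-separated; the second and third are identical comma-separated lists.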

Hope this helps!


Thalakos
Novice

Apr 5, 2013, 9:39 AM

Post #8 of 8 (662 views)
Re: [Kenosis] Merge data in columns B that have same entry in column A

Thank you so much guys, the scripts both work great!


(This post was edited by Thalakos on Apr 5, 2013, 9:42 AM)

 
 

