Using hash keys to separate data

 



a217
Novice

Jun 28, 2011, 9:58 PM

Post #1 of 4 (834 views)

Hello,

I have a file (hashKey.txt) that I would like to use as a list of hash keys in order to separate data from input files (e.g. testReg.txt).

hashKey.txt

Code
chr10
chr10_random
chr11
chr11_gl000202_random
chr11_random
chr12
chr13
chr13_random
chr14
chr15
chr15_random
chr16
chr16_random
chr17_ctg5_hap1
chr17
chr17_gl000203_random
chr17_gl000204_random
chr17_gl000205_random
chr17_gl000206_random
chr17_random
chr18
chr18_gl000207_random
chr18_random
chr19
chr19_gl000208_random
chr19_gl000209_random
chr19_random
chr1
chr1_gl000191_random
chr1_gl000192_random
chr1_random
chr20
chr21
chr21_gl000210_random
chr21_random
chr22
chr22_h2_hap1
chr22_random
chr2
chr2_random
chr3
chr3_random
chr4_ctg9_hap1
chr4
chr4_gl000193_random
chr4_gl000194_random
chr4_random
chr5
chr5_h2_hap1
chr5_random
chr6_apd_hap1
chr6_cox_hap1
chr6_cox_hap2
chr6_dbb_hap3
chr6
chr6_mann_hap4
chr6_mcf_hap5
chr6_qbl_hap2
chr6_qbl_hap6
chr6_random
chr6_ssto_hap7
chr7
chr7_gl000195_random
chr7_random
chr8
chr8_gl000196_random
chr8_gl000197_random
chr8_random
chr9
chr9_gl000198_random
chr9_gl000199_random
chr9_gl000200_random
chr9_gl000201_random
chr9_random
chrM
chrUn_gl000211
chrUn_gl000212
chrUn_gl000213
chrUn_gl000214
chrUn_gl000215
chrUn_gl000216
chrUn_gl000217
chrUn_gl000218
chrUn_gl000219
chrUn_gl000220
chrUn_gl000221
chrUn_gl000222
chrUn_gl000223
chrUn_gl000224
chrUn_gl000225
chrUn_gl000226
chrUn_gl000227
chrUn_gl000228
chrUn_gl000229
chrUn_gl000230
chrUn_gl000231
chrUn_gl000232
chrUn_gl000233
chrUn_gl000234
chrUn_gl000235
chrUn_gl000236
chrUn_gl000237
chrUn_gl000238
chrUn_gl000239
chrUn_gl000240
chrUn_gl000241
chrUn_gl000242
chrUn_gl000243
chrUn_gl000244
chrUn_gl000245
chrUn_gl000246
chrUn_gl000247
chrUn_gl000248
chrUn_gl000249
chrX
chrX_random
chrY

hashKey.txt lists every chromosome value that could appear in a given input file.


testReg.txt

Code
chr1    100    159    0
chr1    200    260    0
chr1    500    750    0
chr3    450    700    0
chr4    100    300    0
chr7    350    600    0
chr9    100    125    0
chr11    679    687    0
chr22    100    200    0
chr22    300    400    0

testReg.txt is simply a file I use to test the code. It contains various chromosome values along with three other tab-separated columns of data.



My code so far:

Code
#!/usr/bin/perl 
use warnings; use strict;

my (%Chr, %R);
my (@key_split, @reg_split);
my ($reg_line);

open(KEY, "<hashKey.txt") or die "error reading key list";
open(REG, "<testReg.txt") or die "error reading file";

while (<KEY>) {
    chomp;
    @key_split = split("\n");
    $Chr{"$key_split[0]"} = $key_split[0];
}

while (<REG>) {
    chomp;
    @reg_split = split("\t");
    #$R{"$reg_split[0]"} = ($reg_split[0], $reg_split[1], $reg_split[2], $reg_split[3]);
    $R{"$reg_split[0]"} = $reg_split[0];
}


foreach my $key (keys %Chr) {
    if(exists($R{$key})){
        print ("$R{$key}\n");
    }
}
close(KEY);
close(REG);

So far, my code prints out all of the chr values that hashKey.txt and testReg.txt have in common. What I would like it to do is write each line of testReg.txt to a separate file named after its chromosome. For example:

chr1.out

Code
chr1    100    159    0
chr1    200    260    0
chr1    500    750    0


chr3.out

Code
chr3    450    700    0


chr4.out

Code
chr4    100    300    0


chr7.out

Code
chr7    350    600    0


chr9.out

Code
chr9    100    125    0


chr11.out

Code
chr11    679    687    0


chr22.out

Code
chr22    100    200    0
chr22    300    400    0



From there I can use each separated file to sort what I need to. I suppose my main problem is figuring out how to have the hash variable point toward the unique line. Is what I am trying to accomplish even possible with a hash table, given that the same key could be used for multiple lines? My main goal is just to separate each chr from the input file (testReg.txt) into its own file. If you have any suggestions, please let me know.


shawnhcorey
Enthusiast


Jun 29, 2011, 6:45 AM

Post #2 of 4 (827 views)
Re: [a217] Using hash keys to separate data

Try:

Code
#!/usr/bin/env perl 

use strict;
use warnings;

my $key_file = 'hashKey.txt';
my $reg_file = 'testReg.txt';

open my $key_fh, '<', $key_file or die "could not open $key_file: $!\n";
chomp( my @valid_keys = <$key_fh> );
close $key_fh or die "could not close $key_file: $!\n";
my %valid_keys = map { $_, 1 } @valid_keys;

my %reg = ();
open my $reg_fh, '<', $reg_file or die "could not open $reg_file: $!\n";
while( <$reg_fh> ){
    my @reg = split;
    if( exists $valid_keys{$reg[0]} ){
        push @{ $reg{$reg[0]} }, $_;
    }else{
        warn "invalid key for $_";
    }
}
close $reg_fh or die "could not close $reg_file: $!\n";

for my $key ( keys %reg ){
    my $out_file = "$key.out";
    open my $out_fh, '>', $out_file or die "could not open $out_file: $!\n";
    print $out_fh @{ $reg{$key} } or die "could not print to $out_file: $!\n";
    close $out_fh or die "could not close $out_file: $!\n";
}
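
On the original question of whether one hash key can stand for several lines: it can, if each value is a reference to an array of lines. That is what the push line in the code above relies on. A minimal sketch of just that step follows; group_by_first_field is a hypothetical helper name, and the sample lines are copied from testReg.txt.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# group_by_first_field is a hypothetical helper, not part of the code above.
# It returns a hash reference: first column => array ref of the full lines.
sub group_by_first_field {
    my @lines = @_;
    my %group;
    for my $line (@lines) {
        my ($key) = split ' ', $line;     # first whitespace-separated field
        next unless defined $key;         # skip blank lines
        push @{ $group{$key} }, $line;    # the array ref is created on first use
    }
    return \%group;
}

my $group = group_by_first_field(
    "chr1\t100\t159\t0\n",
    "chr1\t200\t260\t0\n",
    "chr3\t450\t700\t0\n",
);

# Report how many lines were collected under each key.
for my $key ( sort keys %$group ) {
    printf "%s has %d line(s)\n", $key, scalar @{ $group->{$key} };
}
```

Each key ends up holding every line that started with it, in input order, so printing @{ $group->{$key} } to "$key.out" gives the per-chromosome files.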


__END__

I love Perl; it's the only language where you can bless your thingy.

Perl documentation is available at perldoc.perl.org. The list of standard modules and pragmatics is available in perlmodlib.

Get Markup Help. Please note the markup tag of "code".


FishMonger
Veteran / Moderator

Jun 29, 2011, 7:16 AM

Post #3 of 4 (825 views)
Re: [shawnhcorey] Using hash keys to separate data

Shawn,

You beat me to the punch. I was going to suggest almost the exact same solution.

However, why the unnecessary use of the @valid_keys array?
Why not this:

Code
my %valid_keys = map { chomp; $_ => 1 } <$key_fh>;
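
The one-liner treats each line of the key file as exactly the key plus a newline. If the key file might carry trailing spaces or Windows line endings (an assumption, nothing in the thread says so), a slightly longer variant may be safer. This is only a sketch: load_keys is a hypothetical helper, and an in-memory filehandle stands in for hashKey.txt.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# load_keys is a hypothetical helper; it expects an open read filehandle
# and returns a hash of key => 1.
sub load_keys {
    my ($fh) = @_;
    my %valid;
    while ( my $line = <$fh> ) {
        $line =~ s/\s+\z//;       # drop the newline and any trailing whitespace
        next if $line eq '';      # ignore empty lines
        $valid{$line} = 1;
    }
    return %valid;
}

# In-memory filehandle standing in for hashKey.txt (sample data, with
# a trailing space, an empty line, and a CRLF ending on purpose).
my $data = "chr10 \nchr10_random\n\nchr11\r\n";
open my $fh, '<', \$data or die "could not open in-memory file: $!\n";
my %valid_keys = load_keys($fh);
close $fh;

print join( ',', sort keys %valid_keys ), "\n";
```

With plain chomp, a key stored as "chr10 " would never match the "chr10" read from the data file.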


It's understandable on write filehandles, but isn't 'die' on the closing of read-only filehandles a little overkill?


shawnhcorey
Enthusiast


Jun 29, 2011, 7:23 AM

Post #4 of 4 (823 views)
Re: [FishMonger] Using hash keys to separate data


In Reply To
It's understandable on write filehandles, but isn't 'die' on the closing of read-only filehandles a little overkill?


No. If any error occurred on a read, the die on close will catch it, but it will not report the correct error. I suppose I should do this:

Code
my %reg = (); 
open my $reg_fh, '<', $reg_file or die "could not open $reg_file: $!\n";
while( <$reg_fh> ){
    my @reg = split;
    if( exists $valid_keys{$reg[0]} ){
        push @{ $reg{$reg[0]} }, $_;
    }else{
        warn "invalid key for $_";
    }
}
die "could not read from $reg_file, line $.: $!\n" if $!;
close $reg_fh;
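
One caveat for anyone copying this: perlvar notes that $! is only meaningful immediately after a failed call, so testing it after the loop can pick up a stale value from some earlier operation. The readline entry in perlfunc shows a pattern that checks eof first and then treats an undefined readline result as an error. A minimal sketch of that pattern follows; read_lines is a hypothetical helper, and an in-memory filehandle stands in for testReg.txt.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# read_lines is a hypothetical helper: it returns every line from the
# filehandle, or dies with the OS error if a read fails before end of file.
sub read_lines {
    my ( $fh, $name ) = @_;
    my @lines;
    until ( eof($fh) ) {
        # Not at end of file, so an undefined result here means a read error.
        defined( my $line = readline $fh )
            or die "could not read from $name, line $.: $!\n";
        push @lines, $line;
    }
    return @lines;
}

# In-memory filehandle standing in for testReg.txt.
my $data = "chr1\t100\t159\t0\nchr3\t450\t700\t0\n";
open my $fh, '<', \$data or die "could not open in-memory file: $!\n";
my @lines = read_lines( $fh, 'in-memory data' );
close $fh;

print scalar(@lines), " lines read\n";
```

Because $! is read right after the failing readline, the message reports the error that actually stopped the read.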



 
 

