Join two tables via hash tables (test script and input files are attached)

 



Kate
New User

Jan 26, 2014, 9:21 PM

Post #1 of 2 (1213 views)

Dear all,

I found a script for joining two simple table files. The script works well, but I can't understand a few details. Any feedback or brief explanation would be appreciated. Thanks!!

Kate

### input_dataset1.txt ###

contig11 GO:100 other columns of data
contig11 GO:289 other columns of data
contig11 GO:113 other columns of data
contig22 GO:388 other columns of data
contig22 GO:101 other columns of data

### input_dataset2.txt ###

contig11 3 N
contig11 1 Y
contig33 1 Y
contig22 1 Y
contig22 2 N

### output ###

contig11 3 N GO:100 other columns of data
contig11 3 N GO:289 other columns of data
contig11 3 N GO:113 other columns of data
contig11 1 Y GO:100 other columns of data
contig11 1 Y GO:289 other columns of data
contig11 1 Y GO:113 other columns of data
contig22 1 Y GO:388 other columns of data
contig22 1 Y GO:101 other columns of data
contig22 2 N GO:388 other columns of data
contig22 2 N GO:101 other columns of data

### script.pl ###

Code
# Usage: perl script.pl input_dataset1.txt input_dataset2.txt
open(my $GOTERMS, '<', $ARGV[0])
    or die("Error opening GO terms file \"$ARGV[0]\": $!\n");
open(my $SNPS, '<', $ARGV[1])
    or die("Error opening SNP file \"$ARGV[1]\": $!\n");

# Map each contig ID to the list of "rest of line" strings seen for it.
my %goterm;
while (<$GOTERMS>) {
    my ($id, $rest) = /^(\S++)(.*)/s;   # -----> Question 1
    push @{ $goterm{$id} }, $rest;      # ------> Question 2
}

# For each SNP row, print it once per GO-term row with the same contig ID.
while (my $row2 = <$SNPS>) {
    chomp($row2);
    my ($id) = $row2 =~ /^(\S+)/;       # ------> Question 3
    foreach my $rest (@{ $goterm{$id} }) {
        print("$row2$rest");
    }
}

### My questions ###

Question 1.
It saves the first field (key) to $id and other fields to $rest.
a: why = not =~
b: why ^(\S++) not ^(\S+) ?
c: why /s ?

Question 2.
I just declared the hash in the first line, so %goterm should be an empty hash. When did $goterm{$id} become an array? Is "push @{ $goterm{$id} }, $rest" equal to "$goterm{$id}=$rest"?

Question 3.
a: why ($id) not $id
b: why =~ not =
c: why ^(\S+) not ^(\S++)


(This post was edited by FishMonger on Jan 27, 2014, 9:04 AM)
Attachments: input_dataset1.txt (0.19 KB)
  input_dataset2.txt (70 B)
  script.pl (0.52 KB)


BillKSmith
Veteran

Jan 27, 2014, 4:44 AM

Post #2 of 2 (1196 views)
Re: [Kate] Join two tables via hash tables (test script and input files are attached)

1.a.
The '=' sign assigns the captured fields $1 and $2 to $id and $rest. With no '=~', the match is applied to $_ by default, and that is where while (<$GOTERMS>) puts each line, so '=~' is not needed.
This line could be coded:

Code
$_ =~ /^(\S++)(.*)/s;
my $id   = $1;
my $rest = $2;

1.b.
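Refer to the section on "possessive" quantifiers in perldoc perlre. A possessive quantifier such as \S++ never gives back characters it has already matched, so the regex engine does not backtrack into it. In this case, the possessive serves only to speed up the match.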
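To see what the possessive form actually changes, here is a small sketch using a made-up string ("aaa") and made-up patterns that are not part of the script. It assumes Perl 5.10 or later, where possessive quantifiers are available.

```perl
use strict;
use warnings;

# Greedy \S+ first grabs "aaa", then gives one character back
# so that the trailing "a" in the pattern can still match.
my $greedy = ("aaa" =~ /^\S+a$/) ? "match" : "no match";

# Possessive \S++ keeps everything it grabbed, so nothing is left
# for the trailing "a" and the whole match fails.
my $possessive = ("aaa" =~ /^\S++a$/) ? "match" : "no match";

print "greedy: $greedy\n";          # greedy: match
print "possessive: $possessive\n";  # possessive: no match
```

In the script's pattern, (.*) can match whatever follows the first field, so no backtracking is ever required and both forms return the same captures.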

1.c.
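The /s modifier makes a period match a newline as well as every other character. In this script it does matter: each line read from the file still ends with a newline, and /s lets (.*) capture that newline into $rest. Because $row2 is chomped and print adds no newline of its own, the newline kept in $rest is what ends each output line. Many people always use /s lest they forget it when it is needed.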
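One thing worth checking in this particular script is whether the trailing newline of each input line ends up in $rest. The sketch below runs the script's pattern with and without /s on a sample line taken from input_dataset1.txt; the variable names ($line, $rest_s and so on) are only for illustration.

```perl
use strict;
use warnings;

# One line as it would be read from input_dataset1.txt (newline included).
my $line = "contig11 GO:100 other columns of data\n";

my ($id_s, $rest_s) = $line =~ /^(\S++)(.*)/s;  # with /s: "." also matches "\n"
my ($id,   $rest)   = $line =~ /^(\S++)(.*)/;   # without /s: ".*" stops before "\n"

print "with /s keeps newline: ",    ($rest_s =~ /\n\z/ ? "yes" : "no"), "\n";  # yes
print "without /s keeps newline: ", ($rest   =~ /\n\z/ ? "yes" : "no"), "\n";  # no
```

Since the script prints "$row2$rest" with $row2 chomped, dropping /s would run all the output rows together on one line.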

2.
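No. Each value of the hash %goterm is a reference to an anonymous array. Your push appends the contents of $rest to the array referred to by $goterm{$id}. If the array does not exist yet, Perl creates it automatically (this is called autovivification). A plain "$goterm{$id}=$rest" would instead store a single string and overwrite it every time the same $id appears again.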
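A short sketch of the difference, using values shortened from the sample data. The second hash, %overwritten, is a hypothetical name added only to show what plain assignment would do.

```perl
use strict;
use warnings;

my %goterm;   # starts out empty

# The first push for a new key creates the anonymous array automatically;
# the second push appends to the same array.
push @{ $goterm{contig11} }, " GO:100";
push @{ $goterm{contig11} }, " GO:289";

# Plain assignment keeps only one value per key.
my %overwritten;
$overwritten{contig11} = " GO:100";
$overwritten{contig11} = " GO:289";   # replaces the previous value

print ref($goterm{contig11}), "\n";           # ARRAY
print scalar(@{ $goterm{contig11} }), "\n";   # 2
print $overwritten{contig11}, "\n";           #  GO:289
```

This is why the joined output can contain several GO-term rows for each SNP row: every row sharing a contig ID is kept in that ID's array.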

3.a.
The parens are needed to force the match into list context, where it returns the captured text. Without them the match is in scalar context, and $id would contain 1 for a match and an empty string (false) otherwise.
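The two contexts can be compared side by side. This is a minimal sketch on one sample row from input_dataset2.txt; $id_list and $id_scalar are illustrative names.

```perl
use strict;
use warnings;

my $row2 = "contig11 3 N";

my ($id_list) = $row2 =~ /^(\S+)/;   # list context: receives the captured text
my $id_scalar = $row2 =~ /^(\S+)/;   # scalar context: receives the success flag

print "list: $id_list\n";      # list: contig11
print "scalar: $id_scalar\n";  # scalar: 1
```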

3.b.
This statement uses both. The '=~' specifies that the match is against $row2. The assignment operator '=' assigns $1 to $id.

3.c.
The possessive would not help because nothing follows (\S+) in this pattern, so the regex engine never has a reason to backtrack into it.
Good Luck,
Bill
