
zing
Novice
Oct 7, 2012, 1:19 PM
Post #1 of 3
(2812 views)
|
supertree construction in perl
|
Can't Post
|
|
Hi all , I had this problem of creating a supertree by recursive Alfred Aho's algorithm. So I divided the problem into its the key concepts and dealt with them one by one. I'll first put up my code I built with help of people on this forum and then later on I'll explain my problem.So here's what I've build till now
#This program read the triplets from file named "data" and returns the #supertree. # ___DATA(triplets)____ #b c a #a c d #d e b #### NOTE ::: SuperTree part hasnt been incorporated yet. use strict; use warnings; use Data::Dumper; use Graph; use Data::Dump qw/ pp /; ####READ IN THE INPUT DATA ######## my @triplets; # Get all the triplets while (<>) { push @triplets, [ split ]; } #Make a deep copy of @triplets my @triplet_deep_copy = map { [@$_] } @triplets; #####AUXILIARY GRAPH G(L) ####### # In order to generate the G(L) first of all extract first two columns of @triplets #into another matrix my @auxiliary_edges=@triplets; splice(@$_, 2, 1) foreach @auxiliary_edges; print "----EDGE LIST TO BUILD AUXILIARY GRAPH-----\n"; print Dumper \@auxiliary_edges; ##### CONNECTED COMPONENTS ########## my $auxiliary_graph = Graph->new( undirected => 1 ); my @from; my @to; for (my $p = 0; $p <= 2; $p++) { $from[$p]=$triplets[$p][0]; } for (my $q = 0; $q <= 2; $q++) { $to[$q]=$triplets[$q][1]; } for (my $r = 0; $r <= 2; $r++) { $auxiliary_graph->add_edge($from[$r], $to[$r]); } my @subgraphs = $auxiliary_graph->connected_components; # Subgraphs my $V = $auxiliary_graph->vertices; # Number of taxa my $connected_components=scalar @subgraphs; #Get the number of #connected components ###### FINDING THE TRIPLETS WHICH ARE SUBSET(OR INDUCED BY) OF #EACH OF THE CONNECTED COMPONENTS###### Main(@auxiliary_edges); exit(0); sub induced { my $trip = shift; my @matches; for my $QT ( @_ ) { for my $triplet ( @$trip ) { my %seen; # my %Pie; undef @seen{@$QT}; delete @seen{@$triplet}; if ( keys( %seen ) <= ( @$QT - @$triplet ) ) { push @matches, $triplet; } } ## end for my $triplet ( @$trip ) } ## end for my $QT ( @_ ) return @matches; }## end sub induced sub Main { my $tree = Graph->new( undirected => 1 ); my $dad='u'; $tree->add_vertex($dad); for my $one (@subgraphs) { my @matches = induced( \@triplet_deep_copy, $one ); print "\nTriplet induced by subgraph ", pp( $one => { MATCHES =>\@matches } ), "\n\n"; } } So this is what I have written till now. Now let me explain my problem.
___INPUT(set of triplets)____ b c a a c d d e b Set of species/vertices=a,b,c,d,e Now build the auxiliary graph by selecting first two vertices of each of the triplets,i.e. The auxiliary graph thus generated will be The number of connected components in this auxiliary graph (q)=2 (viz. a-c-b and d-e) The algorithm I need to implement is this:-
TreeConstruct(S) 1. Let L be the set of species appear in S. Build G(L) 2. Let C1 , C2 , . . . , Cq be the set of connected components in G(L) 3. If q >1, then • For i = 1, 2, . . . , q, let Si be the set of triplets in S labeled by the set of leaves in Ci . • Let Ti = TreeConstruct(Si ) • Let T be a tree formed by connecting all Ti with the same parent node. Return T. 4. If q=1 and C1 contains exactly one leaf, return the leaf; else return fail. The progression will be like this:-
1. Initially we have q=2 (a-c-b & d-e). So introduce an internal vertex (u) and make these connected components child of u. u=> a-c-b; d-e; 2. Select component 1 = a-c-b. Check all lines from INPUT which are a subset of this component1.First line of INPUT i.e. "b c a" is a subset of component1. 3."b c a" now becomes the INPUT for the program and it is recursed again with this INPUT(Now for input "b c a" the auxiliary graph will be "b-c" & "a",i.e. two connected components,thus q=2 ...) Final output(SUPERTREE) for the given input should be like this u => u => d => e => u => a => u => b => c TRIPLETS(input) and SUPERTREE(output) look like these http://ars.sciencedirect.com/content/image/1-s2.0-S0166218X10000983-gr1.jpg The picture link above has the exact triplets for my problem and the exact supertree(output) expected
(This post was edited by zing on Oct 8, 2012, 10:16 AM)
|