CGI/Perl Guide | Learning Center | Forums | Advertise | Login
Site Search: in

  Main Index MAIN
INDEX
Search Posts SEARCH
POSTS
Who's Online WHO'S
ONLINE
Log in LOG
IN

Home: Perl Programming Help: Intermediate:
find shortest path for each query from a CSV file

 



zing
Novice

Nov 20, 2013, 11:58 PM

Post #1 of 7 (2153 views)
find shortest path for each query from a CSV file Can't Post

Hi all, I have this file


Code
__DATA__ child, Parent, probability,  
M7, Q, P,
M7, M28, E,
M28, M6, E,
M6, Q, Pl,
& several hundred lines.....



Legends: P(= Probable) > Pl(=Plausible) > E(=Equivocal). What I want is for each child I want to trace it back to Q(Query), but I need the shortest path which leads it to the Q(Query) along with their probabilities. For example for the input data shown above the output should be :-


Code
__OUTPUT_1__  
M7: M7<-Q = P
M28: M28<-M6<-Q = Pl.E
M6: M6<-Q = Pl


But as we can see from second row of input data M7 has another longer path tracing to Q : M7<-M28<-M6<-Q = Pl.E.E. But the code should have an option to neglect the largest path and thus show only the shortest path OR to show all of them. i.e.


Code
__OUTPUT_2__  
M7: M7<-Q = P
M7: M7<-M28<-M6<-Q = Pl.E.E
M28: M28<-M6<-Q = Pl.E
M6: M6<-Q = Pl


Thus this second output prints a path tracing back to Q for each of the rows, so if we have N input rows to the program , we will have N corresponding output rows.


(This post was edited by zing on Nov 21, 2013, 9:24 PM)


Laurent_R
Veteran / Moderator

Nov 21, 2013, 3:53 PM

Post #2 of 7 (2144 views)
Re: [zing] Conversion of tree to graph to resolve shortcomings [In reply to] Can't Post

Please explain what you are trying to do before giving some code and some presumably faulty result.


zing
Novice

Nov 22, 2013, 3:45 AM

Post #3 of 7 (2104 views)
Re: [Laurent_R] Conversion of tree to graph to resolve shortcomings [In reply to] Can't Post

This is the code that I tried

Code
use warnings; 
no warnings 'recursion';
use strict;

<DATA>; # Skip the headers (first row).

my %edges;
my %kids;
while (<DATA>) {
chomp;
my ($child, $parent, $likelihood) = split /\t/;
$edges{$parent}{$child} = $likelihood;
$kids{$child} = 1;
}
my $nkids = scalar keys %kids;

my %pathes = ( Q => { path => [ 'Q' ], lh => [] } );
sub shortest_pathes {
my( $end, $level ) = @_;
return if $level > $nkids;
return unless $edges{$end};
my @lh = @{$pathes{$end}{lh}};
my @path = @{$pathes{$end}{path}};
my @pre = keys %{$edges{$end}};
for( @pre ) {
if( ( !exists $pathes{$_} ) or ( scalar @{$pathes{$_}{
+path}} > 1 + scalar @path ) ) {
$pathes{$_} = { path => [ $_, @path ], lh => [
+ $edges{$end}{$_}, @lh ] };
}
}
shortest_pathes( $_, $level+1 ) for @pre;
}

shortest_pathes 'Q', 0;
delete $pathes{Q};

for (sort {substr($a,1) <=> substr($b,1)} keys %pathes) {
print "$_\t";
print join "<-", @{$pathes{$_}{path}};
print "\t";
print join ".", @{$pathes{$_}{lh}};
print "\n";
}

__DATA__
child, Parent, likelihood
M7 Q P
M54 M7 Pl
M213 M54 E
M206 M54 E
M194 M54 E
M53 M7 Pl
M186 M53 Pl
M194 M53 Pl
M187 M53 E
M204 M53 E
M201 M53 E
M202 M53 E
M179 M53 E
M13 M53 E
M157 M53 E
M173 M53 E
M205 M53 E
M195 M53 E
M196 M53 E
M197 M53 E
M198 M53 E
M57 M7 E
M44 M7 E
M61 M7 E
M13 M7 E
M50 M7 E
M158 M50 P
M157 M50 P
M153 M50 Pl
M162 M50 E
M164 M50 E
M165 M50 E
M147 M50 E
M159 M50 E
M160 M50 E
M161 M50 E
M51 M7 E
M174 M51 Pl
M50 M51 Pl
M158 M50 P
M157 M50 P
M153 M50 Pl
M162 M50 E
M164 M50 E
M165 M50 E
M147 M50 E
M159 M50 E
M160 M50 E
M161 M50 E
M173 M51 Pl
M175 M51 E
M176 M51 E
M62 M7 E
M55 M7 E
M17 Q P
M83 M17 Pl
M54 M17 Pl
M213 M54 E
M206 M54 E
M194 M54 E
M84 M17 E
M101 M17 E
M98 M17 E
M99 M17 E
M76 M17 E
M206 M76 P
M278 M76 E
M289 M76 E



Laurent_R
Veteran / Moderator

Nov 22, 2013, 4:25 PM

Post #4 of 7 (2077 views)
Re: [zing] Conversion of tree to graph to resolve shortcomings [In reply to] Can't Post

Hi,

this reminds me of another problem I solved a few months ago. This was a game with stones on a 4x4 board. Stones had a white and a black side. So you could have 2^16 = 65536 initial positions. You had 46 authorized moves, each turning around 3 to 4 stones (turning around a full line, a full column, a diagonal or a L pattern). The aim is to have all the stones on their white side. Any game can be solved in 5 moves, so that each initial solution can lead to 46 ^5 games (206 millions), but, given the number of initial positions, that leads to a huge amount of possible games, about 1.35 x 10^13 (13 thousand billion) possibilities. A quick test showed that checking all the possible paths would require more than 225 days on my computer. But at the same time, any move starts from one of 65k position and leads to 65k positions. So my idea was to start from the winning position (step 0), find the 46 positions that can lead to that (step 1) and record them in a hash as winning in one move, examine all the positions leading to one of those 46 positions (step 2) and removing from the problem any position already seen (already in the hash), and so on. Overall, I had to check only 2.34 million moves and that took me about 1.5 seconds. That took 11 lines of code.

Your problem may look very different, but I am pretty much convinced that the very same approach can lead to an easy solution. If you think about the game mentioned above, my problem was really to find the shortest path between the winning position and any original position. And that is very very similar to your problem. My approach made it possible to eliminate very early any solution that could be determined to be sub-optimal, and I am quite sure the same approach would work in your case.

So I would load your data in an appropriate date structure, start from Q, look for all precedessors, mark up those already found by storing them in a hash, look for the predecessors of those predecessors, unless they have been already been visited, etc. until all lines have been marked as solved.

It is quite late here now, but I am willing to give you the code tomorrow (hopefully, I will not forget) if you confirm that this is really what you are looking for.


zing
Novice

Nov 24, 2013, 11:41 PM

Post #5 of 7 (2016 views)
Re: [Laurent_R] Conversion of tree to graph to resolve shortcomings [In reply to] Can't Post

Thanks Laurent . Yes you are right.
To give you an intuition consider this as input data

Code
__DATA__ 
M7, Q, P,
M7, M28, E,
M28, M6, E,
M6, Q, Pl,


The output expected is

Code
child     shortest_path      Probability   Group_ID 
M7 : M7<-Q, P I_1
M6 : M6<-Q, Pl I_2
M28 : M28<-M6<-Q, Pl.E II_17
M7 : M7<-M28<-M6<-Q, Pl.E.E IV_34



Laurent_R
Veteran / Moderator

Nov 26, 2013, 2:26 PM

Post #6 of 7 (1906 views)
Re: [zing] Conversion of tree to graph to resolve shortcomings [In reply to] Can't Post

Hi,

sorry, I see your message only now.

It is again quite late and I am leaving out for an early flight tomorrow morning, I can't do anything this evening. Actually, when I offered to provide some code, this was the weekend, I could manage some free time, it is quite a bit more complicated during the week. I might be able to try to do something during my flight, but I am not sure at all that I'll have the energy.

Just one question on your example:


Quote
M7 : M7<-Q, P I_1
...
M7 : M7<-M28<-M6<-Q, Pl.E.E IV_34


I would have thought that since you have a direct path from Q to M7 (first line), you would not want to keep the longer path of the last line above. This is apparently incorrect. Does that mean that you want to keep all possible paths?

In that case, my idea of pruning "useless" paths early might not be what you need.

One last question. I think that you asked the same question on Perl Monks. If you got the appropriate answer there, then it would be nice to inform people here to avoid double work. I am happy to try to help others, but it still requires some work (not just a quick answer), so I would not want to do it if it is in fact useless.


zing
Novice

Nov 26, 2013, 10:01 PM

Post #7 of 7 (1893 views)
Re: [Laurent_R] Conversion of tree to graph to resolve shortcomings [In reply to] Can't Post

Laurent actually I want two outputs,

In one of them I want only the shortest path, while in the second one I want all possible paths (just to have a record).

Also I forgot to add initially, but this is the hash table for all the goup id's (Which I want to be printed in fourth column.

Code
my %DEF = ( 
I => [qw( P Pl P.P P.Pl Pl.P Pl.Pl P.P.P P.P.Pl P.Pl.P P.Pl.Pl Pl.P.P Pl.P.Pl Pl.Pl.P Pl.Pl.Pl )],
II => [qw( E P.E Pl.E P.P.E P.Pl.E Pl.P.E Pl.Pl.E )],
III => [qw( E.P E.Pl P.E.P P.E.Pl Pl.E.P Pl.E.Pl E.P.P E.P.Pl E.Pl.P E.Pl.Pl )],
IV => [qw( E.E P.E.E Pl.E.E E.P.E E.Pl.E E.E.P E.E.Pl E.E.E )]
); # Hash table for all the groups


 
 


Search for (options) Powered by Gossamer Forum v.1.2.0

Web Applications & Managed Hosting Powered by Gossamer Threads
Visit our Mailing List Archives