japhy
Enthusiast
Jan 18, 2001, 3:48 PM
Post #3 of 4
(6699 views)

Re: How To Retrieve a Random Line From A File?
[In reply to]

Can't Post


Here is the proof that:
rand($.) < 1 and ($line = $_) while <FH>; provides a fair and equal distibution for any number of lines. Proof by Induction The rand($bound) function returns some real number x such that 0 <= x < $bound. The $. variable contains the current line number of the filehandle being read from. Thus, for the first line of the file, we have rand(1) < 1, which is true 100% of the time. Thus, after the first line of the file has been read, $line is the first line of the file. When we get to the second line, rand(2) < 1 1/2 of the time. This means that the second line has a 50% chance of being placed in $line, and the first line has a 100% chance of the remaining 50% (100% of 50% = 50%) to be chosen. When we get to the third line, rand(3) < 1 1/3 of the time. This means that the third line has a 33% chance of being in $line, and that the second line has a 50% chance of the remaining 66% (50% of 66% = 33%) to be chosen, and the first line has a 100% chance of the remaining 33% (100% of 33% = 33%). So, we can assume that at line N, each line has a 1/N probability of being chosen. Now, line N + 1 has a 1 / (N + 1) probability of being chosen. That leaves N / (N + 1) probability for the remaining N lines, which is equally divisible by N, and that leaves a 1 / (N + 1) probability for each of the other N lines. This proof can also be found in chapter 8 of the Perl Cookbook. Jeff "japhy" Pinyan  accomplished hacker, teacher, lecturer, and author
