Jan 18, 2001, 3:48 PM
Post #3 of 4
Here is the proof that:
Re: How To Retrieve a Random Line From A File?
[In reply to]
provides a fair and equal distibution for any number of lines.
rand($.) < 1 and ($line = $_) while <FH>;
Proof by Induction
The rand($bound) function returns some real number x such that 0 <= x < $bound. The $. variable contains the current line number of the filehandle being read from.
Thus, for the first line of the file, we have rand(1) < 1, which is true 100% of the time. Thus, after the first line of the file has been read, $line is the first line of the file.
When we get to the second line, rand(2) < 1 1/2 of the time. This means that the second line has a 50% chance of being placed in $line, and the first line has a 100% chance of the remaining 50% (100% of 50% = 50%) to be chosen.
When we get to the third line, rand(3) < 1 1/3 of the time. This means that the third line has a 33% chance of being in $line, and that the second line has a 50% chance of the remaining 66% (50% of 66% = 33%) to be chosen, and the first line has a 100% chance of the remaining 33% (100% of 33% = 33%).
So, we can assume that at line N, each line has a 1/N probability of being chosen.
Now, line N + 1 has a 1 / (N + 1) probability of being chosen. That leaves N / (N + 1) probability for the remaining N lines, which is equally divisible by N, and that leaves a 1 / (N + 1) probability for each of the other N lines.
This proof can also be found in chapter 8 of the Perl Cookbook.
Jeff "japhy" Pinyan -- accomplished hacker, teacher, lecturer, and author