
mariorizzuti
newbie
Jul 25, 2001, 3:47 PM
Post #1 of 1
(691 views)
|
|
Finding a word in a sorted list.
|
Can't Post
|
|
I have a fixed-length-records file where: * each record is 18 bytes made by an username (12 bytes) and a number (6 bytes). * the records are sorted alphabetically. I am wondering what is the fastest possible way to find the associated numbers of a given list of usernames? Thanks for any suggestion. This is what I have so far: -------- my @user = qw/mario john bill/; my @sorted_user = sort @user; # sorting allows us to use the last found word as a "$min" for the next one. my %Line; my @stats = stat "user.ind"; my $min = 0; my $max = $stats[7] / 18; # how many records open INDEX, "user.ind" or die "Can't open 'user.ind': $!"; USER: foreach my $to_find ( @sorted_user ) { my $last_guess; while (1) { my $guess = int ( ($min + $max)/ 2 ); # we are sure that it's between $min and $max, let's try in the middle. seek (INDEX, 18 * $guess, 0); # this should be more efficient (?) if we used the last read position as offset. my $bytes = read INDEX, my $record, 18; croak "Username $to_find couldn't be found." if $bytes != 18 || $guess == $last_guess; my ($username, $pointer) = unpack "A12 A6", $record; if ($username eq $to_find ) { $Line{$username} = $pointer; $min = $guess; # since the user list is sorted, the next word will be after the one we have just # found. next USER; } elsif ($username lt $to_find ) { $min = $guess; # our word is not before the one we have just read. } else { $max = $guess; } $last_guess = $guess; } } close INDEX or die "Can't close 'user.ind': $!"; @user = map {$Line{$_}} @user; # return this -------- Mario Rizzuti
|