comparing two arrays which contain strings

 



msds
User

Aug 6, 2002, 12:27 AM

Post #1 of 5 (654 views)

Hi! I'm new to Perl, and I'm going nuts with this script! Someone please help.

This is what I'm trying to do (I've pasted the code I tried below):

1. Read a text file, split the text into tokens using whitespace as the delimiter, and store the tokens in an array.

2. Do the same for another file (this one has a single, long column of words), and store its tokens in a hash.

3. Compare each token from the array with the tokens in the hash, and display the words that did not find a match in the hash.



#Read text from flat file database, and
#split into tokens,store in a hash


open(FH, "<spip.txt");
$lastt=0;
while($read2=<FH>)
{
@wordlist=split " ", $read2;

foreach $word(@wordlist){
chomp($word);
%lexicon=($last=>$word);

#The variable $last is auto. initialised to 0,and is used #for key values

#Print contents of lexicon

#print $lexicon{$last};
#print "\n";

$lastt++;

}
}

#print $lastt;




#Match each token(whole word) from input file with all tokens in database



print "The following words were not found in the lexicon:";
print"\n";

open(ER, "<spip.txt");

$lasttoken=0;

while($read=<ER>)

{

@tokens=split " ", $read;


foreach $token(@tokens){
chomp($token);
$lasttoken++;

}



for(my $j=0;$j<=$lasttoken;$j++)
{
#for(my $i=0;$i<=$lastt;$i++)
#{
if ($tokens[$j] ne /$lexicon{$i}/)
{
print $tokens[$j];
print "\n";

}

#}
}
#print $j;
}

Will storing the tokens from the input file in a hash instead of an array speed up the search? (Then I would need to compare values in two hashes.)
After this works, I also need to do a binary search on the flat-file database (the values in the hash).

e.g. The input text file can contain:

blue
house

The database file can have:
blue
green
red
yellow
......
.....

So the program should return 'house' as not having found a match in the database ('blue' exists in the database).

Hope you can suggest some help. My code is returning both 'house' and 'blue' as not having found a match, when it should return only 'house'.

Thanks,
msds


davorg
Thaumaturge / Moderator

Aug 6, 2002, 1:05 AM

Post #2 of 5 (652 views)
Re: [msds] comparing two arrays which contain strings

Looks to me as though you're making things far too difficult for yourself.


Code
open LEX, 'lexicon.txt' or die $!;

my %lexicon;
while (<LEX>) {
    chomp;
    my @words = split;
    @lexicon{@words} = (1) x @words;
}

close LEX;

open TEXT, 'text.txt' or die $!;

my @missing;

while (<TEXT>) {
    push @missing, grep { ! $lexicon{$_} } split;
}

if (@missing) {
    print "The following words were missing:\n",
          map { "$_\n" } @missing;
} else {
    print "No words were missing\n";
}
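
For anyone who wants to try the idea without creating the two files first, here is a rough sketch of the same hash-slice-plus-grep approach run on the sample words from the first post. The in-memory lists `@lexicon_words` and `@text_words` and the helper name `missing_words` are made up for illustration; they stand in for lexicon.txt and text.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words in $text that are not in $lexicon.
# Both arguments are array references; the names are illustrative only.
sub missing_words {
    my ($lexicon, $text) = @_;

    # Hash slice: every lexicon word becomes a key with a true value.
    my %seen;
    @seen{@$lexicon} = (1) x @$lexicon;

    # Keep only the words that have no entry in the hash.
    return grep { !$seen{$_} } @$text;
}

# Sample data from the first post, standing in for the two files.
my @lexicon_words = qw(blue green red yellow);
my @text_words    = qw(blue house);

my @missing = missing_words(\@lexicon_words, \@text_words);
print "Missing: @missing\n";    # prints "Missing: house"
```

The point of the design is that the lexicon words are stored as hash keys, so each input word costs one direct look-up rather than a loop over the whole lexicon.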


--
Dave Cross, Perl Hacker, Trainer and Writer
http://www.dave.org.uk/
Get more help at Perl Monks


msds
User

Aug 7, 2002, 12:38 AM

Post #3 of 5 (644 views)
Re: [davorg] comparing two arrays which contain strings

Thanks a lot for your code. It was a real help. Way to go!
What if we were to extract the first character of each word in "text.txt" and compare it only with those words in "lexicon.txt" that begin with the same character? And what if we were to find the length of each word in "text.txt" and compare it only with words of that length in "lexicon.txt"?

Also, I'm trying to do a binary search on the values in a hash, in my usual messy way. Here is the code I tried:

%food_color = (
1 => "apple",
2 =>" blue",
3 => "green",
4 => "mauve",
5 => "nose",
6 =>"orange",
7 =>"purple",
8 =>"red",
9 =>"violet",
10 =>"yellow"

);


#while(($k,$v) = each %food_color)
# {
# print"$k=>$v\n";
# }


open(ER,"<haship.txt");

$lasttoken=0;
print "The following words did not find a match in the lexicon:";
print"\n";
print"\n";
while($read=<ER>)

{

#push @tokens,$read;
@tokens=split " ", $read;

}


foreach $token(@tokens){
chomp($token);

$lasttoken++;
$first_word=1;
$last_word=keys(%food_color);


# search
$found = 0;
#while ( $first_word < $last_word && ! $found) {
while($first_word < $last_word)
{
$mid = &calcmid($first_word,$last_word);
#print ">$first_word, $last_word, $mid\n";

$result = $token cmp $food_color{$mid};

if($result == 0) {
print"The word found a match exactly in the middle of the lexicon:";
print"\n";
print $token;
print"\n";
$found = 1;
}
elsif($result == -1)
{
#print "Match is on LHS\n";
$last_word=$mid;

}
elsif($result == 1)
{
$first_word=$mid;
#print "Match is on RHS\n";
}
} #}

}



sub calcmid
{


$sum= $midt=$sum/2;

my $mid=int($midt);

return $mid;
}

Here, the value of $mid is not getting updated as it should. Please help!

Thanx,
msds


davorg
Thaumaturge / Moderator

Aug 7, 2002, 1:32 AM

Post #4 of 5 (641 views)
Re: [msds] comparing two arrays which contain strings


In Reply To
What if we were to extract the first character of each word in "text.txt" and compare it only with those words in "lexicon.txt" that begin with the same character? And what if we were to find the length of each word in "text.txt" and compare it only with words of that length in "lexicon.txt"?

That sounds a lot like premature optimisation to me. Is the code not running fast enough for you?


In Reply To
Also, I'm trying to do a binary search on the values in a hash, in my usual messy way.


It's very unusual to want to do a binary search on a hash. Hashes allow you to do direct look-ups on their keys. That's far faster than a binary search.
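
As a rough illustration of a direct look-up, the sketch below stores the words as hash keys (not as values under numeric keys) and tests each word with `exists`. The helper name `in_lexicon` is invented for this example, and the word list is simply the one from the quoted code.

```perl
use strict;
use warnings;

# Store the lexicon words as hash KEYS, so each look-up is direct.
my %lexicon = map { $_ => 1 }
    qw(apple blue green mauve nose orange purple red violet yellow);

# Hypothetical helper: returns 1 if the word is in the lexicon, 0 otherwise.
sub in_lexicon {
    my ($word) = @_;
    return exists $lexicon{$word} ? 1 : 0;
}

for my $word (qw(blue house)) {
    my $status = in_lexicon($word) ? 'found' : 'not found';
    print "$word: $status\n";
}
```

No searching loop is needed at all here: the hash does the work that the binary search was meant to do.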


In Reply To
Here is the code I tried:

%food_color = (
1 => "apple",
2 =>" blue",
3 => "green",
4 => "mauve",
5 => "nose",
6 =>"orange",
7 =>"purple",
8 =>"red",
9 =>"violet",
10 =>"yellow"
);


That's very strange. You're saving that data in a hash - but you're treating it as an array (monotonically increasing integer keys). If you want the properties of an array then use an array - they're faster than hashes.
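
If a binary search is still wanted as an exercise, a minimal sketch over a sorted array might look like the following. It is not a repair of the posted code line by line: the helper name `binary_search` and its return convention (index, or -1 when absent) are choices made for this example. Note that it computes the midpoint from both bounds, whereas the posted `calcmid` never reads its arguments, and that it moves the bounds past the midpoint so the loop always terminates.

```perl
use strict;
use warnings;

# Hypothetical helper: binary search over a sorted array of strings.
# Returns the index of $target, or -1 if it is not present.
sub binary_search {
    my ($sorted, $target) = @_;
    my ($low, $high) = (0, $#$sorted);

    while ($low <= $high) {
        my $mid = int(($low + $high) / 2);    # midpoint from BOTH bounds
        my $cmp = $target cmp $sorted->[$mid];

        return $mid if $cmp == 0;
        if ($cmp < 0) { $high = $mid - 1 }    # target sorts before the midpoint
        else          { $low  = $mid + 1 }    # target sorts after the midpoint
    }
    return -1;
}

# The list must already be sorted in string (cmp) order.
my @food_color = qw(apple blue green mauve nose orange purple red violet yellow);

for my $word (qw(mauve house)) {
    my $i   = binary_search(\@food_color, $word);
    my $msg = $i >= 0 ? "$word found at index $i" : "$word not found";
    print "$msg\n";
}
```

The search only works if the array really is sorted; a stray leading space, as in " blue" in the posted data, would put that entry out of order.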

--
Dave Cross, Perl Hacker, Trainer and Writer
http://www.dave.org.uk/
Get more help at Perl Monks


msds
User

Aug 7, 2002, 9:36 PM

Post #5 of 5 (633 views)
Re: [davorg] comparing two arrays which contain strings

Hello Dave,
Thanks for your post. First of all, there is no particular reason why I posted my question in two forums. I just thought it fitted both categories, and since I got replies in both I continued the discussion. However, I've more or less got the hang of the thing, and would prefer to continue with improvements in this forum. My apologies if I've messed things up.

Regarding optimising the program, I'm thinking about it because at some point the "lexicon.txt" file may contain a huge number of words, 90,000 or even more. Right now it has about 5,000 words and is working fine.
Come to that, the input file "text.txt" may grow as well.

So (I think) I need to optimise by partitioning the text in "lexicon.txt" on word length, etc.?

Looks like I have a LOT to learn about hashes.
Thanx a lot,
msds
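
Before partitioning anything, one way to check whether it is needed is to time the plain hash look-up at the size mentioned above. The sketch below is only that: the 90,000 lexicon entries and 10,000 input tokens are synthetic, made-up words rather than real data, and the timing printed will vary by machine. It uses `timeit` and `timestr` from the core Benchmark module.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Synthetic stand-in for a large lexicon.txt: 90,000 made-up words.
my %lexicon = map { ("word$_" => 1) } 1 .. 90_000;

# Synthetic input text: 10,000 tokens, every tenth one absent from the lexicon.
my @tokens = map { $_ % 10 ? "word$_" : "absent$_" } 1 .. 10_000;

# Time one full pass of direct hash look-ups over the input tokens.
my @missing;
my $t = timeit(1, sub { @missing = grep { !$lexicon{$_} } @tokens });

print scalar(@missing), " of ", scalar(@tokens), " tokens missing\n";
print "look-up pass took: ", timestr($t), "\n";
```

If the measured time is already negligible, partitioning by first letter or word length is unlikely to be worth the extra code.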

 
 

