suggestions for spellchecker

 



Jurafsky
Novice

Nov 18, 2012, 3:03 AM

Post #1 of 10 (2841 views)
suggestions for spellchecker

I've written a small spellchecker.
It works like this:

After reading each line of the text, the script compares the words of that line
against a dictionary.

When it finds a word that doesn't exist in the dictionary, the word is pushed into an array
and corrected (the script gives one or more suggestions).

Here is my problem:

I would like to let the user choose the correct word
from the suggested words. Something like this:

We found the word "wlak" in your text, which isn't correct.
The suggested possibilities are:
1. walk
2. work

Type the number associated with the word, or 0 if none of the suggestions is correct.

Then I would like to substitute the chosen word into the original text (writing the result to a new .txt file).

How can I do this?



Code
use strict;
use warnings;
use diagnostics;

my ($file_dictionary, $file_text, $line, $line1, $alph, $elt, $w);
my ($transposition, $removal, $addition, $replacement, $letter1, $letter2);
my (@word, @altered_word, @filedictionary, @filetext, @dictionary);
my (@addition, @replacement, @transposition, @removal);

$file_dictionary = "lexique.txt";
$file_text = "texte.txt";

# Load the dictionary into an array
open (L, "<", $file_dictionary) or die "Cannot open $file_dictionary: $!";
while (defined( $line1 = <L>)) {
    chomp($line1);
    @filedictionary = split (/\s/, $line1);
    push (@dictionary, @filedictionary);
}
close (L);

# Collect the words of the text that are not in the dictionary
open (T, "<", $file_text) or die "Cannot open $file_text: $!";
while (defined( $line = <T>)) {
    chomp($line);
    @filetext = split (/(\s|\pP)/, $line);
    for (my $i = 0; $i < @filetext; $i++) {
        # skip the separators and empty strings returned by the capturing split
        next unless $filetext[$i] =~ /\w/;
        # \Q...\E stops characters in the token from acting as regex syntax
        if (!grep(/^\Q$filetext[$i]\E$/, @dictionary)) {
            push (@word, $filetext[$i]);
        }
    }
}
close (T);

# Generate candidate corrections for each unknown word
# (the dictionary is already in @dictionary, so it is not read again here)
foreach $w (@word) {
    @altered_word = split (//, $w);

    # first operation: transposition --> "palrer" becomes "parler"
    for (my $i = 0; $i < $#altered_word; $i++) {
        @transposition = @altered_word;
        $letter1 = $transposition[$i];
        $letter2 = $transposition[$i+1];
        $transposition[$i] = $letter2;
        $transposition[$i+1] = $letter1;

        $transposition = join "", @transposition;
        if (grep(/^\Q$transposition\E$/, @dictionary)) {
            print "post transposition : $transposition\n";
        }
    }

    foreach $elt (0 .. $#altered_word) {
        # second operation: removal --> "parller" becomes "parler"
        @removal = @altered_word;
        splice(@removal, $elt, 1);
        $removal = join "", @removal;
        if (grep(/^\Q$removal\E$/, @dictionary)) {
            print "post removal : $removal\n";
        }

        foreach $alph ('a' .. 'z') {
            # third operation: addition --> "parer" becomes "parler"
            @addition = @altered_word;
            splice(@addition, $elt, 0, $alph);
            $addition = join "", @addition;
            if (grep(/^\Q$addition\E$/, @dictionary)) {
                print "post addition : $addition\n";
            }

            # last operation: replacement --> "mancer" becomes "manger"
            @replacement = @altered_word;
            splice(@replacement, $elt, 1, $alph);
            $replacement = join "", @replacement;
            if (grep(/^\Q$replacement\E$/, @dictionary)) {
                print "post replacement : $replacement\n";
            }
        }
    }
}


https://www.dropbox.com/s/t9fc2dk5mqbsb20/texte.txt this is the text

https://www.dropbox.com/s/717rczou0mkrp0s/lexique.txt This is the French Dictionary


Laurent_R
Veteran / Moderator

Nov 18, 2012, 4:07 AM

Post #2 of 10 (2838 views)
Re: [Jurafsky] suggestions for spellchecker

See my answer in your cross post on the dev shed forum.


Jurafsky
Novice

Nov 29, 2012, 4:41 AM

Post #3 of 10 (2310 views)
Re: [Laurent_R] suggestions for spellchecker

Here we are. Now I have just one problem: how can I substitute the new word into $line2 and then write it to T2, the new file?


Code
use strict;
use warnings;
use diagnostics;

my ($word, $file_dict, $txt, $new_txt, $line, $line2, $i);
my ($first_letter, $second_letter, $letter, $lettera, $alphabet);
my ($user, $exchange, $transposition, $removal, $addition);
my (@text, @words, @dictionary, @dict, @single_letters);
my (@transposition, @removal, @addition, @exchange, @correct);

$file_dict = "dict.txt";
$txt = "txt.txt";
$new_txt = "output.txt";

# open the dictionary and save it in an array
open (D, "<", $file_dict) or die "Cannot open $file_dict: $!";
while(defined($line = <D>)) {
    chomp($line);
    @dict = split(/\s/, $line);
    push (@dictionary, @dict);
}
close (D);

# open the file for output
open (T2, ">", $new_txt) or die "Cannot open $new_txt: $!";

# open and save the text (read-only access is enough here)
open (T, "<", $txt) or die "Cannot open $txt: $!";
while(defined($line2 = <T>)) {
    chomp($line2);
    @text = split (/ /, $line2);
    push (@words, @text);

    # for each word of the text, I reset the array of correct words,
    # then I check whether the word is in the dictionary;
    # if it isn't there, I split the word into letters
    # and apply the corrections, which are saved in the @correct array.

    foreach $word (@words) {
        # index 0 holds "exit", so the suggestions are numbered from 1
        @correct = "exit";

        # \Q...\E stops characters in the word from acting as regex syntax
        if (!grep(/^\Q$word\E$/, @dictionary)) {

            print "Word : '$word' isn't in the dictionary.\n";

            @single_letters = split (//, $word);

            # transposition
            for (my $i = 0; $i < $#single_letters; $i++) {
                @transposition = @single_letters;
                $first_letter = $transposition[$i];
                $second_letter = $transposition[$i+1];
                $transposition[$i] = $second_letter;
                $transposition[$i+1] = $first_letter;

                $transposition = join "", @transposition;

                if (grep(/^\Q$transposition\E$/, @dictionary)) {
                    push (@correct, $transposition);
                }
            }

            # removal
            foreach $lettera ( 0 .. $#single_letters) {
                @removal = @single_letters;
                splice (@removal, $lettera, 1);
                $removal = join "", @removal;
                if (grep(/^\Q$removal\E$/, @dictionary)) {
                    push (@correct, $removal);
                }

                foreach $alphabet ( 'a' .. 'z') {
                    # addition
                    @addition = @single_letters;
                    splice (@addition, $lettera, 0, $alphabet);
                    $addition = join "", @addition;

                    if (grep(/^\Q$addition\E$/, @dictionary)) {
                        push (@correct, $addition);
                    }

                    # exchange
                    @exchange = @single_letters;
                    splice(@exchange, $lettera, 1, $alphabet);
                    $exchange = join "", @exchange;

                    if (grep(/^\Q$exchange\E$/, @dictionary)) {
                        push (@correct, $exchange);
                    }
                }
            }

            # now I display the solutions and the user can choose one of them
            print "These are the corrections for the word $word\n";
            for (my $c = 0; $c < @correct; $c++) {
                print "$c. : $correct[$c]\n";
            }

            print "Are you interested in one of these solutions? Type the number or type 'exit'.\n";

            $user = <STDIN>;
            chomp ($user);

            if ("$user" eq 'exit') {
                print "Next word.\n";
            }
            else {
                $word =~ s/$word/$correct[$user]/;
            }
        }
    }
}

close(T);
close(T2);



Laurent_R
Veteran / Moderator

Nov 29, 2012, 9:58 AM

Post #4 of 10 (2308 views)
Re: [Laurent_R] suggestions for spellchecker

From a very quick look at your code,


Code
$word =~ s/$word/$correct[$user]/;


will not work, because your line is stored in $line2.

So you would need something like this:


Code
$line2 =~ s/$word/$correct[$user]/;


Then you only need to print $line2 to your new file.
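
To make that concrete, here is a minimal sketch of the substitute-then-print step. The helper name replace_word and the in-memory output handle are assumptions made for illustration; in the real script the handle would be T2, opened on output.txt. The \Q...\E and \b additions are also my own suggestion, so that the misspelled word is matched literally and only as a whole word.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first whole-word occurrence of $wrong
# in $line with $right and return the new line.
sub replace_word {
    my ($line, $wrong, $right) = @_;
    # \Q...\E quotes regex metacharacters in the misspelled word;
    # \b keeps "wlak" from matching inside a longer word.
    $line =~ s/\b\Q$wrong\E\b/$right/;
    return $line;
}

# In-memory output so the sketch runs without touching the disk;
# in the real script this would be the T2 handle.
my $output = '';
open(my $out, '>', \$output) or die "Cannot open in-memory file: $!";

my $line2 = "I wlak to work";
$line2 = replace_word($line2, 'wlak', 'walk');
print {$out} "$line2\n";
close($out);

print $output;
```

The print to the output file has to happen once per line, after all the words of that line have been handled, not once per word.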


Jurafsky
Novice

Nov 30, 2012, 1:48 AM

Post #5 of 10 (2296 views)
Re: [Laurent_R] suggestions for spellchecker

I already tried that, but there are two problems:

1. The script reads each line of the input file many times (is there an error in the foreach loop?).
2. The script doesn't copy the right lines to the output file.

For this reason I thought it would be better to make the substitution in $word and then copy it into another scalar. But then I still have the same problem: how can I build the new line?


Got it!
If you have a better solution, please tell me!


Code
            # now I display the solutions and the user can choose one of them
            print "These are the corrections for the word $word\n";
            for (my $c = 0; $c < @correct; $c++) {
                print "$c. : $correct[$c]\n";
            }

            print "Are you interested in one of these solutions? Type the number or type 'exit'.\n";

            $user = <STDIN>;
            chomp ($user);

            if ("$user" eq 'exit') {
                print "Next word.\n";
            }
            else {
                $line2 =~ s/$word/$correct[$user]/;
                # remember the misspelled word, then drop it from @words
                # so that it is not checked again on the following lines
                my $error = $word;
                @words = grep { $_ ne $error } @words;
            }
        }
    }
    # write the (possibly corrected) line to the output file
    print T2 "$line2\n";
}



(This post was edited by Jurafsky on Nov 30, 2012, 3:53 AM)


Laurent_R
Veteran / Moderator

Nov 30, 2012, 3:57 AM

Post #6 of 10 (2292 views)
Re: [Jurafsky] suggestions for spellchecker

I don't think you're reading the same line several times, but if you have several words in your line, then you will be looping over the line once for each word.

For point 2, you will have to explain further, but the fact is that if several words are misspelled, you need to handle multiple substitutions properly.
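
One possible way to handle several substitutions in the same line is to rebuild the line token by token instead of running s/// on the whole line once per word. This is only a sketch: the function name correct_line and the %chosen hash (misspelled word => replacement picked by the user) are made up for the example, and it assumes that words are separated by whitespace with no punctuation attached.

```perl
use strict;
use warnings;

# Hypothetical corrections chosen by the user: misspelled word => replacement.
my %chosen = (wlak => 'walk', teh => 'the');

# Rebuild a line token by token so that every misspelled word is replaced,
# however many errors the line contains.
sub correct_line {
    my ($line, $fix) = @_;
    # The capturing group makes split keep the whitespace separators,
    # so join('') restores the original spacing.
    my @tokens = split /(\s+)/, $line;
    for my $token (@tokens) {
        # $token is an alias, so assigning to it changes @tokens
        $token = $fix->{$token} if exists $fix->{$token};
    }
    return join '', @tokens;
}

print correct_line("I wlak to teh shop and wlak back", \%chosen), "\n";
```

Because each token is looked at exactly once, a replacement can never be corrected a second time by a later substitution.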


Jurafsky
Novice

Dec 1, 2012, 4:35 AM

Post #7 of 10 (2282 views)
Re: [Laurent_R] suggestions for spellchecker

I think the problem was the first foreach loop. So with


Code
else {
    $line2 =~ s/$word/$correct[$user]/;
    # copy the misspelled word, then rebuild @words without it
    my $error = $word;
    @words = grep { $_ ne $error } @words;
}


I made the substitution in the line, then I copied $word into another scalar, $error, and then I rebuilt the @words array without $error!

What do you think? It seems OK! Now I need a text with many errors to test it :P


Jurafsky
Novice

Dec 2, 2012, 12:53 AM

Post #8 of 10 (2263 views)
Re: [Jurafsky] suggestions for spellchecker

I have just one more problem:

I'm using the MS-DOS terminal and some characters (accented letters, for example) aren't displayed correctly.

For example, the program says there's an error in the word "très" even though it exists in the dictionary file.
I think this is related to encoding and decoding in the Perl script.

I have never managed to use the Encode module. Can someone explain what I have to do?


Laurent_R
Veteran / Moderator

Dec 2, 2012, 1:15 AM

Post #9 of 10 (2262 views)
Re: [Jurafsky] suggestions for spellchecker

I can't really help you with this, as I usually avoid this type of problem (the platforms my data comes from mostly don't support Unicode or UTF-8, or support it only in a very limited way).

Just a few pointers:

Perl Unicode tutorial: http://perldoc.perl.org/perlunitut.html
Perl Unicode introduction: http://perldoc.perl.org/perluniintro.html
Unicode support in Perl: http://perldoc.perl.org/perlunicode.html
Perl locale handling (internationalization and localization): http://perldoc.perl.org/perllocale.html
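
As a starting point, here is a small sketch of the decode-on-input, encode-on-output pattern those documents describe. It assumes the dictionary and the text are stored as UTF-8, which is not known from this thread; if they are Latin-1, 'ISO-8859-1' would be the name to use instead. The sample line and the two-word dictionary are invented for the example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumption: the files are UTF-8. These bytes stand in for a line read
# from disk; "\x{e8}" is the letter e with a grave accent.
my $bytes = encode('UTF-8', "tr\x{e8}s bien\n");

# Decoding turns the raw bytes into Perl characters, so the first word is
# four characters long and compares equal to a dictionary entry that was
# decoded the same way.
my $line = decode('UTF-8', $bytes);
chomp $line;
my @words = split /\s+/, $line;

my %dictionary = map { $_ => 1 } ("tr\x{e8}s", "bien");
my @unknown = grep { !exists $dictionary{$_} } @words;

# Encode again on the way out. UTF-8 is used here; a DOS console may need
# its own code page instead (for example ':encoding(cp850)', an assumption).
binmode(STDOUT, ':encoding(UTF-8)');
print "first word: $words[0] (", length($words[0]), " characters)\n";
print "unknown words: ", scalar(@unknown), "\n";
```

When reading real files, the same decoding can be requested directly in open, for example open(my $fh, '<:encoding(UTF-8)', $file), so that every line arrives already decoded.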


Jurafsky
Novice

Dec 3, 2012, 3:05 AM

Post #10 of 10 (2224 views)
Re: [Laurent_R] suggestions for spellchecker

Thank you!

 
 

