
New User

Mar 27, 2016, 6:00 AM

Views: 6717

Hello everyone!
I have an array @tous_mots which contains all the words of two documents, and I want to find the TF-IDF score for each word and print it.
Do you know how I could do it?
The following code doesn't work.
If you have any suggestions or ideas, it would be really helpful. I am new to modules.
my $i;
foreach $i(@tous_mots){
my $a=new Text::TFIDF(file=>["a.sans_outils.txt","b.sans_outils.txt"]);

my $b=$a->TFIDF("a.sans_outils.txt",$mots1[$i]);
print "$mots1[$i] $b\n"
Thank you in advance.


Mar 27, 2016, 4:22 PM

Views: 6706
Re: [Dora] Text::TFIDF

Your main problem is that you do not understand Perl's foreach loop. Please read that section of perlsyn. To learn to use Perl's documentation tool, type:

perldoc perldoc

Although it would probably work, it is a bad idea to use the variable names $a and $b. They are special package variables used by the sort routine.
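
To illustrate the point about $a and $b, here is a minimal sketch of how sort uses them. The @scores values are made up for the example. As far as I know, declaring "my $a" in the same scope hides the package variable and makes a sort block like this one fail to compile, which is why other names are safer for your own variables.

```perl
use strict;
use warnings;

my @scores = (0.12, 0.5, 0.03);   # made-up sample values

# sort sets the package variables $a and $b for each comparison,
# so they must not be declared with "my" here.
my @descending = sort { $b <=> $a } @scores;

print "@descending\n";   # largest score first
```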

Always place these two commands near the start of your script. They help to identify many errors.

use strict; 
use warnings;
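
As a rough illustration of the kind of error these pragmas catch, the sketch below compiles a tiny snippet with a misspelled variable name inside a string eval. The names $total and $totl are invented for the example; the point is only that strict refuses the undeclared variable instead of silently creating it.

```perl
use strict;
use warnings;

# A snippet with a typo: $totl was never declared ($total was meant).
my $code = 'use strict; my $total = 1; $totl = 2; 1;';

# The string eval fails to compile, returns undef, and sets $@.
my $ok    = eval $code;
my $error = $@;

print "strict rejected the snippet: $error" unless $ok;
```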

Your use of the module appears to be correct.

Please make these changes. Post again if you still need help. (You probably should be posting in the beginner's forum.)
Good Luck,

New User

Apr 3, 2016, 11:24 AM

Views: 6645
Re: [BillKSmith] Text::TFIDF

Thank you very much Bill!!
I changed the code and it seems much better. I have one last question. I am using the module Text::TFIDF for a French text, and for the function TFIDF I get a lot of "Use of uninitialized value in multiplication (*) at TFIDF". Do you have any idea why this is happening? My professor at the university didn't know :(

Veteran / Moderator

Apr 4, 2016, 7:28 AM

Views: 6634
Re: [Dora] Text::TFIDF

Please show your current code. It is very difficult to say why a program doesn't work without seeing it.

(This post was edited by Laurent_R on Apr 4, 2016, 7:29 AM)


Apr 4, 2016, 8:53 AM

Views: 6627
Re: [Dora] Text::TFIDF

Before you give up and post all your code and data, there are several things you should do. Try to limit the scope of your question.

Does the code appear to "work" despite the error?

Do you get the error for every word?

How are the words that cause the error special?

Are they all long words?
Do they all contain special characters?
Do they all appear in one document?
Both documents?
Neither document?

Does a short list of words always work?
Does the error occur with English documents?
Do the words in your list contain whitespace or other separator characters at their start or end?

Now create a very short program (one that we can execute) which demonstrates the error. The list should be no more than about five words (probably hard coded in the script). The program does not have to print correct answers, only demo the error. Post this program (and related documents) as attachments.
Good Luck,

New User

Apr 4, 2016, 10:51 AM

Views: 6622
Re: [BillKSmith] Text::TFIDF

Yes, I see, you are right!! I will try to do so.

Chris Charley

Apr 7, 2016, 12:20 PM

Views: 6573
Re: [Dora] Text::TFIDF

Hi Dora,

You have some errors in the code you posted. The loop should be: foreach my $i (0 .. $#tous_mots) so that $i is an index into the word array. Also, though not an error, you shouldn't use the $a and $b variables because they're special variables used by the sort routine (and some others).
You might want to use something like: my $tf = new Text::TFIDF(file => ["a.sans_outils.txt","b.sans_outils.txt"]);
(Also, that should be declared once, before the loop begins, not inside it.)
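
To make the difference between the two loop styles concrete, here is a small sketch that does not need the module. The words in @tous_mots are invented sample data. Looping over the array itself gives you each word; looping over 0 .. $#tous_mots gives you each index, which you then use to look the word up.

```perl
use strict;
use warnings;

my @tous_mots = ('chat', 'chien', 'oiseau');   # made-up sample words

# Looping over the elements: $mot holds each word in turn.
my @by_element;
foreach my $mot (@tous_mots) {
    push @by_element, $mot;
}

# Looping over the indices: $i holds 0, 1, 2 and is used to fetch the word.
my @by_index;
foreach my $i (0 .. $#tous_mots) {
    push @by_index, $tous_mots[$i];
}

print "@by_element\n";
print "@by_index\n";
```

Both loops visit the same words in the same order. The original code mixed the two styles by looping over the elements and then using the loop variable as an index.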

my $b=$a->TFIDF("a.sans_outils.txt",$mots1[$i]); should avoid the $b variable and is better written as my $wgt = $tf->TFIDF("a.sans_outils.txt",lc($tous_mots[$i]));

The print is using an array, @mots1, that you didn't have earlier; you probably want print "$tous_mots[$i] $wgt\n";

Note that this module lowercases the words internally, so you should lowercase the word you give it (see the lc($tous_mots[$i]) call above).

The code for Text::TFIDF can be examined, and it shows the lowercasing. You'll find it on line 93 (my $line = lc($_);).

You stated in your post: "I am using the module Text::TFIDF for a French text and for the function TFIDF I get a lot of 'Use of uninitialized value in multiplication (*) at TFIDF'. Do you have any idea why this is happening?"

The most likely reason is that any word you look up that contains uppercase letters will not be found by the module, because internally it lowercases all the words in the document. The lookup for such a word then presumably returns an undefined value, which produces the warning when it is multiplied.
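
Here is a small sketch of that effect without the module. The %weight hash is a hypothetical stand-in for an index keyed by lowercased words, and its values are invented; it is not the module's actual data structure.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an index whose keys were lowercased when built.
my %weight = (maison => 0.30, bonjour => 0.12);

for my $word ('Maison', lc 'Maison') {
    my $w = $weight{$word};
    if (defined $w) {
        printf "%-8s %.2f\n", $word, $w * 2;
    }
    else {
        # Multiplying $w here would warn "Use of uninitialized value".
        print "$word: not in the index\n";
    }
}
```

The capitalized key is missing while the lowercased one is found, so checking with defined (or lowercasing first) avoids the warning.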

Hopefully, this will get you on the way to a solution.

The reason you are getting negative results is that the calculation for word frequency involves log base 10, and if the value passed to the log is less than 1, the log will be negative. (See the IDF function in Text::TFIDF.)
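
The sign behaviour of the logarithm can be checked on its own. This sketch only shows the general property of log base 10 and makes no claim about the exact formula the module uses; the log10 helper and the sample ratios are my own.

```perl
use strict;
use warnings;

# Perl's built-in log is the natural logarithm, so divide by log(10).
sub log10 {
    my ($x) = @_;
    return log($x) / log(10);
}

# Values below 1 give a negative result, 1 gives zero, values above 1 are positive.
for my $ratio (0.5, 1, 2) {
    printf "log10(%s) = %.4f\n", $ratio, log10($ratio);
}
```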

Here is a small program I wrote using the Text::TFIDF module.

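use strict;
use warnings;
use 5.014;
use Text::TFIDF;

# Slurp the whole input file given on the command line
my $anna = do {local $/; <>}; # file == anna.book

# Build the set of distinct lowercased words, with punctuation stripped
my %words = map{ $_ => 1} map {lc} map{ s/[:;"',.?!]+//gr } split /\s+/, $anna;

# say scalar keys %words;

my $tf = Text::TFIDF->new(file => ['anna.book']);

# Replace each placeholder value with the word's TF-IDF weight
for my $word (lc('Garçon'), keys %words) {
    my $wgt = $tf->TFIDF('anna.book',$word);
    $words{$word} = $wgt;
}

# Print the words from highest to lowest weight
for my $word (sort {$words{$b} <=> $words{$a}} keys %words) {
    printf "%-15s%.6f\n", $word, $words{$word};
}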

(This post was edited by Chris Charley on Apr 10, 2016, 8:44 AM)