
Chris Charley

Apr 7, 2016, 12:20 PM

Re: [Dora] Text::TFIDF

Hi Dora,

You have some errors in the code you posted. The loop should be: foreach my $i (0 .. $#tous_mots) so that $i runs over the indices of the word array. Also, though not an error, you shouldn't use $a and $b as your own variables because they're special variables used by the sort routine (and some others).
You might want to use something like: my $tf = Text::TFIDF->new(file => ["a.sans_outils.txt","b.sans_outils.txt"]); (Also, that should be properly declared before the loop begins.)

my $b=$a->TFIDF("a.sans_outils.txt",$mots1[$i]); should avoid the $b variable and is better written as my $wgt = $tf->TFIDF("a.sans_outils.txt",lc($tous_mots[$i]));

The print uses an array you didn't have earlier; you probably want print "$tous_mots[$i] $wgt\n";
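To make the loop and naming advice concrete, here is a minimal sketch that does not need the module at all. The sample words and the helper name indexed_lowercase are made up for illustration; only the 0 .. $#array idiom and the reserved role of $a and $b in sort come from the points above.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the poster's @tous_mots.
my @tous_mots = qw(Le Chat Dort);

# Return "index word" strings, lowercasing each word on the way.
sub indexed_lowercase {
    my @mots = @_;
    my @out;
    for my $i (0 .. $#mots) {    # $#mots is the last valid index
        push @out, "$i " . lc $mots[$i];
    }
    return @out;
}

print "$_\n" for indexed_lowercase(@tous_mots);

# $a and $b are left alone so sort's comparison block can use them.
my @sorted = sort { $a cmp $b } map { lc } @tous_mots;
print "@sorted\n";
```

If $a or $b had been declared with my earlier in the same scope, the sort block would no longer see the package variables that sort sets, which is why other names such as $tf and $wgt are safer.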

Note that this module lowercases the words internally, so you should lowercase the word you give it (see lc($tous_mots[$i]) above).

The code for Text::TFIDF can be examined, and it shows the lowercasing operation. You'll find it on line 93 (my $line = lc($_);).

You stated in your post: "I am using the module Text::TFIDF for a french text and for the function TFIDF I get a lot of "Use of Uninitialized value in multiplication <*> at TFIDF". Do you have any idea why this is happening?"

The most likely reason is that any word containing uppercase letters will not be found when you ask the module for its weight, because internally it lowercases all the words in the document.
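The effect can be shown without the module. This is only a sketch: the hash %count_of, the lookup sub and the factor $idf are hypothetical stand-ins, not the module's real internals. It assumes nothing more than a table whose keys were lowercased when it was built.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's word table: keys are lowercased,
# as they would be if every input line went through lc().
my %count_of = map { lc($_) => 1 } qw(Le Chat Dort);

# Return the stored count, or undef when the exact key is absent.
sub lookup {
    my ($word) = @_;
    return $count_of{$word};
}

my $idf = 0.5;    # arbitrary illustrative factor
for my $word ('Chat', lc 'Chat') {
    my $count = lookup($word);
    if (defined $count) {
        printf "%-5s weight %.2f\n", $word, $count * $idf;
    }
    else {
        # Multiplying undef here is what would trigger the warning.
        printf "%-5s not found (undef)\n", $word;
    }
}
```

Only the lowercased spelling finds an entry; the capitalized one comes back undef, and using undef in a multiplication under warnings produces the "uninitialized value" message.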

Hopefully, this will get you on the way to a solution.

The reason you are getting negative results is that the calculation for word frequency involves log base 10, and if the value being logged is less than 1, the log will be negative. (See the IDF function in Text::TFIDF.)
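A quick way to see the sign change, independent of the module: the helper log_base10 below is a made-up name, built on Perl's natural log, and the three ratios are arbitrary sample values.

```perl
use strict;
use warnings;

# Base-10 logarithm built from Perl's natural log().
sub log_base10 {
    my ($x) = @_;
    return log($x) / log(10);
}

# Values below 1 give a negative log, 1 gives zero, above 1 gives positive.
for my $ratio (0.5, 1, 2) {
    printf "ratio %-4s log10 = %+.4f\n", $ratio, log_base10($ratio);
}
```

So any weight that is a positive factor times the log of a ratio below 1 comes out negative.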

Here is a small program I wrote using the Text::TFIDF module.

use strict;
use warnings;
use 5.014;
use Text::TFIDF;

my $anna = do {local $/; <>}; # file ==

my %words = map{ $_ => 1} map {lc} map{ s/[:;"',.?!]+//gr } split /\s+/, $anna;

# say scalar keys %words;

my $tf = Text::TFIDF->new(file => ['']);

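# Look up the weight of every word (plus one extra test word)
for my $word (lc('Garçon'), keys %words) {
    my $wgt = $tf->TFIDF('',$word);
    $words{$word} = $wgt;
}

# Print the words from highest weight to lowest
for my $word (sort {$words{$b} <=> $words{$a}} keys %words) {
    printf "%-15s%.6f\n", $word, $words{$word};
}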

(This post was edited by Chris Charley on Apr 10, 2016, 8:44 AM)

Edit Log:
Post edited by Chris Charley (User) on Apr 10, 2016, 8:40 AM
Post edited by Chris Charley (User) on Apr 10, 2016, 8:44 AM
