Counting keywords in text file.

 



moviesigh
New User

Nov 30, 2013, 6:57 PM

Post #1 of 7 (1452 views)
Counting keywords in text file.

Hello,
I would like to count how often certain keywords appear in a text file, sample.txt.
For example, I choose "Steve Jobs" and "Executive" as main words, and I would like to count how many times "stock option" and "package" appear within 10 words of either main word in the sample text below. The result I expect is 4.

Sample text)
Stock option is the most popular compensation policy in the world these days. Steve Jobs also received huge amount of stock options, and the stock option was exercised before the fiscal year.
Different from his compensation package, the other executives received less amount of stock options.

To get the result, I ran the code below with this command: perl code.pl sample.txt "Steve Jobs" "Executive" 10 "stock option" "package"

However, I get this message: "Use of uninitialized value $distance in numeric le (<=) at line..."

Could you please give me some advice on how to get the result I want? I am attaching the sample text and the code that I used. The sample text contains three different articles, separated by "Document ", so I expect to get a result for each of the three articles. I am looking forward to your responses. I hope you all have a great weekend! Thank you in advance.

PERL code)

use strict;
use warnings;

my ($filename, @mainword, $distance, @search) = @ARGV;

my $content;
open my $fh, '<', $filename or die $!;
local $/ = undef;
$content = <$fh>;
close $fh;

my @docs = split 'Document ', $content;
foreach my $doc ( @docs ) {

    my $count = 0;

    my $mainword = '(' . (join '|', map { "\Q$_\E" } @mainword) . ')';
    my $search = '(' . (join '|', map { "\Q$_\E" } @search) . ')';

    for (my $dist = 0; $dist <= $distance; $dist++) {
        while ( $doc =~ /
            (?:^|\W)
            $search
            (?=
                (?:\W++\w++){$dist}
                \W++\Q$mainword\E
            )
            /ixsg
        )
        {
            print " found [$1] at ", $-[1], "\n";

            $count++;
        }

        while ( $doc =~ /
            (?:^|\W)
            \Q$mainword\E
            (?=
                (?:\W++\w++){$dist}
                \W++$search
            )
            /ixsg
        )
        {
            print "-found [$1] at ", $-[1], "\n";
            $count++;
        }
    }

    print "match: $count\n";
}
Attachments: code.pl (1.20 KB)
  sample.txt (0.75 KB)


Zhris
Enthusiast

Dec 1, 2013, 1:23 AM

Post #2 of 7 (1444 views)
Re: [moviesigh] Counting keywords in text file.

Hi,

This post addresses the first issue, the "Use of uninitialized value $distance in numeric le (<=) at line..." warning.

Let's take a look at the first part of your code, which reads the supplied arguments:


$ perl code.pl sample.txt "Steve Jobs" "Executive" 10 "stock option" "package"

code.pl:

Code
use Data::Dumper; 

my ($filename, @mainword, $distance, @search) = @ARGV;

print Dumper($filename, \@mainword, $distance, \@search);


output:

Code
$VAR1 = 'sample.txt'; 
$VAR2 = [
'Steve Jobs',
'Executive',
'10',
'stock option',
'package'
];
$VAR3 = undef;
$VAR4 = [];




We can see from the dump that $filename contains the filename, but @mainword has slurped all the remaining arguments, which is the behaviour I would expect. As a result, $distance is undefined from the start, and that is the reason for the "Use of uninitialized value $distance in numeric le (<=) at line..." warning you get later on.

This happens because an array on the left-hand side of a list assignment is greedy: it takes every remaining value. @mainword is variable length, and Perl has no way of knowing how many elements of @ARGV it should take before moving on to $distance, so it takes them all and leaves nothing for $distance and @search.
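A minimal sketch of that rule, using made-up values in place of the real @ARGV (the variable names here are only for illustration). Only the last variable in the list can usefully be an array:

```perl
use strict;
use warnings;

# An array in the middle of the list swallows everything that is left,
# so the variables after it receive nothing.
my ($file, @words, $distance) = ('sample.txt', 'Steve Jobs', 'Executive', 10);
print defined $distance ? "distance=$distance\n" : "distance is undefined\n";
print scalar(@words), " values ended up in \@words\n";

# Scalars first, one array last: every variable gets what was intended.
my ($file2, $distance2, @words2) = ('sample.txt', 10, 'Steve Jobs', 'Executive');
print "distance2=$distance2, words2=@words2\n";
```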

Here is an adjustment: mainword and search are now comma-separated strings rather than individual arguments. Once we have read the arguments in, we can split the mainword and search strings into individual elements:




$ perl code.pl sample.txt "Steve Jobs, Executive" 10 "stock option, package"

code.pl:

Code
use Data::Dumper; 

my ($filename, $mainword_str, $distance, $search_str) = @ARGV;

my @mainword = split /\s*,\s*/, $mainword_str;
my @search = split /\s*,\s*/, $search_str;

print Dumper($filename, \@mainword, $distance, \@search);


output:

Code
$VAR1 = 'sample.txt'; 
$VAR2 = [
'Steve Jobs',
'Executive'
];
$VAR3 = '10';
$VAR4 = [
'stock option',
'package'
];
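If the original command line has to stay as it was, another possibility is to split the argument list at the distance yourself. This is only a sketch under two assumptions: the distance is the first all-digit argument after the filename, and no main word consists purely of digits. The helper name split_args is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: split an argument list of the form
#   FILE MAINWORD... DISTANCE SEARCH...
# at the first purely numeric argument after the filename.
sub split_args {
    my @args     = @_;
    my $filename = shift @args;

    # Index of the first all-digit argument; it is taken to be the distance.
    my ($idx) = grep { $args[$_] =~ /^\d+$/ } 0 .. $#args;
    die "no numeric distance argument found\n" unless defined $idx;

    my @mainword = @args[0 .. $idx - 1];
    my $distance = $args[$idx];
    my @search   = @args[$idx + 1 .. $#args];
    return ($filename, \@mainword, $distance, \@search);
}

# In the real script the list would be @ARGV; literal values keep the sketch runnable.
my ($filename, $mainword, $distance, $search) =
    split_args('sample.txt', 'Steve Jobs', 'Executive', 10, 'stock option', 'package');

print "file=$filename distance=$distance\n";
print "mainword: ", join(' | ', @$mainword), "\n";
print "search: ",   join(' | ', @$search),   "\n";
```

The comma-separated form above is less fragile, because it does not have to guess where one list ends and the next begins.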




Here is a different adjustment. Getopt::Long ( http://search.cpan.org/~jv/Getopt-Long-2.42/lib/Getopt/Long.pm#Options_with_multiple_values ) from CPAN is good at handling complicated command-line arguments, so you may want to try it out:




$ perl code.pl --filename=sample.txt --mainword="Steve Jobs" --mainword="Executive" --distance=10 --search="stock option" --search="package"

code.pl:

Code
use Getopt::Long; 
use Data::Dumper;

my ($filename, @mainword, $distance, @search);

GetOptions( "filename=s" => \$filename,
"mainword=s" => \@mainword,
"distance=i" => \$distance,
"search=s" => \@search );

print Dumper($filename, \@mainword, $distance, \@search);


output:

Code
$VAR1 = 'sample.txt'; 
$VAR2 = [
'Steve Jobs',
'Executive'
];
$VAR3 = 10;
$VAR4 = [
'stock option',
'package'
];




I hope this is useful. It should also show the importance of testing small sections of code before moving on, by dumping the data and checking that it looks as expected.

Chris


(This post was edited by Zhris on Dec 1, 2013, 1:40 AM)


Laurent_R
Veteran / Moderator

Dec 1, 2013, 2:48 AM

Post #3 of 7 (1432 views)
Re: [moviesigh] Counting keywords in text file.

This post is crossposted on the PerlMonks forum.


moviesigh
New User

Dec 1, 2013, 6:36 PM

Post #4 of 7 (1380 views)
Re: [Zhris] Counting keywords in text file.

Thank you very much, Chris. It was really helpful for me.

I followed the first approach and the error message is gone now, but I still cannot get the right results: the counts are all "0". Could you please look at my revised code? I am attaching my revised code and the sample text again. I apologize if this is a really simple question; I am a real beginner, so I cannot even tell whether my problems are simple or not. Thank you very much in advance.

Sean

Command) perl code.pl sample.txt "Steve Jobs, Executive" 10 "stock option, package"
Code)
use strict;
use warnings;
use Data::Dumper;

my ($filename, $mainword_str, $distance, $search_str) = @ARGV;

my @mainword = split /\s*,\s*/, $mainword_str;
my @search = split /\s*,\s*/, $search_str;

my $content;
open my $fh, '<', $filename or die $!;
local $/ = undef;
$content = <$fh>;
close $fh;

my @docs = split 'Document ', $content;
foreach my $doc ( @docs ) {

    my $count = 0;

    my $mainword = '(' . (join '|', map { "\Q$_\E" } @mainword) . ')';
    my $search = '(' . (join '|', map { "\Q$_\E" } @search) . ')';

    for (my $dist = 0; $dist <= $distance; $dist++) {
        while ( $doc =~ /
            (?:^|\W)
            $search*
            (?=
                (?:\W++\w++){$dist}
                \W++\Q$mainword\E
            )
            /ixsg
        )
        {
            print " found [$1] at ", $-[1], "\n";

            $count++;
        }

        while ( $doc =~ /
            (?:^|\W)
            \Q$mainword\E
            (?=
                (?:\W++\w++){$dist}
                \W++$search
            )
            /ixsg
        )
        {
            print "-found [$1] at ", $-[1], "\n";
            $count++;
        }
    }

    print "match: $count\n";
}
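
One likely reason the count stays at 0, offered as a reading of the code above rather than a tested fix for the whole script: $mainword is already built from \Q...\E-quoted phrases joined into a ( | ) group, and wrapping it in \Q...\E a second time inside the pattern escapes the parentheses and the | as well, so the regex looks for that literal text instead of either phrase. The * after $search in the first loop also makes the search group optional. A small sketch with a made-up sentence shows the difference between quoting once and quoting twice:

```perl
use strict;
use warnings;

my @mainword = ('Steve Jobs', 'Executive');

# Same construction as in the script: each phrase is quoted once,
# then the phrases are joined into a capturing alternation group.
my $mainword = '(' . join('|', map { quotemeta } @mainword) . ')';

my $text = 'Steve Jobs also received a huge amount of stock options.';

# Quoting the finished group again escapes "(", "|" and ")" too,
# so this pattern looks for the literal text of the group.
my $quoted_twice = $text =~ /\Q$mainword\E/ix ? 1 : 0;

# Interpolating the group as-is lets the alternation do its job.
my $quoted_once = $text =~ /$mainword/ix ? 1 : 0;

print "quoted twice: $quoted_twice\n";
print "quoted once:  $quoted_once\n";
```

In the script this would mean writing $mainword without the extra \Q...\E in both patterns, and $search without the trailing *.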
Attachments: code.pl (1.32 KB)
  sample.txt (0.75 KB)


moviesigh
New User

Dec 1, 2013, 6:40 PM

Post #5 of 7 (1379 views)
Re: [Laurent_R] Counting keywords in text file.

Hello, Laurent_R.

I did not know whether crossposting matters. Is it a problem? If it is, please let me know and I will take care of it. Thank you.


BillKSmith
Veteran

Dec 2, 2013, 5:21 AM

Post #6 of 7 (1328 views)
Re: [moviesigh] Counting keywords in text file.

The only problem with crossposts is that responders do not want to spend time solving problems that are already solved on another forum. The solution is for you to add a comment to your post listing all crossposts.
Good Luck,
Bill


Laurent_R
Veteran / Moderator

Dec 2, 2013, 8:47 AM

Post #7 of 7 (1316 views)
Re: [moviesigh] Counting keywords in text file.


In Reply To
Hello, Laurent_R.

I did not know whether crossposting matters. Is it a problem? If it is, please let me know and I will take care of it. Thank you.


No problem in doing it, but it is good to mention that you are crossposting (and possibly even provide a link) to avoid duplicate work between different forums.

 
 

