Word count help

 



dpuk44
Novice

Aug 3, 2011, 9:45 AM

Post #1 of 7 (1756 views)
Word count help

I'm trying to count the characters, words, lines, sentences and paragraphs in a text file called beatrice.txt. I have worked out how to calculate the sentence count and the character count; now I'm trying to work out the WORD count. Here's my code thus far:



#!C:\Perl\bin\perl.exe -w

#Checks for command line argument, if not found asks for input from user
if ($#ARGV == -1) {
print "Please Enter a File Name: ";
$file_name = <STDIN>;
chomp ($file_name);
}
else
{
$file_name = $ARGV[0];
}

#Validates the name of the file
if ($file_name !~ m/^[A-Za-z_][A-Za-z0-9_]{0,7}$/) {
print "Not valid";
}

#If file name does not have .TXT extension (in either case), add one
if ($file_name !~ m/\.txt$/i)
{
$file_name .= ".TXT";
}

#Checks to see if file name exists
if (!-e $file_name)
{
die("File does not exist\n");
}

#Checks to see if file name is empty
if (-z $file_name)
{
die("File is empty\n");
}

#Opens file
open(READFILE, "<$file_name") or die "Can't open file '$file_name': $!";

$characters = 0;
$words = 0;
$lines = 0;
$sentences = 0;
$paragraphs = 0;

my($ch);

#Assign individual characters from Beatrice.txt to $ch for manipulation
#(defined() keeps the loop going when the character read is "0")
while (defined($ch = getc(READFILE)))
{
#Counts Characters in Beatrice.txt
if ($ch =~ m/^\w/)
{
$characters++;
}


###################
# WORD COUNT TO GO HERE


#Counts Sentences in Beatrice.txt
if ($ch eq "?" || $ch eq "!" || $ch eq ".")
{
$sentences++;
}
}

print "CHARACTERS: $characters\n", "WORDS: $words\n", "SENTENCES: $sentences\n";


Any help would be great


kwatts59
Novice

Aug 3, 2011, 10:34 AM

Post #2 of 7 (1754 views)
Re: [dpuk44] Word count help

I would use the split function to split the text on whitespace and store the words in an array.
Then I would check the array size to find out how many words are in the file.

Another way is to use the Linux "wc" command.
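
A minimal sketch of the split approach described above, assuming the whole file fits comfortably in memory. The name count_words is only an illustrative helper, not something from the original script. Splitting on the special pattern ' ' breaks the text on runs of whitespace and skips leading whitespace, so blank lines do not produce empty "words".

```perl
use strict;
use warnings;

# count_words is a hypothetical helper name used only for this sketch.
sub count_words {
    my ($text) = @_;
    # split ' ' splits on runs of whitespace and ignores leading whitespace
    my @words = split ' ', $text;
    return scalar @words;
}

# Hypothetical usage: count the words in the file named on the command line.
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Can't open '$ARGV[0]': $!";
    my $text = do { local $/; <$fh> };    # with $/ undefined, read the whole file at once
    close($fh);
    $text = '' unless defined $text;
    print "WORDS: ", count_words($text), "\n";
}
```

Anything separated by whitespace counts as a word here, so a stray "-" or a number is counted too; that is the same definition of a word that wc uses.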


BillKSmith
Veteran

Aug 3, 2011, 2:08 PM

Post #3 of 7 (1747 views)
Re: [dpuk44] Word count help

Just read the file into a string and process with regular expressions. Of course you still need your code to get and validate the file name.


Code
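use strict;
use warnings;
use Slurp;
my $file = slurp('file.txt');

# Count runs of non-whitespace, so leading or repeated whitespace does not add words
my $word_count = 0;
$word_count++ while $file =~ /\S+/g;

# Count sentence terminators, including the full stop
my $sentence_count = 0;
$sentence_count++ while $file =~ /[.?!]/g;

# Approximation: assumes one whitespace character after each word
# and leaves out the sentence terminators
my $char_count = length( $file ) - ($word_count+$sentence_count);

print $char_count, ' ', $word_count, ' ', $sentence_count, "\n";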

Good Luck,
Bill


dpuk44
Novice

Aug 4, 2011, 6:04 AM

Post #4 of 7 (1732 views)
Re: [dpuk44] Word count help

Is this right?

while ($ch = getc(READFILE))
{
#Counts Characters in Beatrice.txt
if ($ch =~ m/^\w/)
{
$characters++;
}

#Counts Sentences in Beatrice.txt
if ($ch eq "?" || $ch eq "!" || $ch eq ".")
{
$sentences++;
}

if (@words = split (/\s+/, (READFILE)))
{
$words++;
}
}


dpuk44
Novice

Aug 4, 2011, 6:51 AM

Post #5 of 7 (1727 views)
Re: [dpuk44] Word count help

I did it like this:



#!C:\Perl\bin\perl.exe -w

#Checks for command line argument, if not found asks for input from user
if ($#ARGV == -1) {
print "Please Enter a File Name: ";
$file_name = <STDIN>;
chomp ($file_name);
}
else
{
$file_name = $ARGV[0];
}


#Validates the name of the file: up to 8 name characters, .TXT extension optional
if ($file_name !~ m/^[a-zA-Z_][a-zA-Z0-9_]{0,7}(\.TXT)?$/i) {
print "Not valid";
}

#If file name does not have .TXT extension, add one
if ($file_name !~ m/\.txt$/i)
{
$file_name .= ".TXT";
}


#Checks to see if file name exists
if (!-e $file_name)
{
die("File does not exist\n");
}

#Checks to see if file name is empty
if (!-s $file_name)
{
die("File is empty\n");
}


#Opens file
open(READFILE, "<$file_name") or die "Can't open file '$file_name': $!";

$characters = 0;
$words = 0;
$lines = 0;
$sentences = 0;
$paragraphs = 0;

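my($ch);
my $lastchar = "\n";    #Previous character; the start of the file is treated like the start of a line

#Assign individual characters from Beatrice.txt to $ch for manipulation
#(defined() keeps the loop going when the character read is "0")
while (defined($ch = getc(READFILE)))
{
$characters++;

#WORDS: whitespace directly after a non-whitespace character ends a word
if (($ch eq "\t" || $ch eq " " || $ch eq "\n") && ($lastchar ne "\t" && $lastchar ne " " && $lastchar ne "\n"))
{
$words++;
}

#LINES
if ($ch eq "\n")
{
$lines++;
}

#SENTENCES
if ($ch eq "?" || $ch eq "!" || $ch eq ".")
{
$sentences++;
}

#PARAGRAPHS: assumes each paragraph is a single line of text; blank lines are not counted
if ($ch eq "\n" && $lastchar ne "\n")
{
$paragraphs++;
}

#Remember this character for the next pass through the loop
$lastchar = $ch;
}

#Counts a final word and paragraph when the file does not end in whitespace
if ($lastchar ne "\t" && $lastchar ne " " && $lastchar ne "\n")
{
$words++;
$paragraphs++;
}

print "CHARACTERS: $characters\n", "WORDS: $words\n", "LINES: $lines\n", "SENTENCES: $sentences\n", "PARAGRAPHS: $paragraphs\n";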


close(READFILE);


BillKSmith
Veteran

Aug 4, 2011, 7:15 AM

Post #6 of 7 (1726 views)
Re: [dpuk44] Word count help

You must 'rewind' (refer: perldoc -f seek) the file before you read it a second time. You want the second read to 'slurp' (refer: '$/' in perldoc perlvar or use a slurp module from CPAN) the entire file into a string. As written, it counts only the words in the first line. Other than that, I believe it is what you want. It is probably very slow.
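
A rough sketch of what 'rewind' and 'slurp' mean in practice, using only core Perl. The name slurp_words is a hypothetical helper for illustration; it assumes it is handed a filehandle that has already been read (for example by a getc loop) and needs to be read again from the top.

```perl
use strict;
use warnings;

# slurp_words is a hypothetical helper name used only for this sketch.
# It takes a filehandle that may already have been read to the end.
sub slurp_words {
    my ($fh) = @_;
    # Rewind: move the read position back to the start of the file
    seek($fh, 0, 0) or die "Can't rewind: $!";
    # Slurp: with $/ undefined, <$fh> returns the rest of the file as one string
    my $text = do { local $/; <$fh> };
    $text = '' unless defined $text;
    # Split on runs of whitespace and count the pieces
    my @words = split ' ', $text;
    return scalar @words;
}
```

With the filehandle from the posts above, the call would look like slurp_words(\*READFILE), placed after the character loop has finished. Without the seek, the second read starts wherever the first one stopped.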


Post #5 is much better!
Good Luck,
Bill

(This post was edited by BillKSmith on Aug 4, 2011, 8:34 AM)


dpuk44
Novice

Aug 8, 2011, 9:46 AM

Post #7 of 7 (1701 views)
Re: [BillKSmith] Word count help

I appreciate your input, Bill, but as a newbie I don't know what you're talking about. I know you do, with lots of experience, but can you explain in simple terms?

Thanks :)

 
 

