Two questions for an old project

 



wyndcrosser
Novice

Nov 16, 2012, 4:22 PM

Post #1 of 5

Attention: to those who helped me out before, thanks. I got the work project done.

I'm trying to figure out this book challenge from last semester, which I never completed. Any advice?

From what I can see I have it all done, but two parts still confuse me. I like doing these assignments because they let me help other students who are having issues. Perl isn't my thing, but I assist where I can.

The point of the assignment is to take a .txt document and print out all the words, first alphabetically (A-Z) and then by number of occurrences, from greatest to least.

My problem is that this only splits on whitespace, so punctuation stays attached to the words: "search" is counted, but "search," and "search." are counted as different words. Do regexes and split use similar syntax?

foreach my $word (split /\s+/, $fname)

Second, I'm trying to recall how to print the counts from greatest to least without using an array.

for $word (sort keys %counter)
{
print "$counter{$word}\t$word\n";
}


Code
use strict;
use warnings;

# If the wrong number of parameters is specified (anything but 1),
# an error message is displayed and the program terminates.
if (@ARGV != 1)
{
    print "Usage error. Syntax is filename.pl inputFile.txt\n";
    exit;
}

# Sets $fname to $ARGV[0] from the command line.
my $fname = $ARGV[0];

# Identify whether the file is actually there before opening it.
if (!(-e $fname))
{
    print "File $fname does not exist\n";
    exit;
}

# Open the input file once, with a lexical filehandle.
open my $ifile, "<", $fname or die "Cannot open $fname: $!\n";

## counter hash
my %counter;

## input: read the file line by line
while (my $line = <$ifile>)
{
    ## splits each line on whitespace and tallies the words
    foreach my $word (split /\s+/, $line)
    {
        $counter{$word}++;
    }
}
close $ifile;

## prints out words (A-Z), then their counts
for my $word (sort keys %counter)
{
    print "$word $counter{$word}\n";
}

## this should print out words from the greatest to the least based on count, e.g.
## 4 Bat
## 3 Cat
## 3 Ca
## but it is still sorted alphabetically
for my $word (sort keys %counter)
{
    print "$counter{$word}\t$word\n";
}

Perl Newbie - 7 months of PERL basics.


wickedxter
User

Nov 16, 2012, 8:15 PM

Post #2 of 5
Re: [wyndcrosser] Two questions for an old project

You need to split on just \s; the \s+ breaks the words down into individual letters.
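For comparison, here is a small sketch of what each pattern produces on a made-up sample line. With both patterns the punctuation stays attached to the words. The difference is that `\s` matches a single whitespace character, so two spaces in a row leave an empty field, while `\s+` treats the whole run of whitespace as one separator.

```perl
use strict;
use warnings;

# Hypothetical sample line with a double space and trailing punctuation.
my $line = "search,  or search.";

# \s matches exactly one whitespace character, so the two spaces
# produce an empty field between them.
my @single = split /\s/, $line;

# \s+ matches the whole run of whitespace, so no empty field appears.
my @run = split /\s+/, $line;

print join('|', @single), "\n";   # search,||or|search.
print join('|', @run), "\n";      # search,|or|search.
```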


wyndcrosser
Novice

Nov 16, 2012, 8:44 PM

Post #3 of 5
Re: [wickedxter] Two questions for an old project

Gotcha, but that still gives me results like

search,
sometimes.

when I print them out.

Any other thoughts?


Laurent_R
Veteran / Moderator

Nov 17, 2012, 2:28 AM

Post #4 of 5
Re: [wyndcrosser] Two questions for an old project

Hi,

You don't give much information, but if "pure" words are what you want, it is not too difficult to clean your data.

If you split your line on spaces:

Code
$_ = "To be, or not to be: that is the question."; 
my @words = split;   # no arguments: splits $_ on whitespace

you get something like this in your @words array:


Code
0  'To' 
1 'be,'
2 'or'
3 'not'
4 'to'
5 'be:'
6 'that'
7 'is'
8 'the'
9 'question.'


Now, you can get rid of the trailing punctuation marks with something like this:


Code
s/\W//g foreach @words;   # removes every non-word character, not only trailing ones


Now, the @words contains:

Code
0  'To' 
1 'be'
2 'or'
3 'not'
4 'to'
5 'be'
6 'that'
7 'is'
8 'the'
9 'question'


As you can see, the trailing ',', ':' and '.' are no longer there in elements 1, 5 and 9 of the array: you have "pure" words.
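If only leading and trailing punctuation should be removed, so that internal apostrophes and hyphens survive, one option is to anchor the pattern to the ends of each word. A minimal sketch, using made-up sample words:

```perl
use strict;
use warnings;

# Hypothetical sample words, as produced by splitting on whitespace.
my @words = ("be,", "you're", "post-increment.", "question.");

# ^\W+ strips punctuation at the start, \W+$ strips it at the end;
# characters in the middle of a word (like ' or -) are left alone.
s/^\W+|\W+$//g for @words;

print "$_\n" for @words;
```

Here "you're" keeps its apostrophe, whereas the unanchored `s/\W//g` would turn it into "youre".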

You may also try to split on word boundaries ("\b") rather than on spaces. Then use grep to remove the elements that are only spaces or punctuation marks. Something like this:


Code
my $sentence = "To be, or not to be: that is the question."; 
my @words = split /\b/, $sentence;


This gives you the following @words array:


Code
0  'To' 
1 ' '
2 'be'
3 ', '
4 'or'
5 ' '
6 'not'
7 ' '
8 'to'
9 ' '
10 'be'
11 ': '
12 'that'
13 ' '
14 'is'
15 ' '
16 'the'
17 ' '
18 'question'
19 '.'


Now, you can get rid of the array elements that contain no word characters (only spaces or punctuation) with the grep function:


Code
@words = grep {/\w+/} @words;


which gives you the following @words array:

Code
0  'To' 
1 'be'
2 'or'
3 'not'
4 'to'
5 'be'
6 'that'
7 'is'
8 'the'
9 'question'


Again, you have pure words, which you can store in a hash for further processing.
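For the second part of the original question (counts from greatest to least), the hash keys can be sorted with a comparator that looks up the counts, so no separate array is needed. A minimal sketch, assuming the split-on-word-boundaries approach above and a made-up sample line; folding case with `lc` is an extra assumption so that "To" and "to" count as one word:

```perl
use strict;
use warnings;

# Hypothetical input; in the original program the lines would come from the file.
my @lines = ("To be, or not to be: that is the question.");

my %counter;
for my $line (@lines) {
    # Split on word boundaries, then keep only pieces that contain a word character.
    for my $word (grep { /\w/ } split /\b/, $line) {
        # lc folds "To" and "to" together (an assumption, not part of the assignment text)
        $counter{ lc $word }++;
    }
}

# First listing: words in alphabetical order with their counts.
for my $word (sort keys %counter) {
    print "$word $counter{$word}\n";
}

# Second listing: highest count first; words with equal counts stay alphabetical.
for my $word (sort { $counter{$b} <=> $counter{$a} || $a cmp $b } keys %counter) {
    print "$counter{$word}\t$word\n";
}
```

Swapping `$a` and `$b` in the numeric comparison is what makes the order descending; the `|| $a cmp $b` part only decides ties.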

There might be a couple of issues, though, with "words" containing apostrophes ("you're doing this") or hyphens ("post-increment"). So you might have to refine your regular expressions to handle these specific cases.
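One possible refinement is to match the words directly with a global match in list context, instead of splitting and then cleaning. This sketch assumes a "word" is a run of letters optionally joined to more letters by single apostrophes or hyphens; the sample sentence is made up:

```perl
use strict;
use warnings;

my $text = "You're doing this with post-increment, aren't you?";

# A run of letters, optionally followed by groups of (apostrophe or hyphen + letters).
# The exact definition of a "word" here is an assumption; adjust it to the assignment.
my @words = $text =~ /[A-Za-z]+(?:['-][A-Za-z]+)*/g;

print "$_\n" for @words;
```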


wyndcrosser
Novice

Nov 19, 2012, 9:46 PM

Post #5 of 5
Re: [Laurent_R] Two questions for an old project

This was mad helpful, thank you.