count matches text

 



cmccabe1
Novice

Sep 20, 2013, 12:38 PM

Post #1 of 8 (557 views)
count matches text

Hello,

Is there a command that can count the occurrences of words in a file? I have one text file (names) containing Chris, Paul, James, and another text file (group) containing Chris, Paul, Chris, Paul, James. What I would like to do is use the names file to search the group file and count the matches, so that a third file (count) would contain Chris 2, Paul 2, James 1. Is this possible? Thanks.


Laurent_R
Veteran / Moderator

Sep 21, 2013, 4:02 AM

Post #2 of 8 (551 views)
Re: [cmccabe1] count matches text

Yes, this is possible. Read the name file and store your names in memory as an array. Once you have done that, read the other file and, for each line (or each data chunk), check each name and count its occurrences. This implies nested loops, which can take time if your files are really large, but that's acceptable if your files are small or moderately large. If the files are really large, I can think of a couple of other solutions, but they are somewhat more complicated to implement and should be considered only if the simple approach is not workable.

Have you tried something already?
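
The nested-loop approach described in this post can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name count_names_nested is hypothetical, the sample data is taken from the original question, and whole-word matching with \b is an assumption about what should count as a match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how often each name occurs in the input
# lines using nested loops (outer loop over lines, inner loop over names).
sub count_names_nested {
    my ($names, $lines) = @_;              # two array references
    my %count = map { $_ => 0 } @$names;   # start every name at zero
    for my $line (@$lines) {
        for my $name (@$names) {
            # \Q...\E escapes regex metacharacters in the name;
            # \b keeps "Chris" from matching inside "Christopher".
            my $hits = () = $line =~ /\b\Q$name\E\b/g;
            $count{$name} += $hits;
        }
    }
    return \%count;
}

# Sample data from the question; real code would read both files instead.
my @names = qw(Chris Paul James);
my @lines = ('Chris, Paul, Chris, Paul, James');

my $count = count_names_nested(\@names, \@lines);
print "$_ $count->{$_}\n" for @names;
```

Every line is tested against every name, so the running time grows with the product of the two sizes, which is the cost mentioned above.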


FishMonger
Veteran / Moderator

Sep 21, 2013, 6:14 AM

Post #3 of 8 (550 views)
Re: [Laurent_R] count matches text

Read the name file and store your names in memory as a hash (not an array).


cmccabe1
Novice

Sep 21, 2013, 6:33 AM

Post #4 of 8 (547 views)
Re: [Laurent_R] count matches text

With the code below I get this error:

Unrecognized character \x93; marked by <-- HERE after ") or die <-- HERE near column 41 at /home/cmccabe/homopolymer.pl line 10.

#!/usr/bin/perl


sub by_count {
$count{$b} <=> $count{$a};
}

open(INPUT, "<7matches.txt");
open(OUTPUT, ">genes.txt");
open(GENES, "<search_genes.txt") or die “Cannot find search_genes.txt”;
$bucket=join(“\|”,map {chomp;s/\cM|\cJ//g;$_} <GENES>);

while(<INPUT>){
@words = split(/\s+/);
foreach $word (@words){
if($word=~/($bucket)/io){
$count{$1}++;}
}
}
foreach $word (sort by_count keys %count) {
print OUTPUT "$word occurs $count{$word} times\n";
}

close INPUT;
close OUTPUT;

Thank you.


FishMonger
Veteran / Moderator

Sep 21, 2013, 7:36 AM

Post #5 of 8 (545 views)
Re: [cmccabe1] count matches text

It appears that you're using a word processor instead of a text editor. Don't do that!!!

The word processor is using "smart quotes" on lines 10 and 11. Replacing those with regular quotes will correct that error.
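
For illustration, here is roughly what lines 10 and 11 look like with plain ASCII double quotes. The byte \x93 in the error message is the left curly double quote in the Windows-1252 encoding, which Perl does not accept as a string delimiter. The sketch goes a little further than the fix itself: the lexical filehandle, three-argument open, quotemeta and the helper name build_bucket are assumptions added here, and an in-memory string stands in for search_genes.txt so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The two offending lines with straight quotes would read:
#   open(GENES, "<search_genes.txt") or die "Cannot find search_genes.txt";
#   $bucket = join("|", map {chomp; s/\cM|\cJ//g; $_} <GENES>);

# Hypothetical helper: build the alternation pattern from a filehandle.
sub build_bucket {
    my ($fh) = @_;
    my @terms;
    while (my $line = <$fh>) {
        $line =~ s/[\r\n]+//g;                        # strip CR and LF
        push @terms, quotemeta $line if length $line; # skip empty lines
    }
    return join '|', @terms;
}

# Demo data standing in for search_genes.txt (an assumption).
my $data = "Chris\r\nPaul\nJames\n";
open(my $genes, '<', \$data) or die "Cannot open search terms: $!";
my $bucket = build_bucket($genes);
close $genes;

print "$bucket\n";    # prints Chris|Paul|James
```

Saving the script from a plain text editor keeps the quotes as ASCII characters.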


cmccabe1
Novice

Sep 21, 2013, 7:47 AM

Post #6 of 8 (541 views)
Re: [FishMonger] count matches text

Thank you, that was it. I wasn't aware that a word processor used a different encoding than a text editor. Thanks again.


Laurent_R
Veteran / Moderator

Sep 21, 2013, 11:01 AM

Post #7 of 8 (537 views)
Re: [FishMonger] count matches text


In Reply To
Read the name file and store your names in memory as a hash (not an array).


Well, I first wrote a hash, and then, in this specific context, thought an array was slightly better: the idea is to go through each name anyway, so we are not using the structure as a lookup table. Having said that, a hash has the advantage that it can also be used as a counter.


FishMonger
Veteran / Moderator

Sep 21, 2013, 11:15 AM

Post #8 of 8 (535 views)
Re: [Laurent_R] count matches text

My reading of the requirements told me that a hash lookup would be the appropriate and most efficient approach. I guess that goes to show that two people can read the same thing and come away with completely different interpretations. :)

Seeing the OP's code, even with its many issues, strengthens my interpretation, IMO.
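
A minimal sketch of the hash-lookup reading, where the same hash serves as both lookup table and counter. The helper name count_names_hash, the second input line and the split on whitespace and commas are assumptions made for illustration, not something taken from the OP's files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the hash keys are the names to look for,
# and the values are the running counts.
sub count_names_hash {
    my ($names, $lines) = @_;              # two array references
    my %count = map { $_ => 0 } @$names;
    for my $line (@$lines) {
        # Split each line into words on whitespace and commas.
        for my $word (split /[\s,]+/, $line) {
            # One hash lookup per word; unknown words are ignored.
            $count{$word}++ if exists $count{$word};
        }
    }
    return \%count;
}

# Sample data (an assumption); real code would read the two files.
my $count = count_names_hash(
    [qw(Chris Paul James)],
    ['Chris, Paul, Chris, Paul, James', 'Anna, Chris'],
);

# Highest count first, ties broken alphabetically.
for my $name (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
    print "$name $count->{$name}\n";
}
```

Each word costs a single lookup no matter how many names there are, so there is no inner loop over the names. Unlike a regex match, this comparison is exact and case-sensitive.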


(This post was edited by FishMonger on Sep 21, 2013, 11:15 AM)

 
 

