
dynamic counting

 



scooper
Deleted

Sep 28, 2000, 11:04 AM

Post #1 of 7 (1276 views)
dynamic counting

Hi there!

Can anyone think of a better way to do the following?

I'd like for the following code to count file types which ARE not accounted for in the code (for instance, if I were to put a .png file into $dir, the following code wouldn't see it -- and it wouldn't be counted.)

<BLOCKQUOTE><font size="1" face="Arial,Helvetica,sans serif">code:</font><HR><pre>


sub count_stuff {
$dir = $_;
next if $dir =~ /^\.\.?$/;

if ($dir =~ /.gif/i) {$gif_num +=1}
if ($dir =~ /.jpg/i) {$jpg_num +=1}
if ($dir =~ /.mov/i) {$mov_num +=1}
if ($dir =~ /.swf/i) {$swf_num +=1}
if ($dir =~ /.mp3/i) {$mp3_num +=1}

next if $dir =~ /.txt/i;
next if $dir =~ /.htm/i;
next if $dir =~ /.html/i;
next if $dir =~ /.shtml/i;
next if $dir =~ /.pl/i;
next if $dir =~ /.cgi/i;

print "$gif_num,$jpg_num,$mov_num,$swf_num,$mp3_num\n";
}
</pre><HR></BLOCKQUOTE>


I'm guessing that it involves some form of FOREACH filetype in $dir, create a new variable DYNAMICALLY, and increment the variable every time we see another file of the same type.

The section which ignores files like .txt, . & .. is OK to hard code.

Ultimately, I'd like to also use this method to count unknown items in an array... like individual occurrences of words.

[This message has been edited by scooper (edited 09-28-2000).]


scooper
Deleted

Sep 28, 2000, 10:10 PM

Post #2 of 7 (1276 views)
Re: dynamic counting

I kinda get it, but for whatever reason it returns something odd:


<BLOCKQUOTE><font size="1" face="Arial,Helvetica,sans serif">code:</font><HR><pre>



#! /usr/bin/perl


&count_stuff;


my %found_counts;


# not so sure why this is outside the sub, or what 'my' really means
# guess: you're initializing a hash, but shouldn't it be inside the sub??


sub count_stuff
{


# the dir is whatever is coming IN

my ($dir) = @_;



# return 'undef' if the item contains '.', '..', '.txt', '.htm', '.shtml', '.pl', or '.cgi'

return () if ( ($dir =~ /^\.\.?$/) || ($dir =~ /.txt/i) ||
($dir =~ /.htm/i) || ($dir =~ /.shtml/i) ||
($dir =~ /.pl/i) || ($dir =~ /.cgi/i)
);



# $dir is whatever is left over extension-wise; this is a regex to have the sub just look at
# the extension after the dot (/\.) and whatever characters follow -- matching at the end of the string

$dir =~ /\.(.*?)$/;



# OK, sub -- you found something .... put it in a HASH INCREMENTALLY
# as many times as you find the same extension in the HASH.

$found_counts{$1}++;



# huh? print a ',' followed by whatever came into the construct = the current found countvalue in the hash

print join (',', map { "$_=$found_counts{$_}" } sort keys (%found_counts)), "\n";


}


# this returns =1???? umm... there _are_ 39 .jpgs in the test directory -- I'm sure I'm missing something


</pre><HR></BLOCKQUOTE>

[This message has been edited by scooper (edited 09-29-2000).]


dws
Deleted

Sep 28, 2000, 10:44 PM

Post #3 of 7 (1276 views)
Re: dynamic counting

It's not sufficient to test for '.' and '..'. Instead, do
<BLOCKQUOTE><font size="1" face="Arial,Helvetica,sans serif">code:</font><HR><pre>

# return undef if the item is a directory
return undef if -d $dir;</pre><HR></BLOCKQUOTE>
Also, you probably want to quote the '.' in the type checks.
<BLOCKQUOTE><font size="1" face="Arial,Helvetica,sans serif">code:</font><HR><pre>

return undef if $dir =~ /\.(?:txt|htm|shtml|pl|cgi)/;</pre><HR></BLOCKQUOTE>
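A possible refinement of the two checks above, offered as a sketch and not as dws's own code: the pattern is anchored to the end of the name and made case-insensitive, so a name like my.plans.gif is not rejected by the 'pl' alternative, and -d is given a full path, because a bare name from readdir only resolves correctly when the directory being read is the current one. The sub name should_skip and its parameters are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: true if the entry should be left out of the counts.
sub should_skip {
    my ($directory, $name) = @_;

    # -d needs a path it can resolve, so join the directory and the name.
    return 1 if -d File::Spec->catfile($directory, $name);

    # Escaped dot, anchored at the end of the name, case-insensitive.
    # 'html?' covers both htm and html.
    return 1 if $name =~ /\.(?:txt|html?|shtml|pl|cgi)$/i;

    return 0;
}

print should_skip('.', 'notes.TXT')    ? "skip\n" : "count\n";
print should_skip('.', 'my.plans.gif') ? "skip\n" : "count\n";
```

Anchoring changes the behavior slightly: 'html' has to be listed (here via 'html?') because 'htm' no longer matches in the middle of a longer extension.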


[This message has been edited by dws (edited 09-29-2000).]


rGeoffrey
User / Moderator

Sep 29, 2000, 8:51 AM

Post #4 of 7 (1276 views)
Re: dynamic counting

This version will hold all the counts in a hash. If $dir matches any of the forbidden endings we return early, otherwise the hash counts the extension. Also we don't need to match for 'html' because the 'htm' will have already matched.

<BLOCKQUOTE><font size="1" face="Arial,Helvetica,sans serif">code:</font><HR><pre>


my %found_counts;

sub count_stuff
{
my ($dir) = @_;

return () if ( ($dir =~ /^\.\.?$/) || ($dir =~ /.txt/i) ||
($dir =~ /.htm/i) || ($dir =~ /.shtml/i) ||
($dir =~ /.pl/i) || ($dir =~ /.cgi/i)
);

$dir =~ /\.(.*?)$/;
$found_counts{$1}++;

print join (',', map { "$_=$found_counts{$_}" } sort keys (%found_counts)), "\n";
}
</pre><HR></BLOCKQUOTE>


scooper
Deleted

Sep 29, 2000, 12:29 PM

Post #5 of 7 (1276 views)
Re: dynamic counting

dave --

Yep, thou art most correct! Thanks for the catch. But still I'm confused about why rGeoffrey's code returns =1. There's nothing strange (i.e. sub-directories) in the target.

Also, (TOTALLY OT) I'm of the school that I can reuse code for other purposes -- In this case I want to be able to return the frequency of words in a sentence (i.e. the previous sentence contains x occurrences of the word "foo").

I think I understand a little of why rGeoffrey does what he did, and I actually have this working the OLD way in another program -- but I like to see how others with more experience would approach the same problem.
(BTW --I've only been programming _perl_ for about a month) Is there another way that you can think of??

TMTOWTDI --
scooper
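
For the word-frequency question, the same hash idea carries over directly: each distinct word becomes a key, so no variable has to be created per word. This is only a minimal sketch; the sub name count_words and the choice to treat runs of word characters and apostrophes as words are assumptions, not something settled in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a reference to a hash of word => count.
sub count_words {
    my ($text) = @_;
    my %count;

    # Lower-case the text, then count every run of word characters
    # and apostrophes as one word.
    $count{$_}++ for lc($text) =~ /[\w']+/g;

    return \%count;
}

my $counts = count_words("The cat saw the other cat");

# prints cat=2,other=1,saw=1,the=2
print join(',', map { "$_=$counts->{$_}" } sort keys %$counts), "\n";
```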


rGeoffrey
User / Moderator

Sep 29, 2000, 1:18 PM

Post #6 of 7 (1276 views)
Re: dynamic counting

Part of the problem might be that I designed my &count_stuff to be called inside a loop where each time you pass it the name of a file. But that means that you must already have a list of the files in the directory.

In the commented version after my code, you call &count_stuff without any arguments, so $dir is not set inside the subroutine. Thus &count_stuff is called exactly once and on that pass $dir is the empty string ''. However, I am not sure why it bothers to return a 1.
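
One likely explanation for the stray "=1", sketched here as an assumption about what happened and not something confirmed in the thread: with no argument, $dir is undef, so the extension pattern does not match and $1 is never set. Using undef as a hash key stores the empty string, and incrementing that entry gives 1, so the print shows an empty key followed by "=1". The warnings are switched off inside the sub only because the demo triggers them on purpose.

```perl
use strict;
use warnings;

my %found_counts;

sub count_stuff {
    my ($dir) = @_;                 # undef when called with no arguments

    no warnings 'uninitialized';    # this demo uses undef deliberately

    $dir =~ /\.(.*?)$/;             # fails on undef, so $1 stays undefined
    $found_counts{$1}++;            # undef as a hash key becomes ''

    return join(',', map { "$_=$found_counts{$_}" } sort keys %found_counts);
}

print count_stuff(), "\n";          # prints "=1"
```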

So here is a full program that I have tested and it does provide the correct counts. And I have used dws's better return statements.

<BLOCKQUOTE><font size="1" face="Arial,Helvetica,sans serif">code:</font><HR><pre>


#!/usr/local/bin/perl

my $directory = '.';
my @files = &Get_Files ($directory);

my %found_counts;

foreach (@files) {
&count_stuff ($_);
}

sub count_stuff
{
my ($file) = @_;
#return undef if the item is a directory or it has a forbidden extension
return undef if (-d $file);
return undef if ($file =~ /\.(?:txt|htm|shtml|pl|cgi)/);

#increment the count only when the name really has an extension;
#after a failed match $1 would not belong to this file
if ($file =~ /\.(.*?)$/) {
$found_counts{$1}++;
}

#print the message each time through the loop
print $file, " == ", join (',', map { "$_=$found_counts{$_}" } sort keys (%found_counts)), "\n";
}

#Get_Files will return an array of each file in $directory
sub Get_Files
{
my ($directory) = @_;

opendir SOURCE, $directory or die "serious dainbramage: $!";
my @allfiles = readdir SOURCE;
closedir SOURCE;

return (@allfiles);
}
</pre><HR></BLOCKQUOTE>

%found_counts needs to be a global, or at least declared outside of &count_stuff. The way I have it built, &count_stuff is called once for each file, so the hash that holds the counts must be declared outside the sub; otherwise you would lose the information you already collected.

There are other ways around this problem, like passing the hash (or a reference to it) into the subroutine, or passing the subroutine the whole list at once and having it do the loop inside.
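
Those two alternatives might look roughly like this. It is a sketch under assumptions: the sub names count_file and count_files are hypothetical, and the -d test is left out because the sample names below are not real files on disk.

```perl
use strict;
use warnings;

# Hypothetical variant 1: the caller owns the hash and passes a reference to it.
sub count_file {
    my ($counts, $file) = @_;
    return if $file =~ /\.(?:txt|htm|shtml|pl|cgi)/;
    $counts->{$1}++ if $file =~ /\.(.*?)$/;   # count only if an extension was found
    return;
}

# Hypothetical variant 2: the sub receives the whole list, loops inside,
# and returns the finished counts.
sub count_files {
    my @files = @_;
    my %counts;
    count_file(\%counts, $_) for @files;
    return %counts;
}

my %found_counts = count_files('a.gif', 'b.jpg', 'c.gif', 'notes.txt');

# prints gif=2,jpg=1
print join(',', map { "$_=$found_counts{$_}" } sort keys %found_counts), "\n";
```

Either way the sub no longer depends on a variable declared outside it, which makes it easier to reuse for a second directory.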

The map is there to loop through each key in the hash so that we can print something like "gif=3,jpg=39\n". It turns each key into the string key=$found_counts{key}, and join then strings those together with commas. For more on map (my favorite feature of perl) see Simon's article at http://tlc.perlarchive.com/0010/01.shtml
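
Unrolled into separate steps, the sort / map / join chain does something like the following. The hash contents are sample data chosen to match the example string, not counts from a real directory.

```perl
use strict;
use warnings;

my %found_counts = (gif => 3, jpg => 39);            # sample data

my @keys  = sort keys %found_counts;                 # ('gif', 'jpg')
my @pairs = map { "$_=$found_counts{$_}" } @keys;    # ('gif=3', 'jpg=39')
my $line  = join(',', @pairs);                       # 'gif=3,jpg=39'

print "$line\n";
```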

Or for a more streamlined version you could use this

<BLOCKQUOTE><font size="1" face="Arial,Helvetica,sans serif">code:</font><HR><pre>


#!/usr/local/bin/perl

my $directory = '.';
opendir SOURCE, $directory or die "serious dainbramage: $!";
my @files = readdir SOURCE;
closedir SOURCE;

my %found_counts;

foreach (@files) {
unless ( (-d $_) || ($_ =~ /\.(?:txt|htm|shtml|pl|cgi)/)) {
#count only names that really have an extension
$found_counts{$1}++ if $_ =~ /\.(.*?)$/;
print $_, " == ", join (',', map { "$_=$found_counts{$_}" } sort keys (%found_counts)), "\n";
}
}
</pre><HR></BLOCKQUOTE>

Inside the foreach loop, the $_ means the filename from @files currently in play. And inside the map $_ refers to the key from %found_counts in play.
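
A small sketch of that nesting, using made-up file names: map aliases $_ to each hash key only while its own block runs, and once map is done $_ is the file name again.

```perl
use strict;
use warnings;

my %found_counts;
my @report;

foreach ('a.gif', 'b.gif') {
    # here $_ is the file name from the foreach list
    $found_counts{gif}++;

    # inside the map block $_ is a key of %found_counts
    my $summary = join(',', map { "$_=$found_counts{$_}" } sort keys %found_counts);

    # after map returns, $_ is the file name once more
    push @report, "$_ == $summary";
}

print "$_\n" for @report;
```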


scooper
Deleted

Sep 30, 2000, 7:07 AM

Post #7 of 7 (1276 views)
Re: dynamic counting

WOW --

That's really cool! I'm picking the code apart with the 'nutshell' book. It's pretty much as I expected that it would be, I just couldn't do it because:

A. I didn't know how to create a variable DYNAMICALLY.
B. I'd never seen the 'map' construct thing.


I'm going to go check out the article, and read my man pages. Then I'll work on retrofitting the code once I totally 'grok' the concepts.

It never ceases to amaze me what can be learned from a simple problem.

Thanks for your help !!!

 
 

