While/each vs. For : Different results.

 



s660117
User

Jul 21, 2013, 10:07 AM

Post #1 of 22 (1430 views)
While/each vs. For : Different results. Can't Post

Hi,
Using Perl v5.6.3 under 64-bit Windows 7, I have written a simple script that looks for duplicates in the iTunes library.
I read through the directory, strip out any duplicate indicators such as (1), (2), etc., and increment the song count in a dbm whose key is the generic song name. Duplicate songs are then printed, along with their count.
My problem is that I get different output depending on whether I use a "while/each" construct or a "for" construct to read through the updated dbm. With "for", the counts balance: the number of records read from the dbm plus the number of duplicates encountered matches the total number of input records. With "while/each", on the other hand, the duplicate count consistently comes up short. What is more, the records that "while/each" fails to report appear to be the same ones on every run. I have isolated and examined several such records, but there seems to be nothing unique about them.
Thus, the counts produced by "while/each" are -
Total number of songs read = 2629
Total number of dbm records read = 2430
Total number of duplicate songs = 78
Total number of duplicates = 97,
whereas those produced by "for" are -
Total number of songs read = 2629
Total number of dbm records read = 2430
Total number of duplicate songs = 145
Total number of duplicates = 199.

My script, for what it's worth, is --

Code
#! /usr/bin/perl -w 
use strict;
use Win32;
# no warnings;
# use IO::Tee;

my $song_name;
my $song_count;
my %COUNT;

my $total_count = 0;
my $dupe_count = 0;
my $total_dbms = 0;
my $dupes = 0;

chdir('i:/') || die "Failed to change drive to I $!";
opendir(MUSIC,'My_Music') || die "Failed to open Music $!";
dbmopen(%COUNT, "Count_Songs", 0644) || die "Failed to open Count_Songs $!";

while ($song_name = readdir(MUSIC)) {
if (-d $song_name) {
print "Skipping directory $song_name\n";
next;
}
$song_name =~ tr/A-Z/a-z/;
next if ($song_name =~ /albumart/);
$song_name =~ s/[\\\/:*?"<>|]/ /g;
$song_name =~ s/\.mp3$//i;
$song_name =~ s/ *\(\d+\)//g;
$song_name =~ s/-/_/g;
$song_name =~ s// /g;
$song_count = $COUNT{$song_name};
$song_count = 0 if ($song_count eq '');
++$song_count;
$COUNT{$song_name} = $song_count;
++$total_count;
}

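# Parentheses are needed here: without them, || binds to the file name string and die can never run.
open(DUPERPT, ">", "Dupe_Report.txt") || die "Failed to open Dupe_Report $!";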

while (($song_name, $song_count) = each(%COUNT)) {
# foreach $song_name (keys %COUNT) {
# $song_count = $COUNT{$song_name};
++$total_dbms;
if ($song_count > 1) {
print "Song $song_name has a count of $song_count\n";
print DUPERPT "$song_name has a count of $song_count\n";
++$dupe_count;
$dupes += $song_count;
}
$COUNT{$song_name} = 0;
}

print "Total number of songs = $total_count\n";
print "Total number of dbm records read = $total_dbms\n";
print "Total number of duplicate songs = $dupe_count\n";
$dupes -= $dupe_count;
print "Total number of duplicates = $dupes\n";
print DUPERPT "Total number of songs read = $total_count\n";
print DUPERPT "Total number of dbm records read = $total_dbms\n";
print DUPERPT "Total number of duplicate songs = $dupe_count\n";
print DUPERPT "Total number of duplicates = $dupes\n";

closedir(MUSIC);
dbmclose(%COUNT);
close(DUPERPT);


Thanks in advance for any help in this matter,
s660117


BillKSmith
Veteran

Jul 21, 2013, 4:19 PM

Post #2 of 22 (1414 views)
Re: [s660117] While/each vs. For : Different results. [In reply to] Can't Post

Declaring all your variables at the start of your program is a very bad idea. In fact, I suspect that it is the source of your problem. A careful reading of the syntax for foreach and while (perldoc perlsyn) shows that they are not using the same copy of $song_name. I do not see exactly how this is causing your problem, but a good debugging strategy is to fix the known problems first and see what happens.
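The scoping difference mentioned above can be seen in isolation with a small sketch. It uses an ordinary in-memory hash with made-up data (the key `only_song` is hypothetical), so it only illustrates how the two loop forms treat an outer variable; it does not reproduce the dbm discrepancy itself.

```perl
use strict;
use warnings;

my %count = (only_song => 2);    # hypothetical sample data

# foreach: the loop variable is localized to the loop, so the value the
# outer variable had before the loop is visible again afterwards.
my $song_name = 'outer value';
foreach $song_name (keys %count) {
    # $song_name is aliased to each key in turn here
}
my $after_foreach = $song_name;

# while/each: a plain list assignment to the outer variables. The final
# call to each returns an empty list, which leaves both variables undef.
my $song_count;
while (($song_name, $song_count) = each %count) {
    # $song_name and $song_count hold the current pair here
}
my $after_while = defined $song_name ? $song_name : 'undef';

print "after foreach: $after_foreach\n";
print "after while:   $after_while\n";
```

Declaring the variables with `my` inside the loop header, as in the modified script, avoids having to think about either behaviour.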

Please change the lines that I have indicated with #### and try both the foreach and the while. Let me know what happens.


Code
#! /usr/bin/perl -w 
use strict;
use Win32;
#my $song_name; ####
#my $song_count; ####
my %COUNT;
my $total_count = 0;
my $dupe_count = 0;
my $total_dbms = 0;
my $dupes = 0;
chdir('i:/') || die "Failed to change drive to I $!";
opendir( MUSIC, 'My_Music' ) || die "Failed to open Music $!";
dbmopen( %COUNT, "Count_Songs", 0644 ) || die "Failed to open Count_Songs $!";

while ( my $song_name = readdir(MUSIC) ) { #####
if ( -d $song_name ) { print "Skipping directory $song_name\n"; next; }
$song_name =~ tr/A-Z/a-z/;
next if ( $song_name =~ /albumart/ );
$song_name =~ s/[\\\/:*?"<>|]/ /g;
$song_name =~ s/\.mp3$//i;
$song_name =~ s/ *\(\d+\)//g;
$song_name =~ s/-/_/g;
$song_name =~ s// /g;
my $song_count = $COUNT{$song_name}; ####
$song_count = 0 if ( $song_count eq '' );
++$song_count;
$COUNT{$song_name} = $song_count;
++$total_count;
}
open( DUPERPT, ">", "Dupe_Report.txt" ) || die "Failed to open Dupe_Report $!";
while ( my ( $song_name, $song_count ) = each(%COUNT) ) { ####

#foreach my $song_name (keys %COUNT) { ####
# my $song_count = $COUNT{$song_name}; ####
++$total_dbms;
if ( $song_count > 1 ) {
print "Song $song_name has a count of $song_count\n";
print DUPERPT "$song_name has a count of $song_count\n";
++$dupe_count;
$dupes += $song_count;
}
$COUNT{$song_name} = 0;
}
print "Total number of songs = $total_count\n";
print "Total number of dbm records read = $total_dbms\n";
print "Total number of duplicate songs = $dupe_count\n";
$dupes -= $dupe_count;
print "Total number of duplicates = $dupes\n";
print DUPERPT "Total number of songs read = $total_count\n";
print DUPERPT "Total number of dbm records read = $total_dbms\n";
print DUPERPT "Total number of duplicate songs = $dupe_count\n";
print DUPERPT "Total number of duplicates = $dupes\n";
closedir(MUSIC);
dbmclose(%COUNT);
close(DUPERPT);

I have several other issues with your code, but let's get it working before we try to improve it.

UPDATE: I attached a copy of the code I want you to try.
Good Luck,
Bill

(This post was edited by BillKSmith on Jul 21, 2013, 4:37 PM)
Attachments: s660117.pl (2.00 KB)


s660117
User

Jul 21, 2013, 5:22 PM

Post #3 of 22 (1410 views)
Re: [BillKSmith] While/each vs. For : Different results. [In reply to] Can't Post

Hi, Bill... Thanks for your reply.
I made the changes you suggested and still come up with a discrepancy.

The "for" counts are the same as before --
Total number of songs read = 2629
Total number of dbm records read = 2431
Total number of duplicate songs = 144
Total number of duplicates = 198

The "while/each" counts are different --
Total number of songs read = 2629
Total number of dbm records read = 2431
Total number of duplicate songs = 77
Total number of duplicates = 100.

One thing that is interesting is that the total number of records read from the dbm are constant across both methods and both before and after the changes.

??
Thanks,
s660117


Laurent_R
Veteran / Moderator

Jul 21, 2013, 11:19 PM

Post #4 of 22 (1403 views)
Re: [s660117] While/each vs. For : Different results. [In reply to] Can't Post

Difficult to know without your data to be able to test, but from a very quick look, it seems that in your while loop, you are counting the values and in the for loop you are counting the keys. It is possible that, if some values are undef, you may get a different count. Again, I cannot test, this is just looking possible.
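For what it is worth, the undef hypothesis can be checked against an ordinary in-memory hash with a few lines. This is only a sketch with invented keys (`song_a`, `song_b`, `song_c`), and it says nothing about how a dbm-backed hash behaves.

```perl
use strict;
use warnings;

# Hypothetical data: one normal count, one undef value, one zero value.
my %count = (song_a => 2, song_b => undef, song_c => 0);

# A list assignment in the while condition is true as long as each
# returns a pair, regardless of whether the value is undef or 0.
my $via_each = 0;
while (my ($name, $value) = each %count) {
    ++$via_each;
}

my $via_keys = 0;
foreach my $name (keys %count) {
    ++$via_keys;
}

print "each visited $via_each entries, keys visited $via_keys entries\n";
```

With a plain hash both loops visit every entry, so undef or zero values alone would not shorten the while/each loop.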


s660117
User

Jul 22, 2013, 5:29 AM

Post #5 of 22 (1401 views)
Re: [Laurent_R] While/each vs. For : Different results. [In reply to] Can't Post

Thanks for the reply.
I'm thinking myself that it must have something to do with the data, but cannot determine what it is.
I have attempted to attach a small version of the file that includes a set of records not processed by while/each, but can't seem to figure out how to do it.
s660117


BillKSmith
Veteran

Jul 22, 2013, 7:11 AM

Post #6 of 22 (1395 views)
Re: [s660117] While/each vs. For : Different results. [In reply to] Can't Post

All we need is the hash %COUNT from a run that does not work properly with 'while'. Print it to a file using Data::Dumper and attach that file to your post.
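A minimal sketch of that dump-and-restore round trip follows. The two sample entries stand in for the real %COUNT, and the temporary file is an assumption made so the example cleans up after itself; in practice you would write to a named file such as one you can attach to a post.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempfile);

# Stand-in for the real %COUNT (sample entries only).
my %COUNT = (
    'jerry jeff walker _ my old man'  => 2,
    'waitresses _ christmas wrapping' => 3,
);

# Sort the keys so that two dumps of the same data are easy to compare.
$Data::Dumper::Sortkeys = 1;

# Write the dump to a temporary file (full path, removed at exit).
my ($fh, $dump_file) = tempfile('count_dump_XXXX', SUFFIX => '.txt', TMPDIR => 1, UNLINK => 1);
print {$fh} Dumper(\%COUNT);
close($fh) or die "Failed to close $dump_file: $!";

# Restore: the dump is Perl source that evaluates to a hash reference.
my $restored = do $dump_file;
die "Failed to restore from $dump_file: " . ($@ || $!) unless ref $restored eq 'HASH';

printf "Restored %d entries\n", scalar keys %$restored;
```

Restoring with `do` executes the file as Perl code, so it should only be used on dumps you trust.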
Good Luck,
Bill


s660117
User

Jul 22, 2013, 7:39 AM

Post #7 of 22 (1394 views)
Re: [BillKSmith] While/each vs. For : Different results. [In reply to] Can't Post

Bill,
I'm confused... The "Total number of dbm records read" is the total number of records in the dbm and is constant in any case.
Here's the problem I'm having with uploading an attachment --
I click on browse and locate the stripped down input file within the I: directory. I then open the file, but am unable to select all seven records within it using shift/right mouse button. With nothing else to do, I'm forced to cancel and when I return to the reply screen, I see "No file selected".
Does this make any sense?
s660117


BillKSmith
Veteran

Jul 22, 2013, 8:27 AM

Post #8 of 22 (1388 views)
Re: [s660117] While/each vs. For : Different results. [In reply to] Can't Post

After you 'open' the file, control returns to the forum screen. In the attachment section, you will see the name of your file and a button "Upload Attachment". Click that button, and the entire file uploads. You will then find its name in the list of attached files.

All the count info is in the values part of %COUNT. The while loop in question summarizes that data (incorrectly). Clearly the hash is built correctly or your program would never run. I plan to debug the while loop by restoring the hash from your file rather than computing it.
Good Luck,
Bill


s660117
User

Jul 22, 2013, 8:59 AM

Post #9 of 22 (1386 views)
Re: [BillKSmith] While/each vs. For : Different results. [In reply to] Can't Post

Bill,
OK.... Here's what I did: I copied 7 directory entries for 4 different songs to a .txt file and modified the script to process that file.
I get identical results to those produced by the script that processes the entire Music directory.
Here is the new script, with changes marked by "####" --

Code
#! /usr/bin/perl  
use strict;
use Win32;
# no warnings;
# use IO::Tee;

my %COUNT;

my $total_count = 0;
my $dupe_count = 0;
my $total_dbms = 0;
my $dupes = 0;

chdir('i:/') || die "Failed to change drive to I $!";
open(SAMPLE, 'My_Music_Sample.txt') || die "Failed to open Music_Sample $!"; ####
dbmopen(%COUNT, "Count_Songs", 0644) || die "Failed to open Count_Songs $!";

while (my $song_name = (<SAMPLE>)) { ####
chomp($song_name);
if (-d $song_name) {
print "Skipping directory $song_name\n";
next;
}
$song_name =~ tr/A-Z/a-z/;
next if ($song_name =~ /albumart/);
$song_name =~ s/[\\\/:*?"<>|]/ /g;
$song_name =~ s/\.mp3$//i;
$song_name =~ s/ *\(\d+\)//g;
$song_name =~ s/-/_/g;
$song_name =~ s// /g;
my $song_count = $COUNT{$song_name};
$song_count = 0 if ($song_count eq '');
++$song_count;
$COUNT{$song_name} = $song_count;
++$total_count;
}

open(DUPERPT, ">", "Dupe_Report.txt") || die "Failed to open Dupe_Report $!";

# while (my ($song_name, $song_count) = each(%COUNT)) {
foreach my $song_name (keys %COUNT) {
my $song_count = $COUNT{$song_name};
++$total_dbms;
if ($song_count > 1) {
print "Song $song_name has a count of $song_count\n";
print DUPERPT "$song_name has a count of $song_count\n";
++$dupe_count;
$dupes += $song_count;
}
$COUNT{$song_name} = 0;
}

print "Total number of songs = $total_count\n";
print "Total number of dbm records read = $total_dbms\n";
print "Total number of duplicate songs = $dupe_count\n";
$dupes -= $dupe_count;
print "Total number of duplicates = $dupes\n";
print DUPERPT "Total number of songs read = $total_count\n";
print DUPERPT "Total number of dbm records read = $total_dbms\n";
print DUPERPT "Total number of duplicate songs = $dupe_count\n";
print DUPERPT "Total number of duplicates = $dupes\n";

close(SAMPLE); # this version reads the SAMPLE file handle; no MUSIC directory handle is opened
dbmclose(%COUNT);
close(DUPERPT);

The input file has been uploaded to an attachment.
If you get the same results as me, you will find that the "for" construct detects two songs that occur more than once in the file -
my old man and christmas wrapping.
If you then run the script using the "while/each" construct, you should find that the " if ($song_count > 1) {" test does not detect that there are three occurrences of christmas wrapping in the input.
s660117
Attachments: My_Music_Sample.txt (0.26 KB)


s660117
User

Jul 22, 2013, 10:23 AM

Post #10 of 22 (1379 views)
Re: [s660117] While/each vs. For : Different results. [In reply to] Can't Post

Two other odd things about the dbm:
1) if I comment out the code that opens and closes the dbm and use a simple hash instead, the discrepancy goes away and "for" and "while/each" behave identically; and
2) if I create a separate script --

Code
#!/usr/bin/perl -w
use strict;

my %COUNT;
dbmopen(%COUNT, "Count_Songs", 0644) || die "Failed to open Count_Songs $!";

if (%COUNT) {
my $total_missed = 0;
while ( my ($song_name, $song_count) = each(%COUNT)) {
if ($song_count > 0) {
print "Song $song_name has a count of $song_count\n";
++$total_missed;
}
}
print "Total number of dupes missed = $total_missed\n";
} else {
print "COUNT IS EMPTY\n";
}

dbmclose(%COUNT);

to read the dbm, looking for counts greater than 0, the script fails because the dbm is empty.
But this can't be so because, after a successful "for" run, the values of the dbm should all be 0.
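A possible explanation, not confirmed anywhere in this thread: the main script calls chdir('i:/') before dbmopen, while the separate script does not. Given a relative name and a creation mask, dbmopen creates the database if none exists at that location, so a script started from another directory could silently get a brand-new, empty dbm. The sketch below illustrates that effect with SDBM_File and two temporary directories standing in for the two working directories (both are assumptions made for the example).

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
use File::Temp qw(tempdir);
use File::Spec;

# Two temporary directories play the role of two different working directories.
my $dir_main  = tempdir(CLEANUP => 1);
my $dir_other = tempdir(CLEANUP => 1);

# "Main script": create the dbm in the first directory and store a count.
my %writer;
tie(%writer, 'SDBM_File', File::Spec->catfile($dir_main, 'Count_Songs'), O_RDWR | O_CREAT, 0644)
    or die "Failed to tie in $dir_main: $!";
$writer{'waitresses _ christmas wrapping'} = 3;
my $n_writer = keys %writer;
untie %writer;

# "Separate script": same dbm name, different directory. O_CREAT quietly
# creates a new, empty database instead of reporting that none was found.
my %reader;
tie(%reader, 'SDBM_File', File::Spec->catfile($dir_other, 'Count_Songs'), O_RDWR | O_CREAT, 0644)
    or die "Failed to tie in $dir_other: $!";
my $n_reader = keys %reader;
untie %reader;

print "Entries seen by main script:     $n_writer\n";
print "Entries seen by separate script: $n_reader\n";
```

If this is what is happening, adding the same chdir to the separate script, or using a full path for the dbm name, should make the existing records visible.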
s660117


FishMonger
Veteran / Moderator

Jul 22, 2013, 11:57 AM

Post #11 of 22 (1372 views)
Re: [s660117] While/each vs. For : Different results. [In reply to] Can't Post


Quote
Using Perl v5.6.3 under 64-bit Windows 7

That's the first problem I see.

Why are you running such an old outdated version of perl? If you were on an older unix system, having an old perl version might be understandable, but having it on a newer 64-bit Windows doesn't make any sense and could very well be a contributing factor in the problem you're having.


s660117
User

Jul 22, 2013, 12:20 PM

Post #12 of 22 (1369 views)
Re: [FishMonger] While/each vs. For : Different results. [In reply to] Can't Post

Thanks for the reply, Fishmonger
v5.6.3 was a typo. I'm actually running v5.16.3.
s660117


BillKSmith
Veteran

Jul 22, 2013, 1:32 PM

Post #13 of 22 (1363 views)
Re: [s660117] While/each vs. For : Different results. [In reply to] Can't Post

I have duplicated your results using perl v5.16.1 on Windows XP. The each function (with while) works only with a standard hash, whereas the keys function (with foreach) works with either.

I have tried to print $song_name and $song_count immediately after the while. ($song_count is usually wrong) I print $COUNT with Data::Dumper immediately before the while. (It became perfect after I added a statement to clear the database right after dbmopen, but this did not fix the each problem)

It appeared that each no longer works with dbmopen. I wrote a simple program to test this. It worked fine.
Good Luck,
Bill


FishMonger
Veteran / Moderator

Jul 22, 2013, 1:39 PM

Post #14 of 22 (1362 views)
Re: [BillKSmith] While/each vs. For : Different results. [In reply to] Can't Post


Quote
It appeared that each no longer works with dbmopen.


I have not run any tests, but your results may show part of the reason dbmopen is unofficially deprecated.


Code
perldoc -f dbmopen 
dbmopen HASH,DBNAME,MASK
[This function has been largely superseded by the tie function.]



(This post was edited by FishMonger on Jul 22, 2013, 1:40 PM)


s660117
User

Jul 22, 2013, 1:48 PM

Post #15 of 22 (1360 views)
Re: [s660117] While/each vs. For : Different results. [In reply to] Can't Post

As I wrote, I tried unsuccessfully to open %COUNT in a separate script, in an attempt to see what the dbm looked like after the main script ran.
When that failed, I added the new code, which looks for those songs with a count greater than 0, to the main script, just after closing all files.
The new code reads --

Code
   my $total_missed = 0; 
while (my ($song_name, $song_count) = each(%COUNT)) {
# for my $song_name (keys %COUNT) {
# my $song_count = $COUNT{$song_name};
if ($song_count > 1) {
print "Song $song_name still has a count of $song_count\n";
print DUPERPT "Song $song_name still has a count of $song_count\n";
++$total_missed;
}
}
print "Total number of dupes missed = $total_missed\n";
print DUPERPT "Total number of dupes missed = $total_missed\n";

dbmclose(%COUNT);
close(DUPERPT);


When I run the script against the flat file with "for", I get --

jerry jeff walker _ my old man has a count of 2
waitresses _ christmas wrapping has a count of 3
Total number of songs read = 7
Total number of dbm records read = 4
Total number of duplicate songs = 2
Total number of duplicates = 3
Total number of dupes missed = 0.

And when I run the script against the flat file with "while/each", I get --

jerry jeff walker _ my old man has a count of 2
Total number of songs read = 7
Total number of dbm records read = 4
Total number of duplicate songs = 1
Total number of duplicates = 1
Song waitresses _ christmas wrapping still has a count of 3
Total number of dupes missed = 1

So the dbm update appears to be successful.

s660117


s660117
User

Jul 22, 2013, 2:21 PM

Post #16 of 22 (1352 views)
Re: [FishMonger] While/each vs. For : Different results. [In reply to] Can't Post

OK...
So, each no longer works with dbmopen?
I consulted perldoc on tie, but remain confused. If I code

Code
   use Count_Songs; 
tie(%COUNT, "Count_Songs", 'I:', 1, 0) || die "Failed to open Count_Songs $!";

I get a message that Count_Songs.pm was not found in @INC.
Exactly what is tie looking for in the .pm?
s660117


Laurent_R
Veteran / Moderator

Jul 22, 2013, 2:39 PM

Post #17 of 22 (1347 views)
Re: [s660117] While/each vs. For : Different results. [In reply to] Can't Post


Code
use Count_Songs;


This is wrong (unless you have a Count_Songs module or class somewhere, but from the context, I do not think this is the case).

You may want to use the Tie::Hash module.
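Some background that may help here: the second argument to tie is the name of a class (a package) that implements TIEHASH and the other hash methods, which is why "use Count_Songs" makes Perl look for a Count_Songs.pm. Tie::Hash itself is a base class for writing such classes; for a dbm file the class would normally be a DBM module such as SDBM_File or AnyDBM_File. The sketch below shows what tie expects, using a made-up class named LowerKeys that exists only for this example.

```perl
use strict;
use warnings;

# LowerKeys is a hypothetical class written for this example only.
{
    package LowerKeys;
    require Tie::Hash;
    our @ISA = ('Tie::StdHash');    # supplies TIEHASH, FETCH, FIRSTKEY, and so on

    # Override only STORE: fold keys to lower case before saving them.
    sub STORE {
        my ($self, $key, $value) = @_;
        $self->{lc $key} = $value;
    }
}

# tie calls LowerKeys->TIEHASH behind the scenes and returns the new object.
tie(my %COUNT, 'LowerKeys') or die "Failed to tie hash";
$COUNT{'Christmas Wrapping'} = 3;

my @keys = keys %COUNT;
print "Stored key: $keys[0]\n";
```

Any extra arguments given to tie after the class name are passed on to that class's TIEHASH method, which is how DBM modules receive the file name, flags and mode.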


s660117
User

Jul 22, 2013, 2:50 PM

Post #18 of 22 (1346 views)
Re: [Laurent_R] While/each vs. For : Different results. [In reply to] Can't Post

I don't think I understand tie.
If I remove "use Count_Songs" and add "use Tie::Hash", I get a message that reads

Code
Can't locate object method "TieHash" via package "Count_Songs".

My experience with object oriented code is limited.
s660117


s660117
User

Jul 23, 2013, 12:11 PM

Post #19 of 22 (1331 views)
Re: [FishMonger] While/each vs. For : Different results. [In reply to] Can't Post

FishMonger,
I have rewritten my script to use tie as follows --

Code
   use Fcntl; 
use AnyDBM_File;

my $preferred_dbm = $AnyDBM_File::ISA[0];
my $dbmfile = "Count_Songs";
my %COUNT;

tie(%COUNT, $preferred_dbm, $dbmfile, O_CREAT|O_RDWR, 0664) || die "Failed to open DBM1 $!";

processing(...)

untie(%COUNT);


What I found was that while/each still doesn't work, returning

jerry jeff walker _ my old man has a count of 2
Total number of songs read = 7
Total number of dbm records read = 4
Total number of duplicate songs = 1
Total number of duplicates = 1

instead of --

jerry jeff walker _ my old man has a count of 2
waitresses _ christmas wrapping has a count of 3
Total number of songs read = 7
Total number of dbm records read = 4
Total number of duplicate songs = 2
Total number of duplicates = 3

In addition, I am still unable to open the existing dbm using a new script, which is coded as follows --


Code
#!/usr/bin/perl -w
use strict;
use Fcntl;
use AnyDBM_File;

my $preferred_dbm = $AnyDBM_File::ISA[0];
my %COUNT;
tie(%COUNT, $preferred_dbm, "Count_Songs", O_RDONLY, 0644) || die "Failed to open dbm $!";

if (%COUNT) {
my $total_missed = 0;
for my $song_name (keys %COUNT) {
my $song_count = $COUNT{$song_name};
if ($song_count > 0) {
print "Song $song_name has a count of $song_count\n";
++$total_missed;
}
}
print "Total number of dupes missed = $total_missed\n";
} else {
print "COUNT IS EMPTY\n";
}

untie(%COUNT);


It always returns "COUNT IS EMPTY" even after a successful "for/keys" run of the main script, which should leave all items with a count of 0.
s660117


s660117
User

Jul 24, 2013, 9:20 AM

Post #20 of 22 (1313 views)
Re: [FishMonger] While/each vs. For : Different results. [In reply to] Can't Post

By the way....
Using dbmopen, "while/each" yields the same results as "for" under Linux.
Does this mean that the problem with the while construct is a bug in Perl Win32?
s660117


s660117
User

Aug 1, 2013, 12:39 PM

Post #21 of 22 (1233 views)
Re: [s660117] While/each vs. For : Different results. [In reply to] Can't Post

The critical line of code here seems to be

Code
$COUNT{$song_name} = 0;

which updates the dbm values within the read loop.

Thus, to summarize --
> Under Windows, both while/each and for/keys work with a simple hash, even when the values are updated within the loop;
> Under Windows, for/keys works with a dbm, even when the values are updated within the loop;
> Under Linux, both while/each and for/keys work with a dbm, even when the values are updated within the loop;
> Under Windows, while/each works with a dbm, unless you try to update the values within the loop, in which case results are in error.

So the undocumented error is attempting to update dbm values while reading a dbm with while/each under Windows.
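Given that finding, one cautious workaround is to keep the each loop read-only and apply the updates after it has finished. perldoc -f each already warns against adding or deleting elements during iteration; whether storing to an existing key in a dbm falls under that warning is not established in this thread, so treat the following as a sketch under that assumption. It uses SDBM_File, a temporary directory, and two invented song names ("song a", "song b") alongside the two from the sample file.

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
use File::Temp qw(tempdir);
use File::Spec;

# Temporary dbm for the example; the real script would use Count_Songs.
my $dir      = tempdir(CLEANUP => 1);
my $dbm_file = File::Spec->catfile($dir, 'Count_Songs');
tie(my %COUNT, 'SDBM_File', $dbm_file, O_RDWR | O_CREAT, 0644)
    or die "Failed to tie $dbm_file: $!";

# Sample counts; "song a" and "song b" are placeholders.
my %sample = (
    'jerry jeff walker _ my old man'  => 2,
    'waitresses _ christmas wrapping' => 3,
    'song a'                          => 1,
    'song b'                          => 1,
);
$COUNT{$_} = $sample{$_} for keys %sample;

# Pass 1: read only. Nothing is stored while each is iterating.
my ($total_dbms, $dupe_count, $dupes) = (0, 0, 0);
my @seen;
while (my ($song_name, $song_count) = each %COUNT) {
    ++$total_dbms;
    push @seen, $song_name;
    next unless $song_count > 1;
    ++$dupe_count;
    $dupes += $song_count;
}
$dupes -= $dupe_count;

# Pass 2: reset the counts now that the iteration is over.
$COUNT{$_} = 0 for @seen;
my $nonzero = grep { $COUNT{$_} != 0 } keys %COUNT;
untie %COUNT;

print "Total number of dbm records read = $total_dbms\n";
print "Total number of duplicate songs = $dupe_count\n";
print "Total number of duplicates = $dupes\n";
print "Records still non-zero = $nonzero\n";
```

The for/keys version in the thread works for a similar reason: keys builds the complete list of names before the loop body stores anything.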


BillKSmith
Veteran

Aug 1, 2013, 8:11 PM

Post #22 of 22 (1222 views)
Re: [s660117] While/each vs. For : Different results. [In reply to] Can't Post

I think we can summarize that further: the "each" function does not work properly with the default database that you use under Windows.
Good Luck,
Bill

 
 

