HASH-O-RAMA Data Processing Problem



stuckinarut
User

Feb 22, 2015, 7:35 PM


Views: 16837
HASH-O-RAMA Data Processing Problem

For 9 years I have sponsored a small Ham Radio on-air Contest event and manually done all the tedious log checking of several thousand contacts from submitted logs. At 71 my eyesight is not the best, and I've struggled to figure out how to do the bulk of the log checking in Perl.

Since all submitted logs (QSO lines) are consolidated into one Master (listQ.txt) file in random order by the Submitter, I'm Stuck-In-A-Rut trying to figure out whatever Arrays & Loops & Code can hopefully make this all work.

Below is my current Work-In-Progress that has led me to a Brick Wall of how to finish the actual log checking after all data is entered into a Hash. I've included some specifics as Comments within the Script as FYI. After the script will be the very short Test Data file (listQ.txt) I used.


Code
#!/usr/bin/perl 

use strict;
use warnings;

# -------------------------------------------
=begin comment

Sample listQ.txt file entry & data structure:

QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZFO BILL CA

(Etc., etc. - ALL QSO (Contact) lines for ALL participants are appended into one large
listQ.txt file for processing.)

In the above example:

$band = 3542 (actually a frequency which is converted to "80M" in the script);
$logcall = W7WHY
$logname = Tom
$logmult = OR
$callsign = N6ZFO
$callname = BILL
$callmult = CA

The "CW", (DATE) and (TIME) Columns/Fields are NOT needed for log checking.

Some of the submitted entries contain Upper/Lowercase Text which must be converted
to all UPPERCASE (unless processing is not case-sensitive).

1. Each "QSO:" line contains the Contact log information to be checked
2. Any Duplicate contacts on the *same* frequency band (40m or 80m) are NOT allowed
3. Possible Errors are:
A. Other stations logged were NOT actually worked/contacted on the frequency band(s)
indicated, which could also be due to incorrectly copied and/or logged Callsigns.
B. If the Callsign logged was correct, an incorrect match or misspelling of the NAME or
"MULT" (Abbreviations for USA States or Canadian Provinces/Territories), or a missing field.
4. Station Callsigns worked/contacted who did not submit a log need to be included in the
errors.csv file and "manually" dealt with in the final (Non-Perl) processing stages.
5. The desired Summary objectives are described at the end of the existing code so far.

=end comment

=cut
# -------------------------------------------

my $Q_list;
my $qso;
my $logtime;
my $band;
my $logcall;
my $logname;
my $logmult;
my $callsign;
my $callname;
my $callmult;

# IMPORTANT NOTE:
# Somehow All TEXT data needs converting to UPPER CASE before Hash entry

open $Q_list, '<', 'listQ.txt' or die "Cannot open listQ.txt: $!";
while (my $line = <$Q_list>) {
    chomp $line;
    $line =~ s/\r//g;    # remove Windows CR characters
    $line =~ s/\s+$//;   # remove trailing whitespace

    if ( $line =~ m/^QSO:\s+([0-9]+).*\s+(\w{4})\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)/ ) {

        if ( $1 ne '' ) {
            $band = $1;
            if ( $band =~ m/^7\d+/ ) {
                $band = '40M';
            } elsif ( $band =~ m/^3\d+/ ) {
                $band = '80M';
            }
        }

        # Not used in processing at this time (PUN!) but resolved Error messages
        $logtime = $2;

        # Print the captured fields (to verify entries before Hash loading only)
        $logcall  = $3;
        $logname  = $4;
        $logmult  = $5;
        $callsign = $6;
        $callname = $7;
        $callmult = $8;
        print $band, $logcall, $logname, $logmult, $callsign, $callname, $callmult;
        print "\n";
    }
}


# -------------------------------------------
=begin comment

Here are the Hash data entries from the Sample/Test listQ.txt file, which
included Errors (on purpose) in 3 entries for Log Check Error "Testing".
No log was submitted by $callsign W6NV so there are no $logcall entries
in the Master Consolidated listQ.txt file.

40MW7WHYTomORN6ZFOBILLCA
40MW7WHYTomORW9REMIKEIN
80MW7WHYTomORN6ZFBILLCA <- (Callsign should be N6ZFO)
80MW7WHYTomORW6NVOLICA
80MW7WHYTomORW9REMIKEIN
40MW9REMIKEINW7WHYTOMOr
40MW9REMIKEINN6ZFOBILLCa
80MW9REMIKEINN6ZFOBILCa <- (Name should be BILL)
80MW9REMIKEINW7WHYTOMOr
80MW9REMIKEINW6NVOLICa
40MN6ZFOBILLCAW7WHYTOMOR
40MN6ZFOBILLCAW9RRMIKEIN <- (Callsign should be W9RE)
40MN6ZFOBILLCAN2NLDAVEFL
80MN6ZFOBILLCAW9REMIKEIN
80MN6ZFOBILLCAW7WHYTOMOR
80MN6ZFOBILLCAW6NVOLICA

The Log Checking Summary objectives are to Append to the following .csv files
which will be Imported into an Excel spreadsheet for further processing.

1. logscores.csv

A. File Header: LOGCALL,QSOS,MULT,NAME
B. (DATA) QSOS is a "Count" of Verified 2-Way Logged QSOs (Contacts) WITHOUT Errors,
counting only QSOs that could be checked against submitted logs.

2. logerrors.csv

A. File Header: LOGCALL,CALLSIGN,ERRORTYPES

B. "CALLSIGN" is the station "claimed" or reported in a submitter's (LOGCALL) log, but
was flagged with some kind of Error(s) for one or more of the following reasons:

C. ERRORTYPES <- with the Concatenated data separated by one space
NIL (CALLSIGN) - LOGCALL did NOT appear in the log of the CALLSIGN worked (or was mistyped there)
NIL (BANDQSO) - The CALLSIGN station's log did not show a QSO/Contact on the indicated band
INVALID (NAME) - NAME did not exactly match the (CALLSIGN) NAME in the Submitter's (LOGCALL) log
INVALID (MULT) - MULT did not exactly match the (CALLSIGN) MULT in the Submitter's (LOGCALL) log
DUPE (BANDQSO) - A "Same Band" Duplicate QSO (not allowed)

=end comment

=cut
# -------------------------------------------

# END OF SCRIPT IN PROGRESS


Test listQ.txt data lines (including intentional Errors):


Code
QSO:  7040 CW 2015-01-22 0200 W7WHY           Tom        OR  N6ZFO           BILL       CA 
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA


Any assistance would be most gratefully appreciated.

Thank you!

-Stuckinarut


FishMonger
Veteran / Moderator

Feb 23, 2015, 9:03 AM


Views: 16823
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

I don't have time right now to work up a full solution, but as a starting point, I'd drop that regex and instead use a simple split statement to extract your fields.

Example:

Code
use Data::Dumper;

while (my $line = <$Q_list>) {
    chomp $line;
    if ( $line =~ m/^QSO.\s+([0-9]+).*\s+([\w]{4})\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)/ ) {
        print Dumper($1, $2, $3, $4, $5, $6, $7, $8);

        my @fields = (split(/\s+/, $line))[1, 4 .. 10];
        print Dumper \@fields;
        last;
    }
}


Outputs:

Code
$VAR1 = '7040'; 
$VAR2 = '0200';
$VAR3 = 'W7WHY';
$VAR4 = 'Tom';
$VAR5 = 'OR';
$VAR6 = 'N6ZFO';
$VAR7 = 'BILL';
$VAR8 = 'CA';
$VAR1 = [
'7040',
'0200',
'W7WHY',
'Tom',
'OR',
'N6ZFO',
'BILL',
'CA'
];



(This post was edited by FishMonger on Feb 23, 2015, 9:04 AM)


stuckinarut
User

Feb 23, 2015, 11:21 PM


Views: 16796
Re: [FishMonger] HASH-O-RAMA Data Processing Problem

Finally back online briefly - thanks for your reply & info, FishMonger.

I used the lonnnnng REGEXP to keep the Mode (CW), Date & Time fields from getting plugged into the mix, since those are not used in this particular log checking/validation process.

Still trying to figure things out - I feel like a 'deer-in-the-headlights' although a new thought did come to mind.

Starting with the first record, then trolling through ALL of the remaining record entries in the Hash: IF the (logcall) $VAR in the record matches the (logcall) $VAR in the next record, nothing would be matched or processed, because it would be *that* person's own log entry, so the process would move onward until a DIFFERENT (logcall) $VAR was reached. It is the (callsign) $VAR (the 2nd 'callsign' in a record) that needs matching, along with the other noted fields, to that specific (callsign) IF it is a (logcall) $VAR in any subsequent records.

Ohhhh...sorry... I'm really having trouble trying to explain things. Hmmm. I'll try this.

In what you kindly posted, $VAR3 (W7WHY) is considered the (logcall). $VAR6 (N6ZFO) is the (callsign). ONLY if N6ZFO is in a subsequent record as $VAR3 will any matching/log check processing take place for that record...and so on down the line for all the (callsign/$VAR6) entries in the W7WHY/$VAR3 records based upon the established log-checking & error criteria.

So, IF $VAR3 in a record matches $VAR6 in the next record, then the other $VAR checks/matches take place. HOWEVER, I need to retain some form of this to change *any* 'frequencies' to simply either the 40M or 80M band designators:

if ( $1 ne '' ) {
    $band = $1;
    if ( $band =~ m/^7[\d]+/ ) {
        $band = '40M';
    } elsif ( $band =~ m/^3[\d]+/ ) {
        $band = '80M';
    }
}
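(Thinking out loud: maybe a small lookup hash could replace that if/elsif chain entirely. Just a rough sketch, assuming every frequency in the logs starts with 3 or 7:)

```perl
use strict;
use warnings;

# Sketch: map the leading digit of the frequency to its band name.
# Assumes contest frequencies always begin with 3 (80M) or 7 (40M).
my %band_for = ( 3 => '80M', 7 => '40M' );

sub band_of {
    my ($freq) = @_;
    return $band_for{ substr( $freq, 0, 1 ) };   # undef if unrecognized
}

print band_of('3542'), "\n";   # 80M
print band_of('7040'), "\n";   # 40M
```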

I could be wrong, but I'm thinking that after each record's log-check match (or no match) takes place on the $VARS, the results data would be appended/written to the .csv files before moving on to the next record.

How to pull it all together remains a Mystery and my head is spinning again ;-(

-Stuckinarut


Chris Charley
User

Feb 24, 2015, 11:48 AM


Views: 16771
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

Just a suggestion, but couldn't you use one of the logger programs that are available (free or at small cost)?


stuckinarut
User

Feb 24, 2015, 2:24 PM


Views: 16759
Re: [Chris Charley] HASH-O-RAMA Data Processing Problem

Hi, Chris:

We all use 'logger' programs. It is the special 'Cabrillo' (format) file output from everyone's loggers that gets submitted, which the Perl REGEXP parses into the data needed for the 'log-checking' nightmare part of things. In other words, 'validating' the logged (and typed) contact information. From that point, I'm still 'Stuck-In-A-Rut' {SIGH}.

-Stuckinarut


(This post was edited by stuckinarut on Feb 24, 2015, 2:25 PM)


Zhris
Enthusiast

Feb 24, 2015, 8:48 PM


Views: 16736
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

I have had a look through your requirements and done a little research on contesting. At this stage I wanted to ask how you propose to decipher which data is invalid if, for example, someone mistyped their logcall. Do you have a log of all participants containing their logcall, logname and logmult to check against?

Chris


stuckinarut
User

Feb 25, 2015, 12:36 AM


Views: 16728
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Zhris:

In doing the tedious manual processing, I used to give everyone at least one 'Mulligan' for what we term a 'Busted Call' or for a misssspellllud 'Name' or 'Mult' (location). Now the focus includes something called 'Accuracy' :^)

So if the (logcall field) station works a (callsign field) station, when checking the latter's callsign in his or her log, IF the (logcall) station's log being validated yields a mistyped ('Busted') callsign, that contact will not count.

Dealing with any missing (but legit) callsigns accurately typed but for stations who did NOT submit a log, well, that's where I still have to cut a bit of slack and also do additional 'Manual' processing to include checking against a list of 'Unique' callsigns worked/contacted from ALL log submitters. Normally these few non-log submitter callsigns do show up in multiple logs and are legit.

I didn't want to further complicate the basic log-checking needs of a single Perl script which is what will eliminate the majority of the many hours of manual labor.

A new thought came to mind to try and chart out some Pseudo-Code for what I now think *might* work as a flow of processing.

-Stuckinarut


Zhris
Enthusiast

Feb 25, 2015, 9:03 AM


Views: 16702
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

I have thrown together a rough script which constructs an appropriate data structure then extends it while cross checking.

Please note the following:
- It is not a complete solution, but potentially the base of one.
- I have assumed that the logcall, logname and logmult are always valid since they are the operator's own details; in actual fact it is the first-seen entry's logname and logmult that are checked against throughout. This assumption makes it a lot easier when checking for invalid callsigns, callnames and callmults.
- Upon reading your comments, I wasn't entirely clear on each error, particularly the NIL ones, therefore they may not be correct.
- For now / for simplicity, only one possible error is marked against each entry.
- For now, in order to run the script standalone, I have put your input data in the DATA block at the end of the script and dumped the resultant data structure to stdout instead of writing to the relevant output CSVs.
- I have included a potential non-submitters hash. The idea is that every time an invalid callsign is discovered, it is incremented in the hash by 1. Those with the highest count at the end are more likely to be non-submitters as opposed to invalid.

Apologies if it doesn't work quite as expected; I was unable to put much time into it for now, but I'm sure you will be able to describe any issues.


Code
use strict; 
use warnings;
use Data::Dumper;

my $configuration =
{
casesensitive => 0,
input_path => 'listQ.txt',
output_scores_path => 'logscores.csv',
output_errors_path => 'logerrors.csv',
};

my $band_lookup =
{
3 => '80M',
7 => '40M',
};

my $contestants = { };

my $potential_non_submitters = { };

while ( my $line = <DATA> )
{
    $line =~ s/\s+$//;

    my ( $freq, $logcall, $logname, $logmult, $callsign, $callname, $callmult ) =
        map { uc $_ unless $configuration->{casesensitive} }
        (split( ' ', $line ))[1,5..10];

    # lookup band via frequency.
    my $band = $band_lookup->{substr( $freq, 0, 1 )}; # todo // error.

    # add a new contestant to contestants if we haven't seen them before.
    unless ( defined $contestants->{$logcall} )
    {
        $contestants->{$logcall} =
        {
            logname => $logname,
            logmult => $logmult,
        };
    }

    # add this entry to the contestants entries.
    push @{$contestants->{$logcall}->{bands}->{$band}},
    {
        callsign => $callsign,
        callname => $callname,
        callmult => $callmult,
    };
}

while ( my ( $logcall, $contestant ) = each ( %$contestants ) )
{
    while ( my ( $band, $entries ) = each ( %{$contestant->{bands}} ) )
    {
        for my $entry ( @$entries )
        {
            # mark as seen. Used when checking for duplicate entries.
            $entry->{seen} = 1;

            # validate entry.
            do { $entry->{errors}->{'NIL (BANDQSO)'} = 1; $potential_non_submitters->{$entry->{callsign}}++; next }
                unless ( defined $contestants->{$entry->{callsign}} );                                                  # invalid callsign.
            do { $entry->{errors}->{'DUPE (BANDQSO)'} = 1; next }
                if ( ( grep { $_->{seen} and $_->{callsign} eq $entry->{callsign} } @$entries ) > 1 );                  # duplicate entry.
            do { $entry->{errors}->{'INVALID (NAME)'} = 1; next }
                unless ( $entry->{callname} eq $contestants->{$entry->{callsign}}->{logname} );                         # invalid callname.
            do { $entry->{errors}->{'INVALID (MULT)'} = 1; next }
                unless ( $entry->{callmult} eq $contestants->{$entry->{callsign}}->{logmult} );                         # invalid callmult.
            do { $entry->{errors}->{'NIL (CALLSIGN)'} = 1; next }
                unless ( grep { $_->{callsign} eq $logcall } @{$contestants->{$entry->{callsign}}->{bands}->{$band}} ); # no return entry.
        }
    }
}

print Dumper $contestants, $potential_non_submitters;

__DATA__
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA


Output:

Code
$VAR1 = { 
'N6ZFO' => {
'logname' => 'BILL',
'logmult' => 'CA',
'bands' => {
'80M' => [
{
'seen' => 1,
'callmult' => 'IN',
'callsign' => 'W9RE',
'callname' => 'MIKE'
},
{
'seen' => 1,
'callmult' => 'OR',
'errors' => {
'NIL (CALLSIGN)' => 1
},
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL (BANDQSO)' => 1
},
'callsign' => 'W6NV',
'callname' => 'OLI'
}
],
'40M' => [
{
'seen' => 1,
'callmult' => 'OR',
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'IN',
'errors' => {
'NIL (BANDQSO)' => 1
},
'callsign' => 'W9RR',
'callname' => 'MIKE'
},
{
'seen' => 1,
'callmult' => 'FL',
'errors' => {
'NIL (BANDQSO)' => 1
},
'callsign' => 'N2NL',
'callname' => 'DAVE'
}
]
}
},
'W7WHY' => {
'logname' => 'TOM',
'logmult' => 'OR',
'bands' => {
'80M' => [
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL (BANDQSO)' => 1
},
'callsign' => 'N6ZF',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL (BANDQSO)' => 1
},
'callsign' => 'W6NV',
'callname' => 'OLI'
},
{
'seen' => 1,
'callmult' => 'IN',
'callsign' => 'W9RE',
'callname' => 'MIKE'
}
],
'40M' => [
{
'seen' => 1,
'callmult' => 'CA',
'callsign' => 'N6ZFO',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'DUPE (BANDQSO)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'DUPE (BANDQSO)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'IN',
'callsign' => 'W9RE',
'callname' => 'MIKE'
}
]
}
},
'W9RE' => {
'logname' => 'MIKE',
'logmult' => 'IN',
'bands' => {
'80M' => [
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'INVALID (NAME)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BIL'
},
{
'seen' => 1,
'callmult' => 'OR',
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL (BANDQSO)' => 1
},
'callsign' => 'W6NV',
'callname' => 'OLI'
}
],
'40M' => [
{
'seen' => 1,
'callmult' => 'OR',
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL (CALLSIGN)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BILL'
}
]
}
}
};
$VAR2 = {
'N6ZF' => 1,
'W9RR' => 1,
'N2NL' => 1,
'W6NV' => 3
};


Regards,

Chris


(This post was edited by Zhris on Feb 25, 2015, 9:24 AM)


stuckinarut
User

Feb 25, 2015, 10:24 AM


Views: 16681
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Zhris:

WOW... this looks promising and I appreciate your help! Must leave for a good part of the day, but will jump back into the code later today/tonight and take a 'Test Drive'.

Thanks much!

-Stuckinarut


Zhris
Enthusiast

Feb 25, 2015, 11:36 AM


Views: 16674
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

I've had a little more time to work on this. I have adjusted the script to make it a little more readable and have roughly generated the desired score and error outputs ( strings for now ). I haven't "fixed" anything I envisage being incorrect; instead I am awaiting the feedback from your test drive ;):


Code
use strict; 
use warnings;
use Data::Dumper;

local $, = "\t";
local $\ = "\n";

#####

# fh begin.

my $output_scores_str = '';
my $output_errors_str = '';

my $input_fh = \*DATA; # 'listQ.txt'
open my $output_scores_fh, '>', \$output_scores_str; # 'logscores.csv'
open my $output_errors_fh, '>', \$output_errors_str; # 'logerrors.csv'

#####

# init.

my $case_sensitive = 0;

my $band_lookup =
{
3 => '80M',
7 => '40M',
};

my $contestants = { };

my $potential_non_submitters = { };

#####

# first sweep ( load input data into hash ).

while ( my $line = <$input_fh> )
{
    $line =~ s/\s+$//;

    my ( $freq, $logcall, $logname, $logmult, $callsign, $callname, $callmult ) =
        map { uc $_ unless $case_sensitive }
        (split( ' ', $line ))[1,5..10];

    # lookup band via frequency.
    my $band = $band_lookup->{substr( $freq, 0, 1 )}; # todo: // error.

    # add a new contestant to contestants if we haven't seen them before.
    unless ( defined $contestants->{$logcall} )
    {
        $contestants->{$logcall} =
        {
            logname => $logname,
            logmult => $logmult,
        };
    }

    # add this entry to the contestants entries.
    push @{$contestants->{$logcall}->{bands}->{$band}},
    {
        callsign => $callsign,
        callname => $callname,
        callmult => $callmult,
    };
}

#####

# second sweep ( process hash, generate logs ).

print $output_scores_fh 'LOGCALL', 'QSOS', 'MULT', 'NAME';
print $output_errors_fh 'LOGCALL', 'CALLSIGN', 'ERRORTYPES';

while ( my ( $logcall, $contestant ) = each ( %$contestants ) )
{
    my $verified = 0;

    while ( my ( $band, $entries ) = each ( %{$contestant->{bands}} ) )
    {
        for my $entry ( @$entries )
        {
            # mark as seen. Used when checking for duplicate entries.
            $entry->{seen} = 1;

            # validate entry.
            if ( not defined $contestants->{$entry->{callsign}} ) # invalid callsign.
            {
                $entry->{errors}->{'NIL(BANDQSO)'} = 1;

                $potential_non_submitters->{$entry->{callsign}}++;
            }
            elsif ( ( grep { $_->{seen} and $_->{callsign} eq $entry->{callsign} } @$entries ) > 1 ) # duplicate entry.
            {
                $entry->{errors}->{'DUPE(BANDQSO)'} = 1;
            }
            elsif ( $entry->{callname} ne $contestants->{$entry->{callsign}}->{logname} ) # invalid callname.
            {
                $entry->{errors}->{'INVALID(NAME)'} = 1;
            }
            elsif ( $entry->{callmult} ne $contestants->{$entry->{callsign}}->{logmult} ) # invalid callmult.
            {
                $entry->{errors}->{'INVALID(MULT)'} = 1;
            }
            elsif ( not grep { $_->{callsign} eq $logcall } @{$contestants->{$entry->{callsign}}->{bands}->{$band}} ) # no return entry.
            {
                $entry->{errors}->{'NIL(CALLSIGN)'} = 1;
            }

            # log errors if any, or increment verified count.
            if ( keys %{$entry->{errors}} )
            {
                print $output_errors_fh $logcall, $entry->{callsign}, keys %{$entry->{errors}}; # todo: errors better as list not hash.
            }
            else
            {
                $verified++;
            }
        }
    }

    # log score.
    print $output_scores_fh $logcall, $verified, $contestant->{logmult}, $contestant->{logname};
}

#####

# dump.

{
    local $, = "\n";
    print Dumper $contestants, $potential_non_submitters;
}

#####

# fh end.

close $input_fh;
close $output_scores_fh;
close $output_errors_fh;

print $output_scores_str;
print $output_errors_str;

#####

__DATA__
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA


Output:

Code
$VAR1 = { 
'N6ZFO' => {
'logname' => 'BILL',
'logmult' => 'CA',
'bands' => {
'80M' => [
{
'seen' => 1,
'callmult' => 'IN',
'errors' => {},
'callsign' => 'W9RE',
'callname' => 'MIKE'
},
{
'seen' => 1,
'callmult' => 'OR',
'errors' => {
'NIL(CALLSIGN)' => 1
},
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL(BANDQSO)' => 1
},
'callsign' => 'W6NV',
'callname' => 'OLI'
}
],
'40M' => [
{
'seen' => 1,
'callmult' => 'OR',
'errors' => {},
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'IN',
'errors' => {
'NIL(BANDQSO)' => 1
},
'callsign' => 'W9RR',
'callname' => 'MIKE'
},
{
'seen' => 1,
'callmult' => 'FL',
'errors' => {
'NIL(BANDQSO)' => 1
},
'callsign' => 'N2NL',
'callname' => 'DAVE'
}
]
}
},
'W7WHY' => {
'logname' => 'TOM',
'logmult' => 'OR',
'bands' => {
'80M' => [
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL(BANDQSO)' => 1
},
'callsign' => 'N6ZF',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL(BANDQSO)' => 1
},
'callsign' => 'W6NV',
'callname' => 'OLI'
},
{
'seen' => 1,
'callmult' => 'IN',
'errors' => {},
'callsign' => 'W9RE',
'callname' => 'MIKE'
}
],
'40M' => [
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {},
'callsign' => 'N6ZFO',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'DUPE(BANDQSO)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'DUPE(BANDQSO)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'IN',
'errors' => {},
'callsign' => 'W9RE',
'callname' => 'MIKE'
}
]
}
},
'W9RE' => {
'logname' => 'MIKE',
'logmult' => 'IN',
'bands' => {
'80M' => [
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'INVALID(NAME)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BIL'
},
{
'seen' => 1,
'callmult' => 'OR',
'errors' => {},
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL(BANDQSO)' => 1
},
'callsign' => 'W6NV',
'callname' => 'OLI'
}
],
'40M' => [
{
'seen' => 1,
'callmult' => 'OR',
'errors' => {},
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL(CALLSIGN)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BILL'
}
]
}
}
};

$VAR2 = {
'N6ZF' => 1,
'W9RR' => 1,
'N2NL' => 1,
'W6NV' => 3
};

LOGCALL QSOS MULT NAME
N6ZFO 2 CA BILL
W7WHY 3 OR TOM
W9RE 2 IN MIKE

LOGCALL CALLSIGN ERRORTYPES
N6ZFO W7WHY NIL(CALLSIGN)
N6ZFO W6NV NIL(BANDQSO)
N6ZFO W9RR NIL(BANDQSO)
N6ZFO N2NL NIL(BANDQSO)
W7WHY N6ZF NIL(BANDQSO)
W7WHY W6NV NIL(BANDQSO)
W7WHY N6ZFO DUPE(BANDQSO)
W7WHY N6ZFO DUPE(BANDQSO)
W9RE N6ZFO INVALID(NAME)
W9RE W6NV NIL(BANDQSO)
W9RE N6ZFO NIL(CALLSIGN)


Regards,

Chris


(This post was edited by Zhris on Feb 25, 2015, 11:58 AM)


stuckinarut
User

Feb 25, 2015, 7:23 PM


Views: 16643
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Zhris (Chris):

AWESOME... I think this is pretty close to working!

I realized that I forgot to make an Error in spelling for one of the mults, and noticed the Name & Mult were missing for W6NV in one of the __DATA__ lines, so re-tweaked and ran the script again. The Mult Error checking worked!


Code
#####  

# STUCKINARUT CHANGED W9RE MULT TO "IL" FOR W7WHY 40M QSO AT 0201 TO TEST MULT SPELLING CHECK :^)
# STUCKINARUT ADDED NAME=OLI AND MULT=CA FOR W6NV 80M QSO WITH N6ZFO AT 0235 FOR COMPLETE LOG ENTRY

__DATA__
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IL
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA


I printed out the Summaries and started manually checking each (DATA) log entry to verify the actual LOGCALL QSOs and go through the ERRORS list. Partway through I realized that adding 2 more fields to the ERRORS list would be a great help in checking things against the (Master) __DATA__ list, especially since the total number of actual QSOs to be checked will be about 2,500, with a likely high number of Error items to manually check/research after-the-fact.

For the Errors list if you could tweak the fields to be as follows that would be very, very helpful:

LOGCALL CALLSIGN BAND TIME ERRORTYPES

So actually the TIME field will indeed play an important part after all {SIGH} and help go line by line to verify and validate the 'Test' data. If all is well, I'll add more __DATA__ records including intentional Errors and run more tests.

Also, writing the $VAR2 list with just the (Callsigns) to a .txt file would be helpful for the manual checking process. Sorting the first (Logcall) field alphanumerically in all 3 files would, I see now, also facilitate faster checking and save time.

From a quick check in the Master EXCEL file of logs & data submitted, the Total Unique 'Logcalls' (submitted log callsigns) is about 32, but another 26 'Callsigns' actually worked in the event (but they did not submit logs). Some people just 'show up for the fun' but are not interested in doing paperwork :^)

Looking toward the future of hopefully 70 logs and maybe 4,000 contacts/log entries in this one-hour event, hopefully the processing approach here will handle that amount of data?

My original plan to roll through the submitted log entries one by one overlooked the fact that log entries further down the food chain would also have to check every entry starting from the first forward. DUH! (on me).

Thanks very much for your help, Chris !!!

-Stuckinarut


Zhris
Enthusiast

Feb 25, 2015, 10:03 PM


Views: 16633
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

Your feedback has been valuable in helping me understand your thoughts. It would be cool to develop this as far as possible in order to eliminate as much manual checking as we can. Inevitably, if you had a list of contestants, the entire process could probably be automated.

As you have come to realize, in order to be able to check an entry against any other possible entry, we had to load all the data into a hash during the "first phase". The amount of memory this hash uses shouldn't be a concern unless it contained millions of entries. If it ever became a problem, there are tweaks that could be made to improve memory consumption.
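For instance ( just a sketch of one such tweak, not something we need yet ): each entry could be stored as a single tab-joined string instead of a hashref, since a hashref with three key/value pairs carries noticeably more per-entry overhead, at the cost of splitting on access:

```perl
use strict;
use warnings;

# Sketch: a leaner per-QSO representation. One delimited string per
# entry uses less memory than a hashref of three fields.
my %contestants;
push @{ $contestants{W7WHY}{'40M'} }, join "\t", 'N6ZFO', 'BILL', 'CA';

for my $entry ( @{ $contestants{W7WHY}{'40M'} } ) {
    # unpack the fields back out when they are actually needed.
    my ( $callsign, $callname, $callmult ) = split /\t/, $entry;
    print "$callsign $callname $callmult\n";   # N6ZFO BILL CA
}
```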

Make certain that you try to cover every possible scenario in your sample input data, just in case we have missed something!

Also, if there's anything you don't understand from reading the code, feel free to raise your concerns; it may be necessary for you to make adjustments one day.

As per your post I have made the following changes:
- Replaced the data block with the adjusted data block you provided.
- Implemented sorting in numerous places.
- Added band and time fields to the error log.
- Generated the potential non-submitters log, including a weight field, as this is important.

As per my own initiative, I have also made the following changes:
- Implemented an ignore hash. Any callsigns in this hash will be ignored. Once you have deciphered which callsigns belong to non-submitters, you could insert them into this hash, then rerun the script.
- Fixed the case-sensitive map bug.

Perhaps still to do:
- If the ignore hash is handy, it could be constructed from another input file.
- At this time, only one possible error is logged against each entry, in precedence as per the if / elsif conditions. You may wish to allow multiple possible errors, although there are discrepancies in doing so.
- Instead of an error log, you may wish to incorporate errors / mulligans into the final score ( QSOS ) for each logcall.
- Minor improvements to code, e.g. replace greps with List::Util functions etc.
- Replace development filehandles with those pointing to your input and output logs.


Code
use strict; 
use warnings;
use Data::Dumper;

#local $/ = "\r\n";
local $, = "\t";
local $\ = "\n";

#####

# init.

my $case_sensitive = 0;

my $band_lookup =
{
3 => '80M',
7 => '40M',
};

my $ignore =
{
#W6NV => 1,
};

my $contestants = { };

my $potential_non_submitters = { };

#####

# fh open.

my $output_scores_str = '';
my $output_errors_str = '';
my $output_nonsub_str = '';

my $input_fh = \*DATA; # open my $input_fh, '<', 'listQ.txt' or die "cannot open 'listQ.txt': $!";
open my $output_scores_fh, '>', \$output_scores_str; # open my $output_scores_fh, '>', 'logscores.csv' or die "cannot open 'logscores.csv': $!";
open my $output_errors_fh, '>', \$output_errors_str; # open my $output_errors_fh, '>', 'logerrors.csv' or die "cannot open 'logerrors.csv': $!";
open my $output_nonsub_fh, '>', \$output_nonsub_str; # open my $output_nonsub_fh, '>', 'lognonsub.csv' or die "cannot open 'lognonsub.csv': $!";

# headings.
print $output_scores_fh 'LOGCALL', 'QSOS', 'MULT', 'NAME';
print $output_errors_fh 'LOGCALL', 'CALLSIGN', 'BAND', 'TIME', 'ERRORTYPES';
print $output_nonsub_fh 'LOGCALL', 'WEIGHT';

#####

# first phase ( load input data into hash ).

while ( my $line = <$input_fh> )
{
# remove any whitespace on the end of the line ( spaces, carriage return, newline ).
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $freq, $time, $logcall, $logname, $logmult, $callsign, $callname, $callmult ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
( split( ' ', $line ) )[1,4..10];

# lookup band via frequency.
my $band = $band_lookup->{substr( $freq, 0, 1 )}; # todo: // error.

# add a new contestant to contestants if we haven't seen them before.
unless ( defined $contestants->{$logcall} )
{
$contestants->{$logcall} =
{
logname => $logname,
logmult => $logmult,
bands => { },
};
}

# add this entry to the contestants entries.
push @{$contestants->{$logcall}->{bands}->{$band}},
{
time => $time,
callsign => $callsign,
callname => $callname,
callmult => $callmult,
seen => 0,
errors => [ ],
};
}

#####

# second phase ( process hash, generate logs ).

for my $logcall ( sort keys %$contestants )
{
my $contestant = $contestants->{$logcall};

# instead of verified counter, could count number of error free entries before logging.
my $verified = 0;

for my $band ( sort keys %{$contestant->{bands}} )
{
my $entries = $contestant->{bands}->{$band};

for my $entry ( sort { $a->{callsign} cmp $b->{callsign} } @$entries )
{
# skip if in ignore list.
next if exists $ignore->{$entry->{callsign}};

# mark as seen. Used when checking for duplicate entries.
$entry->{seen} = 1;

# verify entry.
if ( not defined $contestants->{$entry->{callsign}} ) # invalid callsign.
{
push @{$entry->{errors}}, 'NIL(BANDQSO)';

$potential_non_submitters->{$entry->{callsign}}->{$logcall}++;
}
elsif ( ( grep { $_->{seen} and $_->{callsign} eq $entry->{callsign} } @$entries ) > 1 ) # duplicate entry.
{
push @{$entry->{errors}}, 'DUPE(BANDQSO)';
}
elsif ( $entry->{callname} ne $contestants->{$entry->{callsign}}->{logname} ) # invalid callname.
{
push @{$entry->{errors}}, 'INVALID(NAME)';
}
elsif ( $entry->{callmult} ne $contestants->{$entry->{callsign}}->{logmult} ) # invalid callmult.
{
push @{$entry->{errors}}, 'INVALID(MULT)';
}
elsif ( not grep { $_->{callsign} eq $logcall } @{$contestants->{$entry->{callsign}}->{bands}->{$band}} ) # no return entry.
{
push @{$entry->{errors}}, 'NIL(CALLSIGN)';

# todo: set "yes return entry" in relevant entry in $contestants->{$entry->{callsign}}->{bands}->{$band}, then don't have to pointlessly reverse check later.
}

# log errors if any, or increment verified count.
if ( @{$entry->{errors}} )
{
print $output_errors_fh $logcall, $entry->{callsign}, $band, $entry->{time}, @{$entry->{errors}};
}
else
{
$verified++;
}
}
}

# log score.
print $output_scores_fh $logcall, $verified, $contestant->{logmult}, $contestant->{logname};
}

#####

# third phase ( process potential non submitters hash, generate potential non submitters log ).

# reformat potential non submitters hash into callsign => count / weight. We incorporated the contestant's own logcall to ensure they can't skew
# the result if they consistently use an invalid callsign via different bands or duplicate entries i.e. equal weighting / one logcall per callsign reported.
$_ = keys %$_ for ( values %$potential_non_submitters );

for my $callsign ( sort { $potential_non_submitters->{$b} <=> $potential_non_submitters->{$a} } keys %$potential_non_submitters )
{
my $weight = $potential_non_submitters->{$callsign};

# log potential non submitter.
print $output_nonsub_fh $callsign, $weight;
}

#####

# fh close.

close $input_fh;
close $output_scores_fh;
close $output_errors_fh;
close $output_nonsub_fh;

# print
#print Dumper $contestants;
#print Dumper $potential_non_submitters;
print $output_scores_str;
print $output_errors_str;
print $output_nonsub_str;

#####

__DATA__
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IL
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA


Output:

Code
LOGCALL	QSOS	MULT	NAME 
N6ZFO 2 CA BILL
W7WHY 2 OR TOM
W9RE 2 IN MIKE

LOGCALL CALLSIGN BAND TIME ERRORTYPES
N6ZFO N2NL 40M 0222 NIL(BANDQSO)
N6ZFO W9RR 40M 0221 NIL(BANDQSO)
N6ZFO W6NV 80M 0235 NIL(BANDQSO)
N6ZFO W7WHY 80M 0231 NIL(CALLSIGN)
W7WHY N6ZFO 40M 0200 DUPE(BANDQSO)
W7WHY N6ZFO 40M 0200 DUPE(BANDQSO)
W7WHY W9RE 40M 0201 INVALID(MULT)
W7WHY N6ZF 80M 0231 NIL(BANDQSO)
W7WHY W6NV 80M 0232 NIL(BANDQSO)
W9RE N6ZFO 40M 0221 NIL(CALLSIGN)
W9RE N6ZFO 80M 0231 INVALID(NAME)
W9RE W6NV 80M 0249 NIL(BANDQSO)

LOGCALL WEIGHT
W6NV 3
N6ZF 1
W9RR 1
N2NL 1


Regards,

Chris


(This post was edited by Zhris on Feb 25, 2015, 10:47 PM)


stuckinarut
User

Feb 25, 2015, 10:59 PM


Views: 16612
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Chris:

I am in AWE of what you have come up with so quickly. Indeed another humbling experience to realize how much I have yet to learn (or even attempt to).

I've printed the revised output to whack away at. Before my previous post, I was actually chewing on an idea similar to your 'ignore hash' method to (hopefully) eliminate some of the further tedious manual work. I can't thank you enough for your willingness to also do some possible 'embellishments'. I'm thinking you must be related to 'Santa Claus' ???

As I chomp through the latest output checks, I'll jot down any thoughts about additional 'automation' possibilities and come back in a day or two.

In the interim, here's a bit more information about the particular event (a short on-air 'Contest' - a/k/a 'RadioSport' event, in the form of a 'QSO Party'). Most of the log submitters are 'Regulars' in what we call 'Thursday Night Contesting' ... a real rip-snortin' high-speed Morse Code venue a la 35 to 40WPM (words-per-minute) in speed. I mean, we are talking 'Lightning Fast' stuff here. The real crazy thing is that most of us are real 'old men' --- mostly 60 and above (I just turned 71).

My little event started as a fun on-air way to celebrate my 50th Anniversary Year as a licensed 'Ham' operator. And, as a way to 'give back' to this group of highly skilled CW (Morse Code) RadioSport guys - a 'niche' part of Ham Radio, I give subscriptions and/or renewals to the main RadioSport magazine as Awards for the 10 different entry categories. These are mostly by Geographic area, however there is a separate category for 'NOOBS' (1st-timers to either my Shindig or the Thursday Night Contesting events). A 'Green Power' category is for those running their rigs strictly off battery, solar and/or wind power.

BUT WAIT...THERE'S MORE...

To try and 'level-the-playing-field' and give all the 'Little Pistol' station folks an opportunity to compete based upon 'SKILL' and not by size and power/antennas of stations, those with BIG antennas must reduce their output power based on a list of criteria by antenna (gain). It's been pretty amazing to hear some of the Big Gun TNC 'Regulars' show up with much weaker signals (these are Honest guys). The 'Rules' are kinda complicated. Everyone probably deserves some kind of award for just reading them :^)

Before first posting my outreach for assistance, I had several added 'automation' desires in the mix, but decided to try and simplify things down to the bulk of the manual labor part, so I didn't include them. Your kind willingness to explore some enhancements is muchly appreciated. I've been trying to figure out a way to use Perl to help with this annual 'Nightmare' task, but just could not pull it together myself {OBVIOUSLY}. Although the on-air event is a real blast, I always DREAD the after-the-party manual log checking of several thousand contacts ;-(

Your help here so far has been like a 10,000 pound Gorilla being lifted off my back (and head) !!!

For now I'll leave you with one enhancement that would definitely save some more time and frustration.

FYI, each valid QSO (Contact) is worth 1,000 Points. In the 2nd year, to help encourage folks to contact EVERYONE possible, I introduced 3 (Volunteer/Secret) stations that would yield an additional 5,000 'Bonus Points' if contacted (well, make that 2 ... everyone also gets 5,000 BP's for working me :^) We all get to run higher power in order to give the best chance at success. Nobody knows who the other 2 are until the event starts - they and I use one of my 'nicknames' in our exchanges. But here's the catch... even if all 3 of us are worked on both bands (40M & 80M), the Bonus Points only apply for ONE QSO (Contact) for each of us. In other words, the MAXIMUM number of Bonus Points is 15,000.

My initial thought was to have a separate list to input into the mix with the 3 callsigns, but then things got foggy as I tried to figure out how to credit the Bonus Points only ONCE for the 3 of us. Since 2 of the Bonus Station callsigns are different every year, being able to use a simple .txt file with the 3 callsigns (all of which have log entries submitted in the Master list), and also automate this additional manual labor step, would be extremely helpful. The final step in 'after-the-party-is-over' work here is a final recap write-up of the event and a listing of all log submitter scores and category winners. I do a *LOT* of copy-and-pasting {SIGH}.
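Since the once-per-station Bonus Point rule is well defined, it should be straightforward to automate. Here's a minimal Perl sketch of the idea (the 5,000-point and once-per-station rules follow the description above; the callsigns, contact list and bonus.txt file name are hypothetical, not part of the current script):

```perl
use strict;
use warnings;

# Hypothetical list of this year's 3 bonus callsigns, as if read from a
# simple bonus.txt file ( one callsign per line ).
my %bonus_station = map { $_ => 1 } qw( W6NV N2NL W9RR );

# Contacts claimed by one contestant: [ callsign, band ].
my @contacts = (
    [ 'W6NV', '40M' ],
    [ 'W6NV', '80M' ],    # same bonus station on the other band: no extra credit.
    [ 'N2NL', '80M' ],
);

my %credited;
my $bonus_points = 0;
for my $contact (@contacts) {
    my ( $callsign, $band ) = @$contact;

    # credit 5,000 Bonus Points only ONCE per bonus station, regardless of band,
    # so the maximum possible is 3 x 5,000 = 15,000.
    next unless $bonus_station{$callsign} and not $credited{$callsign}++;
    $bonus_points += 5000;
}
print "$bonus_points\n";    # 10000
```

The `%credited` hash is what enforces "ONE QSO per bonus station": the post-increment marks the station as credited on the first valid contact, so later contacts with the same station fall through.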

OK... sorry for the 'Novel' here, but hopefully additional insights into the 'WHY' of the 'HASH-O-RAMA' plea for help.

Will return in a day or two.

Thanks again, Chris !!!

-Stuckinarut


stuckinarut
User

Feb 25, 2015, 11:17 PM


Views: 16609
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Oh, Chris, I forgot to add a thanks for your 'Brilliant' initiative:

> Generated potential non submitters log, including weight field as this is important.

This will be EXTREMELY useful in distinguishing at-a-glance any single 'One-Off' busted callsigns vs. the Non-Log-Submitter callsigns who will likely show up in multiple logs & QSO lines. Those I will need to add back in to individual scores as 'Valid' (Mulligan-like) Contacts. The 'Honor System' factors into things in this area.

-Stuckinarut


stuckinarut
User

Feb 26, 2015, 8:08 AM


Views: 16579
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hey, Chris...

RE:
> It would be cool to develop this as much as possible in order to eliminate as much manual checking as possible

After about one hour of sleep, I woke up with a bunch of thoughts racing through my mind for further automation & manual labor reduction. But I was too groggy to type on a keyboard so took a small digital recorder back to bed with me to dictate into. I'd turn it off and try to go back to sleep, and then a few minutes later grab it and start recording another thought. It's a good thing I'm presently 'in-between wives & dogs', or I would have been kicked out of the sack and sent to a real 'Dog House'. I mean, who wants some guy babbling into a recorder in the dark next to you at 2AM when you're trying to sleep? {GRIN}.

Now I'm up just starting to transcribe the info. Cudda-Shudda installed the copy of DRAGON several months ago when I bought it to partially 'Automate' this particular process ;-(

BTW, thanks for clarifying that a lot of stuff can go into the Hash. Clever that you inserted several same-band 'DUPE' records into the test __DATA__ to check that very important function. Nice indeed!

RE:
>Inevitably, if you had a list of contestants, the entire process could probably be automated.

Unfortunately, unless everyone who planned to participate would 'Register' in advance, there is really no way to have a simple 100% accurate list. Actually, formal log submissions were a bit down this year due to time-schedule conflicts. 4 or 5 of the Annual 'Regular' participants sent me emails in advance apologizing that they would have to miss the event. That was nice of them.

So here was a previous thought I had about this matter. Now that I've learned you can 'sweep' and 're-sweep' the Hash:

1. Once the Master Log data is read into the Hash, the next step would be to:

A. Do a 'sweep' of ALL the multiple (logcall field) callsigns and produce a cleaned list of only the 'Uniques'.

B. Do another 'sweep' of ALL the multiple (callsign field) callsigns and produce a cleaned list of only the 'Uniques'.

C. Compare the two lists and, if my logic is correct, this could yield (similar to subtracting List A *from* List B) a 'Total Uniques' list of not just ALL the actual on-air participants worked who did NOT submit logs, but also any/all 'Busted' Callsigns/Errors. This will actually serve a valuable purpose along with your Brilliant idea to include a 'Weight' factor.
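The two 'sweeps' and the List-B-minus-List-A comparison could be sketched in Perl roughly like this (field positions follow the sample QSO: lines; the three data lines here are a small hypothetical excerpt):

```perl
use strict;
use warnings;

# Small hypothetical excerpt of Master Log lines ( same layout as listQ.txt ).
my @lines = (
    'QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA',
    'QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA',
    'QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR',
);

# Sweep A: unique logcalls ( submitters ). Sweep B: unique worked callsigns.
my ( %logcalls, %worked );
for my $line (@lines) {
    my ( $logcall, $callsign ) = ( split ' ', $line )[ 5, 8 ];
    $logcalls{ uc $logcall }++;
    $worked{ uc $callsign }++;
}

# "Total Uniques": list B minus list A, i.e. callsigns worked but with no log
# submitted ( genuine non-submitters plus any busted callsigns ).
my @non_submitters = sort grep { !exists $logcalls{$_} } keys %worked;
print "@non_submitters\n";    # W6NV
```

In this excerpt W7WHY and N6ZFO both appear as logcalls, so only W6NV falls out as a worked-but-not-submitted unique.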

Hmmm...in retrospect it might have been less confusing to designate the two QSO line 'callsign' fields as 'logcall' and 'callworked' ?

Anyway, IF this new list of both "One-Off" Errors *and* Non-Log Submitted valid/invalid 'callsign' Uniques also includes the BAND, TIME, NAME & MULT data, some additional log-checking 'automation' benefits can be realized subject to a possible 'Mulligan' factor in the mix.

Assuming that any callsign entry on this list that appears at least 2 or maybe 3 times would NOT be a "One-Off" Error, but rather an actual legitimate callsign/station QSO (contact), this could further reduce the manual labor process. What would really be EL SLICK-O is to be able to change what I'll term a 'threshold' level or value of the 'Weight' for its use in log checking tests. I'll try and explain.

IF one of these callsigns has a weight of 2 (occurrences in the overall scheme of things), then any (logcall) QSO claims for the (callworked) would process the same as if it had been one of the formal (logcall) submission QSO data lines. Adding the BAND, NAME & MULT data would make this possible. The TIME field would be for use in 'Manual' check/reconciliations. Only those with a weight of 1 would end up on the Error list. Being able to switch the 'threshold' use value between 1, 2 and 3 would be very valuable in 'Beta Testing' the output! A value of 3 would likely be the most accurate, but if 2 works and reduces the list of Errors to manually check, that would be like 'BINGO' :^) Not sure if I've explained this well.

NOTE: The TIME field is NOT used in actual log checking, because not everyone's Confuzzzer clocks have been set to the same precise WWV time. There could be a range of + or - several minutes, but since the event operating time is structured for 30 minutes on the 40M Band first, then 30 minutes on 80M, TIME is really a Moot issue - except to make the Manual Error checking stuff go faster when visually looking through a submitted log.

RE:

>LOGCALL WEIGHT
>W6NV 3
>N6ZF 1
>W9RR 1
>N2NL 1

Maybe use 'NOLOG' instead of 'LOGCALL' ?


Ahhhh... here's another idea...

Something like a 'Control Panel' or 'CONFIG OPTIONS' section near the top of the script to quickly change any option values or turn stuff ON or OFF (i.e., = true {or} = false), whether involving either the Hash data or other .txt file data to be read in or not. Here's what came to mind for this item:

# ############################################
# ############################################
# ############## CONFIG OPTIONS ##############

# Threshold for Uniques list in log-checking
# (NO Log Submitted -or- Busted Callsigns)
# Values: 1, 2 or 3 (occurrences) on the list

nologthreshold = 2

# Next CONFIG OPTION - Blah Blah Blah

etc., etc.

# ############################################

OK, this is pretty lengthy so I'll post the further 'Automation' thoughts separately later after transcribing everything.

Thanks!

- Stuckinarut


stuckinarut
User

Feb 26, 2015, 8:22 AM


Views: 16578
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

OOOPS... sorry...

RE:


Quote
# Threshold for Uniques list in log-checking
# (NO Log Submitted -or- Busted Callsigns)
# Values: 1, 2 or 3 (occurrences) on the list

nologthreshold = 2

A value of 1 would be ONLY a firm/fixed 1 occurrence.

A value of 2 would be at least 2 (or more) occurrences.

A value of 3 would be at least 3 (or more) occurrences.
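That exactly-1 / 2-or-more / 3-or-more rule could be sketched in Perl like so (a hypothetical illustration using the NOLOG/WEIGHT figures from the earlier output; the variable and function names are mine, not from the actual script):

```perl
use strict;
use warnings;

my $nologthreshold = 2;    # CONFIG OPTION: 1, 2 or 3.

# Hypothetical weights in the NOLOG / WEIGHT shape from the earlier output.
my %weight = ( W6NV => 3, N6ZF => 1, W9RR => 1, N2NL => 1 );

# A value of 1 selects ONLY single occurrences; 2 and 3 mean "that many or more".
sub meets_threshold {
    my ( $weight, $threshold ) = @_;
    return $threshold == 1 ? $weight == 1 : $weight >= $threshold;
}

my @selected = sort grep { meets_threshold( $weight{$_}, $nologthreshold ) } keys %weight;
print "@selected\n";    # W6NV
```

With the threshold at 2 only W6NV (weight 3) qualifies as a likely real station; flipping the threshold to 1 would instead pick out only the one-off busted callsigns.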


What do you think ???

Thanks!

-Stuckinarut


(This post was edited by stuckinarut on Feb 26, 2015, 8:34 AM)


Zhris
Enthusiast

Feb 26, 2015, 8:59 AM


Views: 16569
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

Just letting you know that I'm out all day today and will look through this when I get back later.

Best regards,

Chris


stuckinarut
User

Feb 26, 2015, 10:31 AM


Views: 16560
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hey, Chris...

>Just letting you know that I'm out all day today and will look through this when I get back later.

Thanks - I'll be gone part of today myself. In the interim, I'll post more info for 'Automation' mix of things.

I chugged down more Caffeine and think I can explain this 'nologthreshold' function a bit better (as I now see it). It is a bit more tricky than I thought and needs some tweaking.

RE: (I changed the first field header :^)


Code
NOLOG	WEIGHT  
W6NV 3
N6ZF 1
W9RR 1
N2NL 1


I'll use just W6NV in different scenario examples and 'occurrences' (the weight #), which would need to be based on not just the (callsign), but the IDENTICAL NAME & MULT data. So I think these two fields would need to be added to the primary ERROR output as well:


Code
RE: 
LOGCALL CALLSIGN BAND TIME ERRORTYPES
N6ZFO W6NV 80M 0235 NIL(BANDQSO)
W7WHY W6NV 80M 0232 NIL(BANDQSO)
W9RE W6NV 80M 0249 NIL(BANDQSO)


I'm now thinking maybe NOLOG might be a better simple ERRORTYPE if one of these (callsigns) can NOT be matched to a (submitted) log.

As things stand now, the weight for W6NV is 3.


Code
NOLOG	WEIGHT 
W6NV 3


If the ERROR output is tweaked to flag a 'NOLOG' (submitted) attempted match with an Error code of just 'NOLOG', we would now see this:


Code
LOGCALL	CALLSIGN	BAND	TIME 	NAME	MULT	ERRORTYPES  
N6ZFO W6NV 80M 0235 OLI CA NOLOG
W7WHY W6NV 80M 0232 OLI CA NOLOG
W9RE W6NV 80M 0249 OLI CA NOLOG


So 'WHAT IF' there were actual (accuracy) Errors (NAME or MULT) that legitimately should invalidate a QSO with a NOLOG Submitted station and be considered in the 'Weight' for a 'Threshold' level value?

Consider this possibility:


Code
LOGCALL	CALLSIGN	BAND	TIME 	NAME	MULT	ERRORTYPES  
N6ZFO W6NV 80M 0235 OLI CA NOLOG
W7WHY W6NV 80M 0232 OLI CA NOLOG
W9RE W6NV 80M 0249 ORI CA NOLOG


Combining the (callsign) with the NAME & MULT would yield a different (tweaked) result:


Code
NOLOG	WEIGHT	NAME	MULT 
W6NV 2 OLI CA
W6NV 1 ORI CA
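Tallying the weight on the combined (callsign)+NAME+MULT key could look something like this in Perl (a hypothetical sketch reproducing the tweaked table above; the nested-hash shape is my illustration):

```perl
use strict;
use warnings;

# Hypothetical NOLOG entries as ( callsign, name, mult ) triples.
my @entries = (
    [ 'W6NV', 'OLI', 'CA' ],
    [ 'W6NV', 'OLI', 'CA' ],
    [ 'W6NV', 'ORI', 'CA' ],
);

# Tally per unique callsign/name/mult combination, so a busted NAME or MULT
# forms its own bucket instead of inflating the callsign's overall weight.
my %weight;
$weight{ $_->[0] }{ $_->[1] }{ $_->[2] }++ for @entries;

for my $callsign ( sort keys %weight ) {
    for my $name ( sort keys %{ $weight{$callsign} } ) {
        for my $mult ( sort keys %{ $weight{$callsign}{$name} } ) {
            print join( "\t", $callsign, $weight{$callsign}{$name}{$mult}, $name, $mult ), "\n";
        }
    }
}
# W6NV  2  OLI  CA
# W6NV  1  ORI  CA
```

The error in the third entry (ORI vs OLI) lands in its own bucket with weight 1, so it can't push the W6NV/OLI/CA combination over or under the threshold.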


HA!!! If the 'nologthreshold' value were set as 2 (which would be at least 2 'or more' occurrences in the actual log checking), this would *reasonably* suggest that any QSOs with W6NV and OLI CA are most likely valid and should be counted as Valid QSOs and NOT show up in the Error output (somewhat of a 'Mulligan' approach, but would save considerable manual labor).

The W6NV and ORI CA entry was definitely an error and therefore an invalid QSO. If the 'nologthreshold' value were set as 3 ('or more' occurrences), then all 3 of the W6NV QSOs would continue to show in the ERROR output and not be (automatically) validated in the log checking.

Another scenario:


Code
LOGCALL	CALLSIGN	BAND	TIME 	NAME	MULT	ERRORTYPES  
N6ZFO W6NV 80M 0235 OXI CA NOLOG
W7WHY W6NV 80M 0232 OLI CA NOLOG
W9RE W6NV 80M 0249 ORI CA NOLOG


HA! Now we would have:


Code
NOLOG	WEIGHT	NAME	MULT 
W6NV 1 OLI CA
W6NV 1 OXI CA
W6NV 1 ORI CA


So if the 'nologthreshold' value is 2 (or more) -or- 3 (or more) occurrences, then ALL of the W6NV's QSOs would bounce to the ERROR output.

A NEW THOUGHT JUST FLOATED IN... to first calculate all 'weights' for the 'nolog' callsigns based on (callsign)+NAME+MULT, to be able to include a 'weight' figure in the ERROR output data:


Code
LOGCALL	CALLSIGN	WEIGHT	BAND	TIME 	NAME	MULT	ERRORTYPES  
N6ZFO W6NV 1 80M 0235 OXI CA NOLOG
W7WHY W6NV 1 80M 0232 OLI CA NOLOG
W9RE W6NV 1 80M 0249 ORI CA NOLOG


For testing purposes at different 'nologthreshold' values, this would give an INSTANT bird's eye (or eagle's eye) view of things as well as 'Mulliganizing' decisions :^)

I'll keep chewing on this more. Chances are at least 10 QSOs will be made by the majority of NOLOG (callsigns). One fast way to get a handle on a possible Accuracy/Error factor for the NAME & MULT fields would be to do a Count of QSO entries for the list of NOLOG (callsigns) and sort Descending by Quantity of QSOs made... based on including the NAME & MULT fields. Yes... I think that would give a quick picture of the reality of things.

I need to finish re-tweaking the Master Logs file because some folks used a logging software module that included QSO Serial Number column fields (used in many other RadioSport events). So once this is completed, maybe it would be helpful to get the actual file to you (somehow) so you will have the complete real nitty-gritty to work with?

A looming new question at this point: IF there are 'manual' adjustments made, what creative way might there be to then re-run the Summary output to include the changes? {HUGE SIGH}

Thanks, Chris.

-Stuckinarut


Zhris
Enthusiast

Feb 26, 2015, 1:35 PM


Views: 16547
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

Firstly, thank you for the background information on amateur radio sports. It helps me to understand the purpose of this task and enables me to provide more valuable suggestions. I have also looked at your website and familiarized myself with the rules ( I look forward to that reward you "promised" just for reading them! ).

When the time comes, we certainly can adjust the code to incorporate bonus points for contacting up to three volunteer secret stations across both bands. It has also become apparent that you have to do a lot of pre- and post-processing, i.e. pre-processing includes adjusting contestants' logs into the Cabrillo format, and post-processing includes using Excel to decipher the top scorers under each category. Perhaps this could be handled by our Perl script too.

With regards to naming, the whole logcall and callsign thing can be pretty baffling; after all, they are the same kind of value, but represent whether the station is the transmitter or the receiver. I would have thought names like transmittercall for logcall and receivercall for callsign might be more appropriate, but what do I know.

Like a lot of problems, the more you think about them, the more you realise their complexity. Before I read your reply, I had realised that we hadn't incorporated name and mult into the mix of things when attempting to automate the process of deciphering valid but unsubmitted logcalls. One example of many scenarios that must be considered is: if two contestants contacted an unsubmitted callsign, but logged conflicting callnames and/or callmults, we have no idea which is valid. I could also imagine a group of cheaters agreeing to log non-existent callsigns in order to increase their weighting. Etc etc etc.

I propose we break the problem up into two phases.

- The first phase produces a log of all valid contestants' callsigns, callnames and callmults, whether they submitted their entries or not. Based on your ideas and my own, this can be automated to some level of accuracy, but an option to manually check and adjust this log is necessary.
----- The automated process will generate the log of all contestants using configurable threshold(s).
----- The manual process will generate a log of confirmed contestants, and unconfirmed contestants with relevant information in order to decide to keep or eliminate by eye. The resultant log will need to be in the same format as the one generated by the automated process.
- Now that we have the log generated in the first phase, the second phase of actually scoring each contestant will be a piece of cake.

This two-phase approach also simplifies things and eliminates the need for the ignore and potential non submitters hashes.

I really need to get my head round phase one and re-read your concepts / ideas. For now I have thrown together code which handles confirming easily confirmable contestants, then lays out the unconfirmed contestants in a potentially suitable structure ready to be processed automatically or logged for manual processing. I will try to figure out the automation side of things.


Code
use strict; 
use warnings;
use Data::Dumper;

#local $/ = "\r\n";
local $, = "\t";
local $\ = "\n";

#####

# fh open.

my $string_output_scores = '';
my $string_output_errors = '';

my $handle_input_entries = \*DATA;
open my $handle_output_scores, '>', \$string_output_scores;
open my $handle_output_errors, '>', \$string_output_errors;

# headings.
print $handle_output_scores 'LOGCALL', 'QSOS', 'MULT', 'NAME';
print $handle_output_errors 'LOGCALL', 'CALLSIGN', 'BAND', 'TIME', 'ERRORTYPES';

#####

# init.

my $configuration =
{
handle_input_entries => $handle_input_entries,
#handle_input_contestants => $handle_input_contestants,
#handle_output_contestants => $handle_output_contestants,
handle_output_scores => $handle_output_scores,
handle_output_errors => $handle_output_errors,
case_sensitive => 0,
band_lookup => { 3 => '80M', 7 => '40M' },
automate_unconfirmed => 1,
automate_threshold => 2,
};

my $phase_dispatch =
{
1 => \&phase1,
2 => \&phase2,
};

#####

# phase.

my $phase = 1; # $ARGV[0];

$phase_dispatch->{$phase}->( $configuration );

#####

# fh close.

close $handle_input_entries;
close $handle_output_scores;
close $handle_output_errors;

# print.
#print $string_output_scores;
#print $string_output_errors;

#####

# functions.

sub phase1
{
my ( $configuration ) = @_;

my $handle_input_entries = $configuration->{handle_input_entries};
my $case_sensitive = $configuration->{case_sensitive};
#my $band_lookup = $configuration->{band_lookup};
my $automate_unconfirmed = $configuration->{automate_unconfirmed};

my $hash = { };

while ( my $line = <$handle_input_entries> )
{
# ignore blank or comment lines.
next if $line =~ m/^(\s*#|\s*$)/;

# remove any whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuitive than chomp.
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $freq, $time, $logcall, $logname, $logmult, $callsign, $callname, $callmult ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
( split( ' ', $line ) )[1,4..10];

# lookup band via frequency.
#my $band = $band_lookup->{substr( $freq, 0, 1 )}; # todo: // error.

# log* are automatically confirmed.
$hash->{confirmed}->{$logcall} = { logname => $logname, logmult => $logmult } unless exists $hash->{confirmed}->{$logcall};

# call* are not confirmed yet.
$hash->{unconfirmed}->{$callsign}->{$callname}->{$callmult}++;
}

for my $callsign ( keys %{$hash->{unconfirmed}} )
{
# if the unconfirmed callsign exists as a confirmed logcall, delete this unconfirmed entry as it is now confirmed.
if ( exists $hash->{confirmed}->{$callsign} )
{
delete $hash->{unconfirmed}->{$callsign};
}
elsif ( $automate_unconfirmed )
{
# ...
}
}

print Dumper $hash;

return 1;
}

sub phase2
{

}

#####

__DATA__
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IL
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE ON
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKEY IF
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA


Output:

Code
$VAR1 = { 
'confirmed' => {
'N6ZFO' => {
'logname' => 'BILL',
'logmult' => 'CA'
},
'W7WHY' => {
'logname' => 'TOM',
'logmult' => 'OR'
},
'W9RE' => {
'logname' => 'MIKE',
'logmult' => 'IN'
}
},
'unconfirmed' => {
'N6ZF' => {
'BILL' => {
'CA' => 1
}
},
'W9RR' => {
'MIKE' => {
'ON' => 1,
'IN' => 1
},
'MIKEY' => {
'IF' => 1
}
},
'N2NL' => {
'DAVE' => {
'FL' => 1
}
},
'W6NV' => {
'OLI' => {
'CA' => 3
}
}
}
};


Regards,

Chris


(This post was edited by Zhris on Feb 26, 2015, 1:46 PM)


Zhris
Enthusiast

Feb 27, 2015, 8:25 PM


Views: 16496
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hi,

I have worked on this a little this evening and, although I have done little testing, I have come up with the below program in order to automate generating the list of contestants. There is no automate on or off option because every unique contestant is inserted into the resultant log with a status, being one of 'valid', 'manual' or 'invalid'. The algorithm is a little complex to explain in the little time I have right now, but if you test out different scenarios, I'm sure you will discover how it behaves. Briefly:

- valid:
----- the operator had logged entries therefore had submitted their log, OR, other operators had logged entries more than a threshold number of times, weighted 1 unit per unique operator.
----- we discovered a single name and mult more frequently by weight than any other possibilities.
- manual:
----- the entry is probably valid, BUT, we discovered multiple names and/or mults with equal frequencies by weight, therefore we couldn't decipher which one(s) were correct. This is most likely the result of contestants who didn't submit their logs, while other contestants who called them made errors.
- invalid:
----- the entry wasn't valid, AND, there may be multiple names and/or mults with equal frequencies by weight.

Once this log is generated, you can go through it manually and make adjustments as you see fit. Contestants marked 'manual' are basically valid, but you should reduce the name and mult values to a single name and mult (all possibilities are listed separated by a pipe), then change the status to 'valid' (or, if you prefer, 'invalid'). I would assume that when you run through real-world data there will be few, if any, manuals. You can leave 'invalid' contestants alone, but you may wish to double-check them following the same update process as for 'manual'. Phase two would ignore contestants with any status other than 'valid'.
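The tie on weights is what pushes a contestant to 'manual'. As a toy illustration (separate from the full program below, with made-up weights instead of the real per-log-sign tallies), two names tied for the top weight cannot be auto-resolved:

```perl
use strict;
use warnings;
use List::Util qw/max/;

# Toy weights for one contestant's possible names (illustrative values only).
my %weight = ( TOM => 3, TOMMY => 3, THOM => 1 );

# Keep every name tied for the top weight; more than one survivor means
# the entry can't be auto-resolved and is flagged 'manual'.
my $max  = max values %weight;
my @best = grep { $weight{$_} == $max } keys %weight;

print @best > 1 ? 'manual' : 'valid';   # prints "manual" (TOM and TOMMY tie)
```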


Code
use strict;
use warnings;
use List::MoreUtils qw/before/;
use Data::Dumper;

#####

# handle open.

my $strings = [ ];

my $handle_input_entries      = _handle( \*DATA );
my $handle_output_contestants = _handle( $strings, '>' );

#####

# init.

my $configuration =
{
    handle_input_entries      => $handle_input_entries,
    handle_output_contestants => $handle_output_contestants,
    case_sensitive            => 0,
    band_lookup               => { 3 => '80M', 7 => '40M' },
    threshold                 => 2,
};

my $phases =
[
    \&_phase1,
    \&_phase2,
];

#####

# phase.

my $phase = 0; # $ARGV[0];

$phases->[$phase]->( $configuration );

#####

# handle close.

close $handle_input_entries;
close $handle_output_contestants;

{
    local $, = "\n";

    print @$strings;
}

#####

# functions.

# phase one.
sub _phase1
{
    my ( $configuration ) = @_;

    $configuration // die 'configuration required';

    local $" = '|';
    local $, = "\t";
    local $\ = "\n";

    my $handle_input_entries      = $configuration->{handle_input_entries};
    my $handle_output_contestants = $configuration->{handle_output_contestants};
    my $case_sensitive            = $configuration->{case_sensitive};
    my $threshold                 = $configuration->{threshold};

    my $contestants = { };

    while ( my $line = <$handle_input_entries> )
    {
        # ignore blank or comment lines.
        next if $line =~ m/^\s*(#|$)/;

        # remove any whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuitive than chomp.
        $line =~ s/\s+$//;

        # split line, extracting only the values we need into a list and upper casing them if not case sensitive.
        my ( $freq, $time, $log_sign, $log_name, $log_mult, $call_sign, $call_name, $call_mult ) =
            map { $_ = uc $_ unless $case_sensitive; $_ }
            ( split( ' ', $line ) )[1,4..10];

        # todo: validate line i.e. ensure each var has val.

        # populate contestants hash.
        #$contestants->{$log_sign }->{log }->{seen }->{$log_sign}++;
        $contestants->{$log_sign }->{log }->{names}->{$log_name }->{$log_sign}++;
        $contestants->{$log_sign }->{log }->{mults}->{$log_mult }->{$log_sign}++;
        $contestants->{$call_sign}->{call}->{seen }->{$log_sign }++;
        $contestants->{$call_sign}->{call}->{names}->{$call_name}->{$log_sign}++;
        $contestants->{$call_sign}->{call}->{mults}->{$call_mult}->{$log_sign}++;
    }

    # print headings.
    print $handle_output_contestants 'SIGN', 'NAME', 'MULT', 'STATUS';

    for my $sign ( sort keys %$contestants )
    {
        my $contestant = $contestants->{$sign};

        my $details_operator = keys %{$contestant->{log}} ? 'log' : 'call' ;

        my $names = _details( $contestant->{$details_operator}->{'names'} );
        my $mults = _details( $contestant->{$details_operator}->{'mults'} );

        my $status = 'invalid';
        if ( ( keys %{$contestant->{log}} and keys %{$contestant->{call}} ) or ( keys %{$contestant->{call}->{seen}} >= $threshold ) )
        {
            $status = ( @$names > 1 or @$mults > 1 ) ? 'manual' : 'valid' ;
        }

        # print line.
        print $handle_output_contestants $sign, "@$names", "@$mults", $status;
    }

    return 1;
}

# phase two.
sub _phase2
{

}

# deals with multiple input / output handles in standalone programs.
sub _handle
{
    my ( $expression, $mode, $divider ) = @_;

    $expression // die 'expression required';
    $mode //= '<';

    my $handle  = undef;
    my $handles = [ ];

    if ( ref $expression eq 'GLOB' )
    {
        $handle = $expression;
    }
    else
    {
        if ( ref $expression eq ref [ ] )
        {
            push @$expression, '';
            $expression = \$expression->[-1];
        }

        open $handle, $mode, $expression or die "cannot open '$expression': $!";
    }

    if ( $mode eq '<' and defined $divider )
    {
        local $/ = $divider;

        while ( my $block = <$handle> )
        {
            $block =~ s/\Q$divider\E$//;

            open my $handle_b, $mode, \$block or die "cannot open '$block': $!";

            push @$handles, $handle_b;
        }
    }
    else
    {
        push @$handles, $handle;
    }

    return wantarray ? @$handles : $handles->[0] ;
}

# decipher most likely details by weight.
sub _details
{
    my ( $details ) = @_;

    $details // die 'details required';

    my $hash = { };
    $hash->{$_} += scalar( keys %{$details->{$_}} ) for ( keys %$details );

    my $weight = undef;
    my $list   = [ before { $weight //= $hash->{$_}; $weight != $hash->{$_} } sort { $hash->{$b} <=> $hash->{$a} } keys %$hash ];

    return $list;
}

#####

__DATA__
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tommy OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IL
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3542 CW 2015-01-22 0231 W779 Tom OR N6ZF BILL CA
QSO: 3542 CW 2015-01-22 0231 W770 Tom OR N6ZF BIL CA
QSO: 3542 CW 2015-01-22 0231 W771 Tom OR N6ZF BIL CA
QSO: 3542 CW 2015-01-22 0231 W772 Tom OR N6ZF BI CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
#QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE ON
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKEY IF
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
#QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
#QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JOHN UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JOHN UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JOHNNY UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JOHNNY UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JILL UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N778 PETE UK W6NV OLI CA


Output:

Code
SIGN	NAME	MULT	STATUS
N2NL	DAVE	FL	invalid
N6ZF	BILL|BIL	CA	manual
N6ZFO	BILL	CA	valid
N777	JILL|JOHN|JOHNNY	UK	invalid
N778	PETE	UK	invalid
W6NV	OLI	CA	valid
W770	TOM	OR	invalid
W771	TOM	OR	invalid
W772	TOM	OR	invalid
W779	TOM	OR	invalid
W7WHY	TOMMY|TOM	OR	manual
W9RE	MIKE	IN	valid
W9RR	MIKE|MIKEY	IF|ON|IN	invalid


Regards,

Chris


(This post was edited by Zhris on Feb 27, 2015, 9:00 PM)


stuckinarut
User

Feb 28, 2015, 12:19 PM


Views: 16461
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Chris:

Sorry... I went down a 'Black Hole' here briefly with unwanted 'stuff' to deal with, like filing an in-person report with the local Sheriff over a harassing phone-call issue ;-(

Will be back working on things later. I also need to finish transcribing my recorded notes.

Thanks!

-Stuckinarut


stuckinarut
User

Mar 1, 2015, 6:29 PM


Views: 16411
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Chris:

Sorry... delayed further here due to another crisis to deal with. I did extend the original date for publishing this year's event results.

In between the other stuff here, I'm chewing on an idea I think will make all this work pretty turnkey ('Automation').

Hopefully back in a couple of days - thanks for your patience.

-Stuckinarut


stuckinarut
User

Mar 28, 2015, 1:41 AM


Views: 15506
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

(Chris)

FINALLY I came up with the needed solution tweaks.

To not turn this thread into a further Novel from my end, I've put the info in a .pdf and uploaded it to:

http://www.xgenesis.com/hashorama/zchris.pdf

I hope you can still help.

Thanks!

-Stuckinarut


Zhris
Enthusiast

Mar 28, 2015, 12:12 PM


Views: 15468
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

I'll hopefully be able to take another look at this tomorrow.

Best regards,

Chris


Zhris
Enthusiast

Mar 30, 2015, 3:47 AM


Views: 15160
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hi,

I have read through the PDF in detail and feel I have a good understanding of your vision. At this time I don't really have anything worth adding; between the scripts above, the solution has mostly been covered. When I get time this week, I will consolidate our ideas and code to produce a complete script which can be tweaked as necessary.

I'm still confident in a two-phase system, since the complicated aspects are in building a list of valid contestant signs, names and locations; scoring from then on should be accurate with no need for adjustments, the only adjustments being to the contestant list. I like your CNQ concept; it's similar to mine but slightly lower level and probably easier to work with. I don't believe you covered it in your PDF, but log"info" (logcall), as opposed to call"info" (callwkd), needs to be weighted differently since it is the contestant's own info. This needs a little more thought: I don't think you can rely 100% on the call"info" (callwkd) for building this list of contestants; imagine the scenario where a contestant submitted their log but no one contacted them, or logged their contacts, or consistently made mistakes.

Regards,

Chris


(This post was edited by Zhris on Mar 30, 2015, 4:00 AM)


stuckinarut
User

Mar 30, 2015, 7:19 AM


Views: 14973
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hello Chris...


Quote
I'm still confident in a two-phase system, since the complicated aspects are in building a list of valid contestant signs, names and locations; scoring from then on should be accurate with no need for adjustments, the only adjustments being to the contestant list. I like your CNQ concept; it's similar to mine but slightly lower level and probably easier to work with.


Yes, definitely 'lower level', but 'much easier to work with' on my end at this point. Down the line (if the volume of participants/submitted logs increases), a Phase 2 would have great promise.


Quote
I don't believe you covered it in your PDF, but log"info" (logcall), as opposed to call"info" (callwkd), needs to be weighted differently since it is the contestant's own info. This needs a little more thought: I don't think you can rely 100% on the call"info" (callwkd) for building this list of contestants; imagine the scenario where a contestant submitted their log but no one contacted them, or logged their contacts, or consistently made mistakes.


Yes, I chewed and chewed on this aspect, and considered a 2-pass possibility:

1. Run through all the QSO: lines and grab the actual (LOGCALL-NAME-QTH) Combinations.

2. Then do the (CALLWKD-NAME-QTH) Combos.

3. MERGE both of these to come up with 'The Mother of All CNQ lists'. No need to have a separate weight... just a 'Weight' for each CNQ combination in the MERGED list. BTW, the actual log submitter's (LOGCALL-NAME-QTH) CNQ is pre-programmed into the logger before the event and used for the on-air exchange that is sent (and in the actual log).
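A minimal sketch of that two-pass MERGE idea, assuming the QSO: line layout used in this thread (for simplicity each appearance adds one unit of 'Weight', unlike Chris's per-unique-log-sign weighting):

```perl
use strict;
use warnings;

# Two sample QSO: lines where each side logged the other correctly.
my @lines = (
    'QSO: 7040 CW 2015-01-22 0200 W7WHY TOM OR N6ZFO BILL CA',
    'QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR',
);

my %weight;
for my $line ( @lines )
{
    # fields 5..10 are LOGCALL NAME QTH CALLWKD NAME QTH.
    my ( $log_sign, $log_name, $log_qth, $call_sign, $call_name, $call_qth ) =
        ( split ' ', uc $line )[5 .. 10];

    $weight{"$log_sign-$log_name-$log_qth"}++;      # pass 1: LOGCALL CNQs
    $weight{"$call_sign-$call_name-$call_qth"}++;   # pass 2: CALLWKD CNQs
}

# each CNQ combination now carries a single merged 'Weight'.
printf "%-16s %d\n", $_, $weight{$_} for sort keys %weight;
```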

FYI, in some Contests, log-checking software will ding (penalize) BOTH parties involved in a QSO (as INVALID) if only one side miscopies or mistypes the data. Of course, this requires that BOTH parties actually submit logs. I don't like that penalty method ;-(

If ALL participants submitted their logs, then the log checking process could be MUCH easier - really a Slam-Dunk and no need for a CNQ or 'Weight' factor or 'WTF'. Each QSO 2-way exchange data would be validated (or invalidated) by the actual log data from BOTH parties. Unfortunately that is not the case, and where the 'CNQ' and your brilliant 'Weight' factor idea comes into play.

The problem is getting everyone to submit their logs. Some folks just show up for part of the event to hand out some QSOs (which is appreciated). In 9 years, the only time a log was submitted with no contacts (actually just some 'partial' Header Info and a funny 'Comment') was this year, from a longtime Ham friend just playing around with my new log submit form {SIGH}. Since there are no QSO lines to consolidate into a 'MASTER', he won't show up :^)

I'll advise the troops that I'll be a bit more delayed in getting out this year's Results and to sit tight for whenever you can make the adjustments/tweaks. I'm now consolidating all this year's log QSO lines, but still ran across some entrants using a different logger module that included QSO serial numbers I must remove from each line ;-(

I really appreciate your help, Chris !!!

- Stuckinarut


Zhris
Enthusiast

Apr 1, 2015, 3:43 PM


Views: 14684
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

Just updating you that I have worked on this for a couple of hours this evening, but haven't had time to go back through all the notes to ensure everything has been covered and then test the code. I will post back at some point tomorrow.

Regards,

Chris


stuckinarut
User

Apr 1, 2015, 4:12 PM


Views: 14676
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Thanks, Chris. Will be standing by.

-Stuckinarut


Zhris
Enthusiast

Apr 3, 2015, 5:21 PM


Views: 14268
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

Apologies for the delay. As per our discussion via PM, I realised upon testing that I had taken the wrong direction. I haven't had much time to test this most recent code, therefore please test it vigorously and report issues in detail.

Please download the attached compressed file, as it contains the script and the corresponding test data sets. It has been configured such that you can run main.pl with your 2014 data.

>>>>> How to use:
- All configuration is controlled via the variables and / or the phase configuration hashes near the top.
- Adjust the base and the filepaths accordingly. My advice is to create directories where the script lives that contain all the data files, then set the base to 'directory/'. This will make it easier to manage different sets of data.
- Parts of the configuration can be overridden by supplying arguments to the script. This is particularly useful if you want to quickly test different values without having to modify the script itself.
- To run phase 0:
$ perl main.pl --phase_n=0
- To run phase 1:
$ perl main.pl --phase_n=1
- The script is interactive and asks you to confirm your intentions throughout. If you want to run the script non-interactively, i.e. ignore confirmations, then use the --yes argument:
$ perl main.pl --phase_n=0 --yes=1
- If you want to adjust the base and / or the case sensitivity and / or the wtf threshold, then use the --base, --case_sensitive, --wtf_threshold arguments respectively:
$ perl main.pl --phase_n=0 --base=path/to/directory/ --case_sensitive=1 --wtf_threshold=0
- Once you have run phase 0, you should go through the weights log; duplicate entries per sign aren't a problem (e.g. in the case of IGOR vs JACK). Delete invalid entries or change their wtf to below the wtf threshold, AND ensure valid entries have a wtf at or above the wtf threshold, before running phase 1.
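For example, a weights log row (SIGN, NAME, QTH, WEIGHT, tab separated, matching the headings the script prints) can be neutralised by dropping its weight below the threshold. The rows and values here are purely illustrative:

```
SIGN	NAME	QTH	WEIGHT
N6ZFO	BILL	CA	5
W9RR	MIKEY	IF	0
```

With the default --wtf_threshold=2, the W9RR row above would be ignored by phase 1, while the N6ZFO row survives.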

>>>>> Issues and notes:
- Different configurations per phase, mainly for filepaths. You may wish to run phase0, then use a differently named weights.txt for phase1.
- Phases namespaced to 0 and 1 respectively in order to remain consistent with their index in the phases array.
- Bonus stations inevitably shouldn't log calls to themselves, therefore can only receive a maximum of 10000 bonus points.
- An undefined category defaults to '-1', since categories are only available for those who submitted logs. Alternatively, we could consider pushing the category to the weights log, giving you the opportunity to adjust it after phase 0.
- The no return 'NORET' error is potentially inaccurate, since it is unfairly affected by mistakes and by non-submitters having no calls. You'll notice most of the errors reflect this. The no return error wasn't part of your recent notes, but I have kept it just in case.
- The weights log LOGCALL heading changed to SIGN, since the weights log contains a mix of log and call entries.
- Even after our discussion, I decided to weight log cnqs and call cnqs differently >:). The best way to understand how is to read the _input_weights function. There were too many potential issues I envisaged to ignore this, but it can easily be changed if need be. Fundamentally, every log cnq is given a wtf of 1, while every call cnq is given a wtf of 1 per unique log call (in case of duplicates). I believe, however, we should also incorporate the category log into this, since it contains a list of submitted logs, therefore these are "guaranteed" to be valid. After all though, it's up to you to go through the weights log after phase0 and make adjustments before phase1.
- For now, if contestants used multiple names or qths, they will all be listed separated by a pipe in the scores log (wtf dependent).

>>>>> Todo:
- Full, vigorous testing of every possible scenario.
- Code and namespacing aren't perfect; there is plenty of room for further development.
- Perhaps a new error should be introduced in case anyone logs themselves and cheats the system.
- Debug option: handy output when monitoring script progress, useful during development.
- Optional, configurable headings across all logs.
- A lot of your work appears to be converting each contestant's log into a universal format by hand. It would be straightforward to handle this conversion via Perl.


Code
use strict; 
use warnings FATAL => qw/all/;
use Getopt::Long;
use List::Util qw/sum/; # sum0
use Data::Dumper;

#####

local $/ = "\n";
local $" = '|';
local $, = "\t";
local $\ = "\n";

our $yes = 0;

my $phase_n = undef;
my $base = 'live20140116/'; # 'test1/'
my $case_sensitive = 0; # case sensitive should not vary between phases.
my $wtf_threshold = 2; # n of >= 2 is recommended.

GetOptions ( 'yes=i' => \$yes,
'phase_n=i' => \$phase_n,
'base=s' => \$base,
'case_sensitive=i' => \$case_sensitive, # case_sensitive!
'wtf_threshold=i' => \$wtf_threshold, ) or die "cannot get options";

die 'phase_n required or invalid' unless defined $phase_n and $phase_n =~ /^[01]$/;

my $phases =
[
{
handler => \&_phase0,
configuration =>
{
filepath_input_entries => "${base}entries.txt",
filepath_output_weights => "${base}weights.txt",
case_sensitive => $case_sensitive,
},
},
{
handler => \&_phase1,
configuration =>
{
filepath_input_bonuses => "${base}bonuses.txt",
filepath_input_categories => "${base}categories.txt",
filepath_input_weights => "${base}weights.txt",
filepath_input_entries => "${base}entries.txt",
filepath_output_errors => "${base}errors.txt",
filepath_output_scores => "${base}scores.txt",
case_sensitive => $case_sensitive,
bands => { 3 => '80M', 7 => '40M' },
wtf_threshold => $wtf_threshold,
points => 1000,
points_bonus => 5000,
default_wtf => -1, # ensure numeric / below wtf threshold, otherwise expect the unexpected.
default_category => -1, # ensure numeric.
},
},
];

print "begin phase $phase_n";

my $phase = $phases->[$phase_n];

_continue( Dumper( $phase->{configuration} ) . 'does the configuration look ok' );

$phase->{handler}->( $phase->{configuration} );

print "end phase $phase_n";

#####

#
sub _continue
{
my ( $message ) = @_;

return if $yes;

$message .= ', y to continue';

print $message;

chomp( my $response = <STDIN> );

exit unless $response eq 'y';

return 1;
}

#
sub _phase0
{
my ( $configuration ) = @_;

$configuration // die 'configuration required';

# assign configuration to variables.
my $filepath_input_entries = $configuration->{filepath_input_entries};
my $filepath_output_weights = $configuration->{filepath_output_weights};
my $case_sensitive = $configuration->{case_sensitive};

#
_continue( "'$filepath_output_weights' not empty, do you really want to (re)run phase0" ) if ( stat $filepath_output_weights )[7];

open my $handle_input_entries, '<', $filepath_input_entries or die "cannot open '$filepath_input_entries': $!";
my $weights = _input_weights( $handle_input_entries, $case_sensitive );
close $handle_input_entries;

open my $handle_output_weights, '>', $filepath_output_weights or die "cannot open '$filepath_output_weights': $!";
print $handle_output_weights 'SIGN', 'NAME', 'QTH', 'WEIGHT';
_output_weights( $handle_output_weights, $weights );
close $handle_output_weights;

return 1;
}

#
sub _phase1
{
my ( $configuration ) = @_;

$configuration // die 'configuration required';

# assign configuration to variables.
my $filepath_input_bonuses = $configuration->{filepath_input_bonuses};
my $filepath_input_categories = $configuration->{filepath_input_categories};
my $filepath_input_weights = $configuration->{filepath_input_weights};
my $filepath_input_entries = $configuration->{filepath_input_entries};
my $filepath_output_errors = $configuration->{filepath_output_errors};
my $filepath_output_scores = $configuration->{filepath_output_scores};
my $case_sensitive = $configuration->{case_sensitive};
my $bands = $configuration->{bands};
my $wtf_threshold = $configuration->{wtf_threshold};
my $points = $configuration->{points};
my $points_bonus = $configuration->{points_bonus};
my $default_wtf = $configuration->{default_wtf};
my $default_category = $configuration->{default_category};

#
_continue( "'$filepath_input_weights' empty, do you really want to run phase1 now" ) if ! ( stat $filepath_input_weights )[7];
_continue( "'$filepath_output_errors' not empty, do you really want to (re)run phase1" ) if ( stat $filepath_output_errors )[7];
_continue( "'$filepath_output_scores' not empty, do you really want to (re)run phase1" ) if ( stat $filepath_output_scores )[7];

open my $handle_input_bonuses, '<', $filepath_input_bonuses or die "cannot open '$filepath_input_bonuses': $!";
my $bonuses = _input_bonuses( $handle_input_bonuses, $case_sensitive );
close $handle_input_bonuses;

open my $handle_input_categories, '<', $filepath_input_categories or die "cannot open '$filepath_input_categories': $!";
my $categories = _input_categories( $handle_input_categories, $case_sensitive );
close $handle_input_categories;

open my $handle_input_weights, '<', $filepath_input_weights or die "cannot open '$filepath_input_weights': $!";
<$handle_input_weights>; # discard headings.
my $weightsb = _input_weightsb( $handle_input_weights, $case_sensitive );
close $handle_input_weights;

open my $handle_input_entries, '<', $filepath_input_entries or die "cannot open '$filepath_input_entries': $!";
my $entries = _input_entries( $handle_input_entries, $categories, $weightsb, $case_sensitive, $bands, $wtf_threshold, $default_wtf, $default_category );
close $handle_input_entries;

open my $handle_output_errors, '>', $filepath_output_errors or die "cannot open '$filepath_output_errors': $!";
print $handle_output_errors 'LOGCALL', 'CALLWKD', 'BAND', 'TIME', 'NAME', 'QTH', 'ERROR', 'WTF';
_calculate_scores_and_output_errors( $handle_output_errors, $entries, $bonuses, $wtf_threshold, $points, $points_bonus );
close $handle_output_errors;

open my $handle_output_scores, '>', $filepath_output_scores or die "cannot open '$filepath_output_scores': $!";
print $handle_output_scores 'CAT', 'LOGCALL', 'SCORE', 'NAME', 'QTH';
_output_scores( $handle_output_scores, $entries );
close $handle_output_scores;

return 1;
}

#
sub _input_weights
{
my ( $handle_input_entries, $case_sensitive ) = @_;

my $weights = { };
my $weights_ = { };

while ( my $line = <$handle_input_entries> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuitive than chomp.
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $log_sign, $log_name, $log_qth, $call_sign, $call_name, $call_qth ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
( split( ' ', $line ) )[5..10];

# construct log / call snq.
my $log_snq = join ${,}, $log_sign, $log_name, $log_qth;
my $call_snq = join ${,}, $call_sign, $call_name, $call_qth;

#
$weights->{log }->{$log_snq }->{$log_sign}++;
$weights->{call}->{$call_snq}->{$log_sign}++;
}

for my $log_snq ( keys %{$weights->{log}} )
{
my $log_wtf = sum( values %{$weights->{log}->{$log_snq}} );

$weights_->{$log_snq} = $log_wtf;
}

for my $call_snq ( keys %{$weights->{call}} )
{
my $call_wtf = scalar keys %{$weights->{call}->{$call_snq}};

$weights_->{$call_snq} += $call_wtf;
}

#print Dumper $weights, $weights_;

return $weights_;
}

#
sub _input_bonuses
{
my ( $handle_input_bonuses, $case_sensitive ) = @_;

my $bonuses = { };

while ( my $line = <$handle_input_bonuses> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuitive than chomp.
$line =~ s/\s+$//;

#
my ( $sign ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
split( ' ', $line );

warn 'duplicate' if defined $bonuses->{$sign};

#
$bonuses->{$sign} = 1;
}

#print Dumper $bonuses;

return $bonuses;
}

#
sub _input_categories
{
my ( $handle_input_categories, $case_sensitive ) = @_;

my $categories = { };

while ( my $line = <$handle_input_categories> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuitive than chomp.
$line =~ s/\s+$//;

#
my ( $sign, $category ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
split( ' ', $line );

warn 'duplicate' if defined $categories->{$sign};

#
$categories->{$sign} = $category;
}

#print Dumper $categories;

return $categories;
}

#
sub _input_weightsb
{
my ( $handle_input_weights, $case_sensitive ) = @_;

my $weightsb = { };

while ( my $line = <$handle_input_weights> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuitive than chomp.
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $sign, $name, $qth, $wtf ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
split( ' ', $line );

#
my $snq = join ${,}, $sign, $name, $qth;

#
$weightsb->{$snq} =
{
sign => $sign,
name => $name,
qth => $qth,
wtf => $wtf,
};
}

#print Dumper $weightsb;

return $weightsb;
}

#
sub _input_entries
{
my ( $handle_input_entries, $categories, $weightsb, $case_sensitive, $bands, $wtf_threshold, $default_wtf, $default_category ) = @_;

my $entries = { };

# process.
while ( my $line = <$handle_input_entries> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuitive than chomp.
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $frequency, $call_time, $log_sign, $log_name, $log_qth, $call_sign, $call_name, $call_qth ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
( split( ' ', $line ) )[1,4..10];

my $log_snq = join ${,}, $log_sign, $log_name, $log_qth;
my $log_wtf = $weightsb->{$log_snq}->{wtf} // $default_wtf;

next if $log_wtf < $wtf_threshold;

# lookup band via frequency.
my $band = $bands->{( $frequency =~ /([1-9])/ )[0]}; # todo: // 'other' / error.

my $log_category = $categories->{$log_sign} // $default_category;
my $log_calls = $entries->{$log_sign}->{bands}->{$band} //= [ ]; # use //= to allow autovivification / assign default value.

my $call_snq = join ${,}, $call_sign, $call_name, $call_qth;
my $call_wtf = $weightsb->{$call_snq}->{wtf} // $default_wtf;
my $call_duplicate = ( grep { $_->{sign} eq $call_sign } @$log_calls ) ? 1 : 0 ;
#my $call_return = undef; # cannot do yet, not until every call call has been pushed.

#
_construct_log_entry( $entries, $log_category, $log_sign, $log_name, $log_qth );

#
_construct_call_entry( $log_calls, $call_time, $call_sign, $call_name, $call_qth, $call_wtf, $call_duplicate );
}

# process remainder that have ok wtf. Technically namespace not log or call specific, but constructs log entry.
for my $log_snq ( keys %$weightsb )
{
my $log_wtf = $weightsb->{$log_snq}->{wtf} // $default_wtf;

next if $log_wtf < $wtf_threshold;

my $log_sign = $weightsb->{$log_snq}->{sign};
my $log_name = $weightsb->{$log_snq}->{name};
my $log_qth = $weightsb->{$log_snq}->{qth};
my $log_category = $categories->{$log_sign} // $default_category;

#
_construct_log_entry( $entries, $log_category, $log_sign, $log_name, $log_qth );
}

#print Dumper $entries;

return $entries;
}

#
sub _construct_log_entry
{
my ( $ref, $log_category, $log_sign, $log_name, $log_qth ) = @_;

#
$ref->{$log_sign}->{category} //= $log_category;
$ref->{$log_sign}->{names}->{$log_name} = 1;
$ref->{$log_sign}->{qths }->{$log_qth } = 1;
$ref->{$log_sign}->{bands} //= { };
$ref->{$log_sign}->{bonuses} //= { };
$ref->{$log_sign}->{score} //= 0;

return 1;
}

#
sub _construct_call_entry
{
my ( $ref, $call_time, $call_sign, $call_name, $call_qth, $call_wtf, $call_duplicate ) = @_;

#
push @$ref,
{
time => $call_time,
sign => $call_sign,
name => $call_name,
qth => $call_qth,
wtf => $call_wtf,
duplicate => $call_duplicate,
};

return 1;
}

#
sub _output_weights
{
my ( $handle_output_weights, $weights ) = @_;

for my $snq ( sort keys %$weights )
{
my $wtf = $weights->{$snq};

# print.
print $handle_output_weights $snq, $wtf; # important that snq is $, divided.
}

return 1;
}

#
sub _calculate_scores_and_output_errors
{
my ( $handle_output_errors, $entries, $bonuses, $wtf_threshold, $points, $points_bonus ) = @_;

for my $log_sign ( sort keys %$entries )
{
my $log = $entries->{$log_sign};

my $log_bands = $log->{bands};
my $log_bonuses = $log->{bonuses};

for my $band ( sort keys %$log_bands )
{
my $log_calls = $log_bands->{$band}; # // [ ];

for my $call ( sort { $a->{sign} cmp $b->{sign} || $a->{time} <=> $b->{time} } @$log_calls )
{
my $call_time = $call->{time};
my $call_sign = $call->{sign};
my $call_name = $call->{name};
my $call_qth = $call->{qth};
my $call_wtf = $call->{wtf};
my $call_duplicate = $call->{duplicate};
my $call_calls = ( exists $entries->{$call_sign} ) ? $entries->{$call_sign}->{bands}->{$band} : [ ]; # use condition to prevent autovivification.
my $call_return = ( grep { $_->{sign} eq $log_sign } @$call_calls ) ? 1 : 0 ;

# validate call.
my ( $call_error, $call_wtf_string ) = ( $call_duplicate ) ? ( 'DUPE' , $call_wtf ) :
( $call_wtf < $wtf_threshold ) ? ( 'CNQ' , "$call_wtf<$wtf_threshold" ) :
( not $call_return ) ? ( 'NORET', $call_wtf ) :
( undef , undef ) ;

# log errors or update score.
if ( defined $call_error )
{
# print.
print $handle_output_errors $log_sign, $call_sign, $band, $call_time, $call_name, $call_qth, $call_error, $call_wtf_string;
}
# todo: better if scoring handled in own function or by _output_scores.
elsif ( exists $bonuses->{$call_sign} and not exists $log_bonuses->{$call_sign} )
{
$log->{score} += $points + $points_bonus;

$log_bonuses->{$call_sign} = 1;
}
else
{
$log->{score} += $points;
}
}
}
}

return 1;
}

#
sub _output_scores
{
    my ( $handle_output_scores, $entries ) = @_;

    for my $log_sign ( sort { $entries->{$a}->{category} <=> $entries->{$b}->{category} || $entries->{$b}->{score} <=> $entries->{$a}->{score} } keys %$entries )
    {
        my $log = $entries->{$log_sign};

        my $log_category = $log->{category};
        my $log_names    = [ keys %{$log->{names}} ];
        my $log_qths     = [ keys %{$log->{qths }} ];
        my $log_score    = $log->{score};

        # print.
        print $handle_output_scores $log_category, $log_sign, $log_score, "@$log_names", "@$log_qths";
    }

    return 1;
}


Regards,

Chris


(This post was edited by Zhris on Apr 3, 2015, 5:42 PM)
Attachments: contestcrosschecker.zip (18.9 KB)


Zhris
Enthusiast

Apr 3, 2015, 7:05 PM


Views: 14250
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Just been reading back through the code, there are a couple of issues, but I'm not going to worry about those right now, I don't think your 2014 test data encounters them. I'm thinking I would also like to simplify the code across _input_weightsb, _input_entries and _calculate_scores_and_output_errors, I should have built an intermediate contestants structure to work off of more easily.

Chris


stuckinarut
User

Apr 3, 2015, 7:32 PM


Views: 14245
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hi, Chris:

Thank you so much for your continued efforts and assistance. I will do some rigorous testing over the weekend & report back.

If any questions during the process, I'll post them here.

- Stuckinarut


stuckinarut
User

Apr 4, 2015, 7:05 AM


Views: 14121
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hi, Chris:

Up briefly on little sleep but wanted to make a first go of things :^)

In my groggy state I was a bit confuzzzzed until I re-read the --phase_n= info again. Especially, since I have not had any past experience with an 'interactive' Perl script. Pretty cool indeed when I was finally able to take the first 'Test Drive' !!!

1. When I looked at the Error log, it was filled with 'NORET' entries which made my eyeballs roll. I am only interested in CNQ and DUPE errors. It would be very helpful if you could please add an entry in the 'config' area something like:

noret = [on/off] or [yes/no]
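For what it's worth, a toggle along those lines could be as simple as a grep over the error records before printing. This is only a sketch with made-up names (%config, @errors), not the script's actual mechanism:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: filter error records by type, honouring a
# config flag such as noret => 0 to suppress NORET lines entirely.
my %config = ( noret => 0 );    # 0 = omit NORET errors from the report

my @errors = (
    { error => 'DUPE',  line => 'K6NV W7OM 40M 0218 ROD WA'   },
    { error => 'NORET', line => 'K6NV K5OT 40M 0215 LARRY TX' },
    { error => 'CNQ',   line => 'K6NV K6VVA 80M 0257 RICK CA' },
);

# keep a record if NORET reporting is on, or if it isn't a NORET.
my @report = grep { $config{noret} or $_->{error} ne 'NORET' } @errors;

print "$_->{error}: $_->{line}\n" for @report;
```

With noret set to 0 only the DUPE and CNQ lines would reach the report.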

2. What did immediately pop into mind was how cool and efficient it would be to have a final column in the Error Report of 'ADJ' (for Adjustment) which would have a Unique Number for each particular Error log entry. Then, to re-run the script with a different 'phase' which would give the option to plug-in any 'ADJ' entries that upon Manual inspection/cross-checking the logs, that *should* be considered valid and included in a final update to the Scores list. Something like: Enter ADJ Number: and then the option to enter more until finished instead of having to re-run the script for individual adjustments.

3. In the Scores list, I am confuzzzzzed about the (negative -) numbers for some of the Callsigns. Especially, this one for W7WHY who was a log submitter:

-1 W7WHY 0 TOM OR

Yes, I'm confuzzzzzzed here (probably because I need more sleep!!!)

4. For the Weight listing, can you please explain how I can change the Weight for each entry to be equal for further comparison and examination? I think this is going to be important for purposes of 'Education and Illustration' to the log submitters in terms of an actual percentage of the problems with miscopied and/or mistyped data based upon a standardized weight factor. Yes, that would be very helpful.

5. Regarding the piped display of (one example) VE4 | MB for two of the Canadian entrants, this highlights an issue I must clearly communicate to folks in next year's 'Rules' ... because technically both VE4 & MB are correct (VE4 is the call 'prefix' for the 'mult' (QTH) of MB - Manitoba). However, if in fact VE4 was sent but MB entered into the recipient log, that *should* be an error. But some logging software *may* auto-convert a VE4 entry to the normally recognized MB used in scoring. Not sure if I'm conveying this properly.

Eyelids are drooping... need a few more hours of sleep before diving back in.

Thanks!

-Stuckinarut


stuckinarut
User

Apr 4, 2015, 7:14 AM


Views: 14115
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Ohhh... two more things before I head off for ZZZZZZZZZ'sVille.

I can do this manually in Excel, but if not a lot of extra work, once again for 'Educational' type purposes to the troops, to add one more output .txt list file and a minor mod to the weights.txt file:

1. nologs.txt ... this would be a NET-NET list of 'Unique' callsigns from ONLY the (CALLSWKD) column that did NOT actually submit a log. Having a final "Total QTY" line at the bottom of the list would eliminate importing, tallying & exporting with Excel.

2. weights.txt ... just to tally the total # of CNQ's involved, which would again save manual work involving Excel.
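A sketch of how a nologs.txt list could be derived (the %submitters and @calls_wkd data here are invented purely for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: collect the unique callsigns that appear in the
# CALLSWKD column but never as a log submitter, plus a total line.
my %submitters = map { $_ => 1 } qw( K6NV K9YC N0AC );      # who sent logs
my @calls_wkd  = qw( K6NV W7OM K9YC VE3KI W7OM );           # everyone worked

my %seen;
my @nologs = sort grep { !$submitters{$_} && !$seen{$_}++ } @calls_wkd;

print "$_\n" for @nologs;
print "# total = ", scalar @nologs, "\n";   # summary line at the bottom
```

The `!$seen{$_}++` idiom de-duplicates while grepping, so W7OM is listed only once.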

Thanks!

-Stuckinarut


Zhris
Enthusiast

Apr 4, 2015, 7:33 AM


Views: 14110
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

I will be re working the script later and will take into account the issues you raise here. I just wanted to respond to each of your points.

1) The noret isn't very accurate at this time as explained above. I will be improving this accuracy by ensuring it does not error on non submitters, although this in turn has disadvantages, once I have figured it out I will provide the details.

2) This could be a good idea.

3) A negative 1 indicates there was no category as explained above. I chose to use a number for ease later when outputting the scores and doing a numerical sort on categories. In the case of W7WHY, it isn't in the categories log.

4) Not sure I fully understand, but I will think about it and get back to you.

5) I'll think about how this could be accounted for. It's another complexity that will take some thought to implement ;-). But how do we know VE4 was sent and not MB?

Regarding your second response, both 1) and 2) can be incorporated.

Sleep tight. Regards,

Chris


(This post was edited by Zhris on Apr 4, 2015, 7:39 AM)


stuckinarut
User

Apr 4, 2015, 10:48 AM


Views: 14024
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Chris -

Back up but still running a sleep deficit here ;-(

Hmmm... I checked this original 'Category' list in the .zip upload and W7WHY was included:

LQP58_CALLSIGN-CATEGORY-LISTSORTED.txt

It appears a Gremlin is lurking about in our midst trying to cause problems ???

As I was waking up, I realized how brilliant your structuring to the sub-directory system was and to use 'understandable' .txt file names vs. listQ.txt etc. This also eliminates having to keep typing the multiple list_.txt names each time I re-run the script. For future years, all I need to do is create a different sub-directory. BRILLIANT 'Forward Thinking', Chris !!!

Will do more 'Test Drives' in-between working on taxes ;-(

- Stuckinarut


Zhris
Enthusiast

Apr 4, 2015, 11:35 AM


Views: 14010
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

I might be going crazy here, but I can't find w7why in live20140116/categories.txt. Am I using outdated data, where did LQP58_CALLSIGN-CATEGORY-LISTSORTED.txt come from?

Chris


stuckinarut
User

Apr 4, 2015, 11:41 AM


Views: 14009
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Chris:

I took another Quickie TD look at the Error log by importing it into Excel, doing a sort on the Error column, nuking the NORET entries, and then re-sorting the data based on the desired outcome in my original .pdf file.

As I took a look-see, I was just about to also nuke the CNQ entries that were NOT *below* the WTF, when I saw this one with a high CNQ value:

K6NV VE3KI 80M 0242 RICH ON CNQ 86

So I went to the main Entries list to do a find to see how many QSOs were reported with VE3KI by all log submitters, but only found this single one:

QSO: 3543 CW 2014-01-16 0242 K6NV BOB CA VE3KI RICH ON

In the Weights list, VE3KI only shows with a value of 1. I'm very curious how a CNQ of 86 was assigned to this 'transaction' (QSO) in the Error log?

As I mentioned before, I think just using (or being able to specify) a single Weight for all QSO lines ('transactions') is easiest. For the Error log Manual analysis & decision making to be done, keeping it to just CNQ<WTF and DUPE errors will greatly simplify testing. I can use the separate Weight list side-by-side during the Manual analysis.

Unless I messed up, here is the Error list pruned down to only CNQ's with <2 (WTF) and DUPE entries:


Code
LOGCALL	CALLWKD	BAND	TIME	NAME	QTH	ERROR	WTF 
K6DGW W9RE 40M 0226 JOHN IN CNQ 1<2
K6NV W7OM 40M 0218 ROD WA DUPE 35
K6NV K6VVA 80M 0257 RICK CA CNQ 1<2
K9YC VE3DZ 40M 0225 YURI ON DUPE 32
K9YC N3QE 80M 0255 TIM MD DUPE 39
N0AC N4JRG 80M 0253 MIKE KY DUPE 87
N0TA N5DO 80M 0256 DAVE TX DUPE 32
N3QE K9YC 80M 0255 JACK CA DUPE 15
N4AF VE3KQN 80M 0248 JIM VE3 CNQ 1<2
N4JRG N4AFY 80M 0243 JACK NC CNQ 1<2
VE4YU K0AD 80M 0231 LOCUST MD CNQ 1<2
W4AU VE3BZ 80M 0231 YURI ON CNQ 1<2
W4OC W6NV 40M 0216 JACK CA DUPE 37


My suspicions are there may be more actual Errors based on the WTF level, but will have to investigate a bit :^)

Ohhh, regarding my previous idea about an 'ADJ' column, when the QSO lines are initially imported, they *could* be auto-assigned something like a QID (QSO ID Number) for later use in making 'interactive' adjustments. Maybe that would need to be a separate script?

Just some more feedback before finally attacking the taxes nightmare.

- Stuckinarut


stuckinarut
User

Apr 4, 2015, 11:45 AM


Views: 14006
Re: [Zhris] HASH-O-RAMA Data Processing Problem


In Reply To
I might be going crazy here, but I can't find w7why in live20140116/categories.txt. Am I using outdated data, where did LQP58_CALLSIGN-CATEGORY-LISTSORTED.txt come from?

Chris


OHHHH... sorry, my bad (I'll chalk it up to the sleep deficit). I was apparently mixing apples & oranges (2014 & 2015 data). Sincere apologies.

- Stuckinarut


stuckinarut
User

Apr 4, 2015, 12:22 PM


Views: 13996
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Chris -

I decided to quickly do another run using wtf = 3 and the Error log entries increased dramatically (of course :^)

As I quickly looked at one entry (the W7OM QSO with WQ5L) that showed a CNQ Error WTF of 1<3, I thought it a bit strange. Checking the Weights list showed WQ5L = 97. Hmmm. Even if using a single Weight factor throughout, which would likely have been in the 50-something range, I'm not sure why this showed as an error.

NOTE: There were actually 2 QSOs by W7OM with WQ5L but on different bands - same CNQ 1<3 Error.

Here's the WTF <3 run Error log for CNQ & DUPE entries:


Code
LOGCALL	CALLWKD	BAND	TIME	NAME	QTH	ERROR	WTF 
K0AD W6NV 80M 0242 ??? CA CNQ 1<3
K0EU N5ZO 40M 0220 MARCO CA CNQ 1<3
K0EU K6WG 40M 0221 STAN CA CNQ 2<3
K0TG N5ZO 40M 0205 MARK CA CNQ 1<3
K0TG K6BGW 40M 0217 SKIP CA CNQ 1<3
K1GU K6VVA 80M 0236 HANK CA CNQ 1<3
K1GU N8XX 80M 0238 HANK MI CNQ 1<3
K6DGW W1NN 40M 0213 HAL SC CNQ 1<3
K6DGW W9RE 40M 0226 JOHN IN CNQ 1<3
K6NV K4BAI 40M 0204 JACK GA CNQ 1<3
K6NV K5OT 40M 0215 LARRY TX CNQ 1<3
K6NV W7OM 40M 0218 ROD WA DUPE 81
K6NV W9RE 40M 0223 JACK IN CNQ 1<3
K6NV K0AC 40M 0224 BILL IA CNQ 1<3
K6NV VE3KI 80M 0242 RICH ON CNQ 1<3
K6NV K6VVA 80M 0257 RICK CA CNQ 1<3
K6SRZ N3SD 80M 0241 JOE PA CNQ 1<3
K7SS NK9G 40M 1747 RICK WI CNQ 2<3
K7SS K6WG 40M 1748 STAN CA CNQ 2<3
K9YC KM7Q 40M 0212 BOB OR CNQ 1<3
K9YC VE3DZ 40M 0225 YURI ON DUPE 83
K9YC N3QE 80M 0255 TIM MD DUPE 102
N0AC K2QBN 40M 0212 VAN FL CNQ 1<3
N0AC N4LOV 40M 0225 AL AL CNQ 1<3
N0AC N3ID 80M 0246 GREG PA CNQ 1<3
N0AC N4JRG 80M 0253 MIKE KY DUPE 52
N0TA N6DA 40M 0221 JIM CA CNQ 1<3
N0TA XE3S 40M 0227 MARKO XE CNQ 1<3
N0TA N5DO 80M 0256 DAVE TX DUPE 93
N3BB N8XX 40M 0201 IGOR MI CNQ 2<3
N3BB W4UX 40M 0202 JIM NC CNQ 1<3
N3BB N0AT 80M 0246 JOHN CA CNQ 1<3
N3QE K6DGW 40M 0229 SCIP CA CNQ 1<3
N3QE K9YC 80M 0255 JACK CA DUPE 102
N3SD N5ZO 40M 0216 KA CA CNQ 1<3
N3SD N3QE 80M 0258 TIM MN CNQ 1<3
N4AF VE3KQN 80M 0248 JIM VE3 CNQ 1<3
N4AF KG4USN 80M 0250 KEN GA CNQ 1<3
N4JRG XE2S 40M 0228 MARYO DX CNQ 2<3
N4JRG N4AFY 80M 0243 JACK NC CNQ 1<3
N4JRG N9AC 80M 0253 BILL IA CNQ 1<3
N4JRG N3SB 80M 0255 GILL PA CNQ 1<3
N5DO W1EBI 80M 0234 GEO MA CNQ 1<3
N5DO K0TA 80M 0248 JOHN CO CNQ 1<3
N5DO K0EU 80M 0259 JOHN CO CNQ 1<3
N6DA VE4EA 40M 0212 GARY MB CNQ 1<3
N6DA W4NG 40M 0218 TED TN CNQ 1<3
N6DA W9RE 40M 0226 MIKE IL CNQ 1<3
N6IP N4LOV 40M 0227 CARL AL CNQ 1<3
N6RO WH6LE 40M 0205 PETE HI CNQ 1<3
N6ZFO N5RO 40M 0216 JACK CA CNQ 1<3
N6ZFO N6DA 80M 0243 27 DON CNQ 1<3
N8XX K1GU 80M 0237 NEB TN CNQ 1<3
VE4EA N5IP 80M 0242 JACK CA CNQ 1<3
VE4EA W4VA 80M 0258 JOHN VA CNQ 1<3
VE4YU K0AD 80M 0231 LOCUST MD CNQ 1<3
W0BH NK9G 40M 0217 RICK WI CNQ 2<3
W1EBI W1NN 80M 0235 STAN OH CNQ 1<3
W4AU VE3BZ 80M 0231 YURI ON CNQ 1<3
W4AU W1NN 80M 0235 HAL MA CNQ 1<3
W4NJK W6NV 80M 0238 OLIVER CA CNQ 1<3
W4NJK W7WHY 80M 0253 JIM WA CNQ 1<3
W4OC W6NV 40M 0216 JACK CA DUPE 26
W7OM VE4EA 40M 0206 ED MB CNQ 1<3
W7OM K5OT 40M 0211 JIM TX CNQ 1<3
W7OM K6TV 40M 0213 BOB CA CNQ 1<3
W7OM W5QL 40M 0215 RAY MS CNQ 1<3
W7OM XE2S 80M 0241 MARYO DX CNQ 2<3
W7OM W5QL 80M 0253 RAY MS CNQ 1<3
W7OM VE4EA 80M 0257 ED MB CNQ 1<3
WA6URY N0AC 40M 0205 BILL CA CNQ 1<3
WA6URY N4AF 40M 0223 JACK TN CNQ 1<3
WA6URY XE2S 80M 0247 MARC DX CNQ 1<3
WQ5L N5AW 80M 0247 MARV CO CNQ 1<3
XE2S W7OM 80M 0241 RON WA CNQ 1<3


Hope this feedback helps.

- Stuckinarut


(This post was edited by stuckinarut on Apr 4, 2015, 12:24 PM)


stuckinarut
User

Apr 4, 2015, 12:47 PM


Views: 13988
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

I gotta get to the taxes, but can't put this thing down :^)

FYI, I checked (PUN!) the logging software I use, and son-of-a-gun ... entries for the Canadian 'Manitoba' QTH/Mult of either MB or VE4 both get accepted, and the auto-calculated score update that displays after each QSO accurately reflects either entry as valid.

Hmmm.

I thought this was pretty cool what you did in the Scores output:


Code
5	VE3DZ	49000	YURI	VE3|ON 
7 XE2S 23000 MARCO DX|XE
8 VE4EA 46000 CARY VE4|MB
8 VE4YU 22000 ED VE4|MB


These variances will only apply to NON-USA entries. However, as I mentioned previously, IF 'VE4' was sent but 'MB' entered into the log, well, hmmm... perhaps that could count as a 'Mulligan' - haven't decided yet.

In the logging software, there is a .txt type file for all Mults/QTH listings with any alternates. I'm wondering if something similar should be used to ensure validation? In the case of *some* 'DX' entries that also have another Mult/QTH designator, I might have to manually add those to a list depending on what ends up showing in the Master CNQ 'Weights' list.

Just a thought. I'll be offline now for some hours.

- Stuckinarut


Zhris
Enthusiast

Apr 4, 2015, 2:13 PM


Views: 13967
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,


Quote
As I quickly looked at one entry (the W7OM QSO with WQ5L) that showed a CNQ Error WTF of 1<3 I thought a bit strange


Looking at the errors, W7OM actually called W5QL ( wtf = 1 ), not WQ5L ( wtf = 97 ); looks like a genuine CNQ error to me.


Quote
NOTE: There were actually 2 QSOs by W7OM with WQ5L but on different bands - same CNQ 1<3 Error.


With the current weighting system, it will only log a wtf of 1 per unique log sign to ensure duplicates don't skew the wtf, and doesn't account for band as that may seem over the top ( CNQB ;- ) ). W7OM made this same mistake twice, but only weighted it 1. The relevant code snippets:


Code
$weights->{call}->{$call_snq}->{$log_sign}++; # $weights->{call}->{'W5QL RAY MS'}->{'W7OM'}++

...

for my $call_snq ( keys %{$weights->{call}} )
{
    my $call_wtf = scalar keys %{$weights->{call}->{$call_snq}}; # my $call_wtf = 1

    $weights_->{$call_snq} += $call_wtf; # $weights_->{'W5QL RAY MS'} += 1 ( += is misleading, just = is fine )
}
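The effect of those snippets can be demonstrated in isolation: because the wtf is the count of *unique* log signs under each combination, the same submitter logging it on two bands still contributes 1, while three distinct submitters contribute 3. A minimal, self-contained demonstration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the same nested structure: combination => { log sign => count }.
my $weights = {};
$weights->{call}->{'W5QL RAY MS'}->{'W7OM'}++;    # 40M contact
$weights->{call}->{'W5QL RAY MS'}->{'W7OM'}++;    # 80M contact, same submitter
$weights->{call}->{'WQ5L RAY MS'}->{$_}++ for qw( K6NV K9YC N0AC );

for my $call_snq ( sort keys %{ $weights->{call} } )
{
    # wtf = number of unique log signs that reported this combination.
    my $wtf = scalar keys %{ $weights->{call}->{$call_snq} };
    print "$call_snq => wtf $wtf\n";
}
```

This prints a wtf of 1 for 'W5QL RAY MS' and 3 for 'WQ5L RAY MS', matching the explanation above.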


I've started to make a few adjustments, keep reporting potential issues as and when you have time.

Regards,

Chris


(This post was edited by Zhris on Apr 4, 2015, 2:22 PM)


stuckinarut
User

Apr 4, 2015, 4:56 PM


Views: 13928
Re: [Zhris] HASH-O-RAMA Data Processing Problem


Quote
Looking at the errors, W7OM actually called W5QL ( wtf = 1 ) not WQ5L ( wtf = 97 ), looks like a genuine CNQ error to me.


OUCH-OUCH-OUCH... my bad again, sorry... must have been temporary Dyslexia here caused by rushing too fast ;-(

I may be a bit more scarce for the next few days with the tax stuff {SIGH}, but will find a way to play some Hookey to test.

Thanks, Chris.

- Stuckinarut


stuckinarut
User

Apr 5, 2015, 10:45 AM


Views: 13703
Re: [Zhris] HASH-O-RAMA Data Processing Problem

For thread readers wondering if anything is happening, there is via a number of Personal Messages (to keep the thread length down). I decided to go ahead and post this new suggestion for Chris to the Forum to keep things a bit alive here.
=======

Chris:

Just as my head hit the pillow earlier, a "Flash-of-Inspiration" struck :^)

Regarding my previous suggestion about adding an 'ADJ' column to the Error log that would contain the 'QID' (QSO ID Number) for each Error log QSO entry (a/k/a 'Transaction').

Adding one more .txt file to the mix called 'adjusts.txt' would be to simply copy & paste (or type) the QID for whatever QSOs from the Error log are to be adjusted/validated/given credit after scrutiny (if any). This would be similar to the 'bonuses.txt' file list.

THEN, when re-running the main script, as each QSO/Transaction is checked, *BEFORE* what would normally dump a QSO to the Error log, a piece of code would check the (new) 'adjusts.txt' file before proceeding. IF there is a match of the QID involved in the (new) 'adjusts.txt' file ... 'BINGO` ... the QSO credit is given (and if a 'Bonus Points' station that credit as well).

Example (adjusts.txt file):


Code
387 
14
1599
260
3


The new 'scores.txt' file/report would then be the FINAL (Adjusted) SCORES for integrating into the event RESULTS as desired.

Yeah, this would be the proverbial 'Cat's Meow' :^)
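The lookup described above could be sketched like this (hypothetical data structures; a real implementation would read the QIDs from adjusts.txt on disk):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: load adjusted QIDs into a hash, then consult it
# before a QSO would otherwise be dumped to the Error log.
my @adjust_lines = ( 387, 14, 1599, 260, 3 );      # stand-in for reading adjusts.txt
my %adjusts = map { $_ => 1 } @adjust_lines;

for my $qso ( { qid => 14, call => 'VE3KI' }, { qid => 99, call => 'W9RE' } )
{
    if ( $adjusts{ $qso->{qid} } )
    {
        print "QID $qso->{qid} ($qso->{call}): adjusted - credit given\n";
    }
    else
    {
        print "QID $qso->{qid} ($qso->{call}): error logged\n";
    }
}
```

Hash lookup makes the per-QSO check effectively free no matter how many adjustments are listed.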

WOW... as I was just finishing typing above, another 'FLASH'... of how everything could be done 'Interactively' *during* the running of a single script, but let's keep things in 'K.I.S.S.' mode for now.

Thanks!

Eric


Zhris
Enthusiast

Apr 9, 2015, 8:38 PM


Views: 12766
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

Haven't spoken to you for a couple of days, just letting you know that I am about to test the latest version and will get back to you tomorrow.

Regards,

Chris


stuckinarut
User

Apr 9, 2015, 8:45 PM


Views: 12763
Re: [Zhris] HASH-O-RAMA Data Processing Problem

No problem, Chris...still working on taxes ;-(

Looking forward to testing the new version !!!

Thanks very much,

-Stuckinarut


Zhris
Enthusiast

Apr 10, 2015, 3:58 AM


Views: 12732
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Check your PMs when you have time, I am very nearly ready but have an issue regarding the adjustments log. I'm posting this here in case you have "Send private message notification via e-mail" off.

Regards,

Chris


stuckinarut
User

Apr 10, 2015, 4:15 AM


Views: 12729
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Chris...

Received your PM OK and just replied with a PM :^)

Thanks!

- Stuckinarut


Zhris
Enthusiast

Apr 10, 2015, 6:08 AM


Views: 12723
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Thank you.

I'm still not 100% but have enough to go on, I will finish implementing adjustments later, everything else is ready including my notes to you.

Chris


stuckinarut
User

Apr 10, 2015, 9:42 AM


Views: 12659
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hey, Chris...

Got a couple more hours sleep. Just as my head hit the pillow, this CNQ ANALYSIS came to mind.

It will help you understand better how I am going to use it in conjunction with the ERROR LOG.

www.xgenesis.com/hashorama/2014_LQP_CNQ_ANALYSIS.pdf

So out of 115 CNQ Combinations, the GOOD and BAD are almost evenly split. Since I am familiar with most of the actual GOOD vs. BAD ones by memory, any of the MAYBE or ??? entries that show up I can fire off an email to those guys to verify EXACTLY what NAME & QTH they use (or to determine if these might have been "One-Off" log paddings {GRIN}.

For the ??? entries the same thing, but you will note that these had DOUBLE bad combinations for the identical Callsign and only 1 single entry each.

In the event any BAND 'DUPE' might also be a CNQ (or Vice-Versa), either case will result in an ERROR that will NOT be validated.

So I have a 'system' formulated here :^)

DISCLAIMER: I whipped this analysis together VERY rapidly, so there could be one or 2 "ERRORS" {SIGH}, but 'Close enough for Government work' in terms of an illustration.

Hope this helps!!!

- Stuckinarut

P.S. Once again, many of the "BAD" problems are the result of guys using "PRE-FILLS" in the logging software as I previously explained I think in a PM.


(This post was edited by stuckinarut on Apr 10, 2015, 9:47 AM)


stuckinarut
User

Apr 10, 2015, 9:54 AM


Views: 12657
Re: [Zhris] HASH-O-RAMA Data Processing Problem

(MORE)...

Regarding all those split VE3|ON, VE4|MB and XE|DX problems, I can eliminate most of those in future years by CLARIFYING BY "EXAMPLES" WITHIN THE RULES of what these guys must do in their logging software in order to *NOT* end up with DQ'd ("Disqualified") QSOs !!!

The same for whenever a DIFFERENT NAME is used than normal (like another "Honor/Tribute" situation), etc.

This has already been EXTREMELY VALUABLE in seeing the "BIG PICTURE" of some needed actions to be taken !!!

Thanks again for helping bring these things to light in an 'automated' way !!!

- Stuckinarut


(This post was edited by stuckinarut on Apr 10, 2015, 9:55 AM)


stuckinarut
User

Apr 10, 2015, 10:03 AM


Views: 9739
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Ohhhhh... one more thing which may help.

After making the first pass that produces the ERROR LOG, in addition to printing out the CNQ list, I will print out the short SCORES list, which has the actual VALID CNQ Combinations for each of the submitted logs.

It will be a visual "piece-of-cake" to make most of the decisions and any "QID" adjustments as I go down through each entry in the ERROR LOG.

Hopefully, this will all make sense now?

BUT WAIT...THERE'S MORE...

Since what you are generously doing here will substantially reduce the annual "Nightmare" of log processing, I'll go on a campaign to try and at least double or maybe even triple activity in what will be the 10th Anniversary event next year. Maybe even increase the QSO points to 10,000 each, and the Bonus Points to 50,000 so everyone can end up with much BIGGER scores :^)

- Stuckinarut


(This post was edited by stuckinarut on Apr 10, 2015, 10:09 AM)


stuckinarut
User

Apr 10, 2015, 11:42 AM


Views: 9728
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

CORRRRRRRRECTIONS...

Sorry, rushed tooooo fasssst on insufficient sleep and missed flagging a couple in this list:

http://www.xgenesis.com/hashorama/2014_LQP_CNQ_ANALYSIS.pdf

I need an 'ERROR LOG' for myself ;-(

Uploaded a 2nd corrected version (found one more goof). Now it's even between GOOD and BAD at 50% each !!!

FYI,

- Stuckinarut


(This post was edited by stuckinarut on Apr 10, 2015, 11:55 AM)


stuckinarut
User

Apr 10, 2015, 11:49 AM


Views: 9727
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Yet another thought...

Some NON-Log Submitter folks just show up for sometimes the last few minutes of an on-air event like this. For some time I've been chewing on *maybe* establishing an 'MQT' (Minimum QSO Threshold) for which these CNQ Combinations *MUST* show up in the Log Submitter Logs in order to be valid. Maybe 5 which I think is reasonable, or 3 if I want to be more "Mulligan-Ish" :^) That would still save me time in sending out individual emails to ascertain IF some of these potential "One-Offs" actually participated.

FYI (again),

- Stuckinarut


Zhris
Enthusiast

Apr 10, 2015, 12:07 PM


Views: 9720
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

Finally have a new version, please see attached. I have read back all your messages over the last week plus in order to implement what's remaining. Forgive me if I have missed anything. It might be worth you going back through all of your messages and retesting / reporting anything not to your satisfaction. Please read all of the notes below in detail as they may offer explanations to some of your queries.

Recommended usage:

$perl main.pl --interactive --phase_n=0
$perl main.pl --interactive --phase_n=1
$perl main.pl --interactive --phase_n=2

add --base=path/to/dir to temporarily run on a different set of data.
add --case_sensitive to temporarily turn on case sensitivity.
add --wtf_threshold=7 to temporarily adjust the wtf threshold.

Notes:

- code better organized. The core algorithm is in one place now, the _input_contestants function. This makes the code more manageable, although still not perfect. I have used a couple of bad practices purely for simplicity. Interactive / configuration interface not that great. Etc etc etc. When the code finally works as desired, we can consider refactoring and improving, but probably isn't necessary.

- if you have problems installing List::MoreUtils, remove the use statement at the top and add 'any' to the List::Util import list. I don't want to upgrade List::Util on my test machine at the moment to support the any function.

- boolean command line arguments now controlled via on = --arg and off = --noarg or don't supply i.e. --case_sensitive, --nocase_sensitive.

- phase2 added since the introduction of adjustments. Once you have run phase1 you can make adjustments then run phase2 to rescore, repeat if necessary. phase2 does not modify the errors log.

- revamped prompt system. For non input prompts, you just type c to continue or e to exit the script. For input files that don't exist or are empty you can input lines if you wish, just type each line / return, then type c when you have finished, especially useful for testing different adjustments.

- no return accuracy has been massively improved, but still not perfect. I have left it in for now; you'll notice the error log is no longer flooded with them. If desired I'll make all errors configurable on/off.

- selfie error added. I hadn't noticed it by eye, but I noticed in the errors log there was one selfie error in your 2014 data!

- outputted total rows at end of each output file in the format "# total = n". I would prefer to keep the # at the beginning as this indicates a comment line and is ignored when read, especially important when reading back in the weights and errors logs.

- I haven't adjusted the weighting algorithm, I will if you are still not satisfied, it is easy to adjust. Remember, a single wtf shouldn't have much meaning alone, it only has meaning as a comparison against other wtfs. The current weighting algorithm is prepared for potential issues that we have not yet seen with our test data. But I do agree with you, that perhaps we should only weight call cnqs, not log cnqs.

- unsubmitted log created during phase1, it contains log signs that were deciphered to come from unsubmitted logs.

- every error now listed under errors column in errors log. As a result, for now I have removed the wtf<wtf_threshold under the wtf column when cnq error.

- looking over your comparisons pdf, there are indeed huge differences between manual and automatic. I believe improvements to this version will reduce this gap, especially the no return. It would be good if you could regenerate this comparison. We should then consider selecting sample records and investigate the differences.

- you described issues with log sign K6NV against call sign VE3KI, but I failed to confirm this during my own investigation upon testing this version. Also, you raised this issue when you had muddled up old and new data. Please revisit this.

- I didn't go over your case studies pdf in detail, these should be revisited.

- you discussed weighting percentages. I didn't quite understand what you meant. Did you mean that you would like to group wtfs by their sign in the weights log and calculate their individual percentages?

- I need to re read your notes regarding piped locations in the scores log. Although as far as I can tell, it wouldn't really be possible to figure out that when location x is supplied, they actually meant location y, and you will probably have to control this via the adjustments log instead. Let me know if I didn't understand correctly.

- Always keep in mind that you have the power to accurately control the outcome by adjusting the weights and adjustments logs. Adjusting a weight controls a range of outcomes, while inserting an adjustment controls a specific outcome.
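On the List::MoreUtils note above, the fallback might look like this (it assumes a List::Util of version 1.33 or newer, which is when `any` was added to that module):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fallback sketch: recent List::Util exports the same `any` function,
# so the List::MoreUtils dependency can be dropped.
use List::Util qw( any );    # instead of: use List::MoreUtils qw( any );

my @calls = qw( K6NV K9YC N0AC );
print "found\n" if any { $_ eq 'K9YC' } @calls;
```

`any` short-circuits on the first match, so it behaves identically whichever module supplies it.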

Regards,

Chris
Attachments: contestcrosschecker.zip (21.0 KB)


stuckinarut
User

Apr 10, 2015, 12:09 PM


Views: 9719
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

DARN... that's the problem with RUSHING too fast. Just caught one more Error in my hasty CNQ ANALYSIS, and uploaded (another) corrected version at 1908 UTC (10 April):

http://www.xgenesis.com/hashorama/2014_LQP_CNQ_ANALYSIS.pdf

Apologies for the screw-ups ;-(

- Stuckinarut


(This post was edited by stuckinarut on Apr 10, 2015, 12:10 PM)


Zhris
Enthusiast

Apr 10, 2015, 12:11 PM


Views: 9717
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Shows how little testing I have done since adding phase2, it shouldn't cause any problems as is, but please add the following code on line 253:


Code
<$handle_input_errors>; # discard headings.


Chris


stuckinarut
User

Apr 10, 2015, 3:05 PM


Views: 9703
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Chris:

Back online just briefly, and it seems our previous postings kinda "crossed in the mail" so to speak :^)

Will download the new .zip file and make the code line change you indicated. 4 hours of sleep is not cutting it here again, so I need a brief "Power Nap" first lest I really screw something up !

I'll then run some tests and report back.

THANKS-THANKS!!!

- Stuckinarut


Zhris
Enthusiast

Apr 10, 2015, 3:19 PM


Views: 9699
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

No problem.

A lot of the tests and summaries you have been working on will unfortunately need to be revisited, hence why I haven't responded much on these, hopefully you'll notice a satisfying general improvement.

I'm off to sleep now too.

Chris


stuckinarut
User

Apr 11, 2015, 12:40 AM


Views: 9664
Re: [Zhris] HASH-O-RAMA Data Processing Problem

(For Thread Readers)

LOTS of PM's & test results being uploaded, but trying to keep the volume down here. Will be reporting details later.

- Stuckinarut


Zhris
Enthusiast

Apr 11, 2015, 10:28 AM


Views: 9636
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

As per our discussions, attached is the update.

Notes:

- I'll start collecting older code versions from now named main-{version}.pl.

- Updated entries.txt included.

- The phases have been adjusted to suit new requirements. phase0 adds a new qid column to entries.txt, you must run this first, then only re-run it if you change entries.txt and don't add the qid in yourself. phase1 generates the weights. phase2 does the rest. You can re-run phase1 and phase2 however you like, errors.txt is now updated as per adjustments. Fundamentally there is now a prep phase, and no longer a separate phase for adjustments.

- Everything else we discussed implemented, with a few improvements I made at my own discretion.

Regards,

Chris
Attachments: contestcrosschecker.zip (26.5 KB)


stuckinarut
User

Apr 11, 2015, 12:54 PM


Views: 9619
Re: [Zhris] HASH-O-RAMA Data Processing Problem


In Reply To
- The phases have been adjusted to suit new requirements. phase0 adds a new qid column to entries.txt, you must run this first, then only re-run it if you change entries.txt and don't add the qid in yourself. phase1 generates the weights. phase2 does the rest. You can re-run phase1 and phase2 however you like, errors.txt is now updated as per adjustments. Fundamentally there is now a prep phase, and no longer a seperate phase for adjustments.


Just downloaded the latest and did a quick run. I like the n=0, n=1 & n=2 'prep-to-final' approach which makes sense. That auto-assignment of the QID's now eliminates what would be another potential error-introduction manual step in Excel to do it.

I'm really getting EXCITED about how this is looking now, but must force myself to return to the tax stuff for the rest of the day/early evening. Later on tonight I'll do some updated manual data comparisons as before and PM them to you with comments.

Thanks very much, Chris!

- Stuckinarut


stuckinarut
User

Apr 11, 2015, 2:05 PM


Views: 9603
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hey, Chris:

Had a lengthy incoming phone call, so before returning to my annual dreaded task, I decided to play a couple minutes of Hookey with the script and files again :^)

I changed the WTF to 3, and of course a few more CNQ Combos showed up in the Weights & Error log files.

Here are a couple minor tweaks that would also save manual massaging in Excel to come up with some additional totals for Annual comparison purposes.

1. In the scores.txt file, in addition to the existing Total figure, to include Sub-Totals for 'Logs' and 'No Logs' (i.e., the -minus callsigns).

2. One new 'calls123.txt' file that would list any callsigns of Nologs folks sorted by Callsign (ASC) then the CNQ Weight (ASC) for CNQ's with Weights of 1, 2 and 3. This would be like an INSTANT 'Hit List' of candidates to send an email to verify that they did, in fact, participate - as well as exactly what NAME and QTH they used. I would do this BEFORE making any adjustments using the Error log and new adjustments.txt file. Yes, this would increase potential accuracy in decision making.

Hmmm... now that I think about it, this #2 above *could* just be an addition to the bottom of the existing 'weights.txt' file (below the Total line). If there were a settings entry in the 'userconfig' section, I could easily change this upward to include 4 or 5 if desired. Maybe something like lwt=3 (Low Weight Threshold = 3 would include 1, 2 & 3; lwt=5 would include 1, 2, 3, 4 & 5).

Whaddya think?

Thanks!

- Stuckinarut


Zhris
Enthusiast

Apr 11, 2015, 3:43 PM


Views: 9591
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

1) Just to be sure: on top of score you want, 1) score_submitted, the total score where call signs had submitted their logs, and 2) score_unsubmitted, the total score where call signs had not submitted their logs. Actually, I've just realised what you meant; I forgot about the total at the bottom, and now I fully understand. You'll notice 2 contestants at the bottom of scores.txt who scored 0 because they didn't submit their logs, although you provided a category for them.

2) You'll notice an unsubmitted.txt file is generated after phase2 which lists all the call signs of unsubmitted logs. This could be extended to include all the other information you desire. I'm not sure I quite understand why you would want to list entries where the wtf is, say, 1: if this is below the wtf threshold then it has been derived to be a CNQ as opposed to unsubmitted, and therefore fundamentally a non-existent contestant. You can adjust the wtf's in weights.txt after phase1 if any genuine entries are below the wtf threshold, and vice versa. Also, the reason unsubmitted.txt has to be generated at phase2 and not phase1 is because it needs the outcome of the weights and any weight adjustments you made in order to derive which are unsubmitted and which are CNQs. For ease, we wouldn't want to add any information to the bottom of weights.txt because this needs to be read back in for phase2. Remember you can re-run phase1 and phase2 however you wish, therefore you can run phase2 to generate unsubmitted.txt, then send emails, then re-run it. If I have misunderstood, could you please explain in more detail.
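The derivation described in 2) could be sketched roughly like this (a hedged sketch with invented sample data; the real code reads the weights and entries files, but the rule is the same: a sign that passes the wtf threshold yet has no submitted log is "unsubmitted"):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the unsubmitted.txt derivation: valid contestants (passed
# the wtf threshold) who never appear as a LOGCALL. All figures below
# are invented sample data, not the program's real state.

my $wtf_threshold = 2;

# Highest wtf seen per sign in the weights log (invented values).
my %max_wtf   = ( W7WHY => 40, N6ZFO => 35, VE3KI => 5, N4LOV => 1 );
my %submitted = map { $_ => 1 } qw( W7WHY N6ZFO );    # logs on file

my @unsubmitted = sort grep {
    $max_wtf{$_} >= $wtf_threshold && !$submitted{$_}
} keys %max_wtf;

# N4LOV stays out: below the threshold it is a CNQ, not an unsubmitter.
print "@unsubmitted\n";    # prints "VE3KI"
```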

Regards,

Chris


(This post was edited by Zhris on Apr 11, 2015, 3:46 PM)


stuckinarut
User

Apr 11, 2015, 3:59 PM


Views: 9581
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Yes, the 2 logs with 0 entries were the 2 'Manual' logs last year... 1 was multiple screen-capture .jpg's from one of the 'Bonus' point stations, and 1 was a paper log (my own, due to Confuzzzzer logging software problems that year). When things are at the point of being able to run the 2015 stuff, there will be lots of entries in all category logs including #12 and #13 :^)

I don't think I explained well enough about the other file (calls123.txt or something) addition. Will chew on some better verbiage for this. Not a major issue - just a time savings to narrowly define and yield a 'Hit List' for email inquiries without having to import into Excel and delete the unwanted entries for that specific (manual) purpose in the processing.

Back to the grind here ;-(

Thanks, Chris!

- Stuckinarut


(This post was edited by stuckinarut on Apr 11, 2015, 4:00 PM)


stuckinarut
User

Apr 12, 2015, 1:23 AM


Views: 9497
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hey, Chris:

Faster to upload the .pdf file here.

It's 0112 local, so likely an error or two in the manual copy & paste analysis, however it opened my eyes to a new issue.

I omitted the actual scores and only used the first 2 columns from the scores.txt file :^)

Look at all the callsigns NOT included on the 'unsubmitted' list. Some I've annotated - the others are most likely all 'Busted' calls.

IMHO, shouldn't the W6YA & K6VVA calls from the unsubmitted list also show with -minus signs in the (category) scores listing? What criteria are used to produce the unsubmitted list?

Here's what I'm thinking.

1. Omit the -minus entries from the Scores list - only actual log-submitted scores will be copy & pasted into the Results document anyway.

2. Include ALL Callsigns with NO LOGS on the unsubmitted list, and the 'Weight' in an adjacent column.

For stepping manually through the Error log one entry at a time, I just thought of something else that would really help in this task. That would be to add a final column on the right (after the 'WTF' column): NOLOG

So any Calls in the CALLWKD column that are Non-Log Submitters (possibly 'Busted' Calls) would be FLAGGED as 'NOLOG' :^)

This would be very useful, especially if there is an increase in a future year's volume of both log and nolog QSOs with unfamiliar callsigns in play.

**** DARN - I've uploaded the .pdf 3 times but it doesn't display. Hold on... I just stuck it up here:

http://www.xgenesis.com/hashorama/totalcallsanalysis.pdf

FYI & Thanks,

- Stuckinarut


(This post was edited by stuckinarut on Apr 12, 2015, 5:53 AM)
Attachments: totalcallsanalysis.pdf (208 KB)


stuckinarut
User

Apr 12, 2015, 6:02 AM


Views: 9433
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Chris:

Got 4 hours of sleep and back at it.

Oooops, just realized why W6YA & K6VVA did NOT show as ~minus (log) entries... they are in CAT 12 & 13 (even though no QSO line log entries were in the entries.txt file).

I went through each entry on the MORE CALLSWKD NO LOGS list and checked them against the actual CALLSWKD CNQ's in the entries.txt file. Definitely more were 'BUSTED' calls. I corrected several errors I made (as suspected in the 0100 hour).

The .pdf finally showed up in the posting. Just deleted the initial one & uploaded a corrected version (likely at least one error in this at the rate things are going). Also uploaded/overwrote the other location/URL:

http://www.xgenesis.com/hashorama/totalcallsanalysis.pdf

Also attaching the new file to this post - hopefully it will display.

I hope this helps shed some light on a few tweaks needed.

Thanks!

- Stuckinarut


(This post was edited by stuckinarut on Apr 12, 2015, 6:05 AM)
Attachments: totalcallsanalysis.pdf (208 KB)


Zhris
Enthusiast

Apr 12, 2015, 6:05 AM


Views: 9431
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

Now that the code's in a good state I will go through your analysis properly and try to explain each situation.

For now, in answer to your queries:

- An unsubmitted log is one whose sign has at least one entry in the weights log that was more than or equal to the wtf threshold but did not make any calls across bands.

- W6YA and K6VVA have categories associated with them in the categories log, hence they are not assigned the default -1.

Regards,

Chris


stuckinarut
User

Apr 12, 2015, 6:07 AM


Views: 9429
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Ahhh... Tnx for clarifying the unsubmitted decision making. Need to chew on this.

Just re-uploaded an additionally corrected .pdf ... I am embarrassed about making so many Errors myself, and need to slowwwww down. Trying to juggle tooooo many plates here at once with an ongoing sleep deficit ;-(

- Stuckinarut


stuckinarut
User

Apr 12, 2015, 6:11 AM


Views: 9428
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem


In Reply To
- An unsubmitted log is one whose sign has at least one entry in the weights log that was more than or equal to the wtf threshold but did not make any calls across bands.


Do you mean the weight threshold for JUST the Callsign, or the entire CNQ ???

Thanks!


stuckinarut
User

Apr 12, 2015, 6:18 AM


Views: 9423
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem


In Reply To

In Reply To
- An unsubmitted log is one whose sign has at least one entry in the weights log that was more than or equal to the wtf threshold but did not make any calls across bands.


Do you mean the weight threshold for JUST the Callsign, or the entire CNQ ???

Thanks!


FYI, I just changed the WTF back to '2' and did a re-run. N4LOV does not show up in the unsubmitted file even though there are '2' entries in the submitted logs. Maybe the unsubmitted file is not updated for re-runs? If so, can this be re-tweaked?

Thanks, Chris.

- Stuckinarut


stuckinarut
User

Apr 12, 2015, 6:27 AM


Views: 9421
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Maybe time to switch back to 'PM' mode? :^)

The more I think about this, I really believe ALL (potential) unsubmitted calls need to show up on the unsubmitted list, but with the 'Weight' included - maybe even in the same format as the weights.txt file, but also with a final NOLOG column added like I suggested for the Error log:


Code
SIGN  NAME  QTH  WTF  NOLOG


WOW... if the weights.txt file could also have the NOLOG column added, then all of these documents would be of great benefit during the manual processing. Really helpful.

*** ESPECIALLY when I have 2 or 3 documents open (or printed out in front of me) to make decisions.

FYI & Thanks,

- Stuckinarut


(This post was edited by stuckinarut on Apr 12, 2015, 6:29 AM)


Zhris
Enthusiast

Apr 12, 2015, 6:29 AM


Views: 9419
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem


Quote
Do you mean the weight threshold for JUST the Callsign, or the entire CNQ ???


Hehe, it's kind of complicated. As you know, the weights log has multiple entries per sign; that is because of the variation in names and qths used. For a particular sign, it only takes 1 line where the wtf >= wtf threshold to be counted as a valid contestant, but all the lines are taken into consideration when deciding which names and which qths are valid. Let's look at the first five entries in the weights log:


Code
K0AC    BILL    IA      1 
K0AD LOCUST MD 1
K0AD LOCUST MN 84
K0EU JOHN CO 1
K0EU KEN CO 112


- K0AC has 1 entry with wtf below the threshold, therefore is an invalid contestant.
- K0AD has 2 entries, 1 of which has a wtf above the threshold. Therefore the sign K0AD is a valid contestant, but as a whole the location MD is invalid, so only the name LOCUST and location MN will be assigned against it.
- K0EU like K0AD, but only the name KEN and location CO will be assigned against it.

If you decided K0AC is actually a valid contestant, you could change the wtf of 1 to something above the wtf threshold i.e. 9999999 for safety before running phase2. Likewise if you decided K0AD is actually an invalid contestant, make sure all their wtf's are below the wtf threshold i.e. -1 for safety.
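That rule can be sketched in a few lines (a minimal sketch; the threshold value and data layout here are assumptions for illustration, not the actual code):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the rule above: a sign is a valid contestant if ANY of its
# weight lines meets the wtf threshold, and only name/qth combos at or
# above the threshold are assigned to it.

my $wtf_threshold = 2;

# SIGN, NAME, QTH, WTF - the five sample weights log entries above.
my @weights = (
    [qw( K0AC BILL   IA 1   )],
    [qw( K0AD LOCUST MD 1   )],
    [qw( K0AD LOCUST MN 84  )],
    [qw( K0EU JOHN   CO 1   )],
    [qw( K0EU KEN    CO 112 )],
);

my %valid;    # sign => [ [name, qth], ... ] kept combos
for my $row (@weights) {
    my ( $sign, $name, $qth, $wtf ) = @$row;
    push @{ $valid{$sign} }, [ $name, $qth ] if $wtf >= $wtf_threshold;
}

# K0AC never reaches the threshold, so it is not a valid contestant.
for my $sign ( sort keys %valid ) {
    printf "%s -> %s %s\n", $sign, @$_ for @{ $valid{$sign} };
}
```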

I have just gone through a couple of the signs under more callswkd (no logs) and I'm not certain I quite understand the notes in brackets; every one I checked appeared to have been handled as expected.

Regards,

Chris


stuckinarut
User

Apr 12, 2015, 6:32 AM


Views: 9416
Re: [Zhris] HASH-O-RAMA Data Processing Problem


In Reply To

Quote
Do you mean the weight threshold for JUST the Callsign, or the entire CNQ ???


Hehe, it's kind of complicated. As you know, the weights log has multiple entries per sign; that is because of the variation in names and qths used. For a particular sign, it only takes 1 line where the wtf >= wtf threshold to be counted as a valid contestant, but all the lines are taken into consideration when deciding which names and which qths are valid. Let's look at the first five entries in the weights log:


Code
K0AC    BILL    IA      1 
K0AD LOCUST MD 1
K0AD LOCUST MN 84
K0EU JOHN CO 1
K0EU KEN CO 112


- K0AC has 1 entry with wtf below the threshold, therefore is an invalid contestant.
- K0AD has 2 entries, 1 of which has a wtf above the threshold. Therefore the sign K0AD is a valid contestant, but as a whole the location MD is invalid, so only the name LOCUST and location MN will be assigned against it.
- K0EU like K0AD, but only the name KEN and location CO will be assigned against it.

If you decided K0AC is actually a valid contestant, you could change the wtf of 1 to something above the wtf threshold i.e. 9999999 for safety before running phase2. Likewise if you decided K0AD is actually an invalid contestant, make sure all their wtf's are below the wtf threshold i.e. -1 for safety.

I have just gone through a couple of the signs under more callswkd (no logs) and I'm not certain I quite understand the notes in brackets; every one I checked appeared to have been handled as expected.

Regards,

Chris


Thanks for the explanation. Hmmm...I really need to chew on this now. Will get back with you later today via PM mode to reduce the thread traffic with these discussions :^)


Zhris
Enthusiast

Apr 12, 2015, 6:41 AM


Views: 9414
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

You mentioned N4LOV for wtf threshold of 2. Here are its weight entries:


Code
N4LOV   AL      AL      1 
N4LOV CARL AL 1


Both are below the threshold, therefore it's an invalid contestant. In the current version, you would look through the weights log before phase2 and adjust the weights of any as you see fit.

Chris


(This post was edited by Zhris on Apr 12, 2015, 6:42 AM)


Zhris
Enthusiast

Apr 12, 2015, 6:52 AM


Views: 9409
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem


Quote
The more I think about this, I really believe ALL (potential) unsubmitted calls need to show up on the unsubmitted list


The thing is, based on the wtf the code has already tried its best to decipher who the potential unsubmitters are. The only thing beyond that would be to include EVERYONE else; you should use the weights log to control who's valid and who's not beforehand.

With regards to adding a nolog column to the weights log, this could be done by duplicating later logic, but I feel this is beyond the weights log's purpose; at the stage of creating the weights log it doesn't know which entries are from non-submitters, as this is the job of the unsubmitters log. You can assume any group of entries in the weights log where 1 of them has a wtf above the proposed wtf threshold will be a valid contestant, whether or not they submitted a log.

Chris


Zhris
Enthusiast

Apr 12, 2015, 6:59 AM


Views: 9754
Re: [Zhris] HASH-O-RAMA Data Processing Problem

FYI, with regards to updating wtfs: it might be better to have a wtf_adjusted column in the weights log, so you have a record of both the automatic and manual wtfs. Inevitably you have to be careful when re-running phases: if you made adjustments to the weights log then re-ran phase1, it would overwrite them, which is why you should always use --interactive mode and carefully read the messages as to whether particular files are empty or not. Alternatively, adjust the filenames in the configuration so that the inputs and outputs are named differently, though you'll have to do some copying between phases.

Chris


stuckinarut
User

Apr 12, 2015, 7:31 AM


Views: 9752
Re: [Zhris] HASH-O-RAMA Data Processing Problem


In Reply To
FYI, with regards to updating wtfs: it might be better to have a wtf_adjusted column in the weights log, so you have a record of both the automatic and manual wtfs. Inevitably you have to be careful when re-running phases: if you made adjustments to the weights log then re-ran phase1, it would overwrite them, which is why you should always use --interactive mode and carefully read the messages as to whether particular files are empty or not. Alternatively, adjust the filenames in the configuration so that the inputs and outputs are named differently, though you'll have to do some copying between phases.

Chris


Hmmm... more to chew on.

FYI, I kept wondering about the number '27' that showed up in one QSO entry in the NAME column. I went back to the original log that was emailed (and subsequently 'fixed' and 're-fixed' to remove unwanted QSO numbers). In my rushed delete/copy/paste work, I found the problem. Just re-uploaded another corrected entries.txt file:

http://www.xgenesis.com/hashorama/entries.txt

To get a better perspective of things and run more tests, the one thing that definitely would help right now is to at least have a NOLOG Column added to the Error log - to the right of the WTF Column, so that *any* Error where the CALLWKD callsign is a Non-Log Submitter is flagged NOLOG. This will also help me to refine the anticipated manual decision/processing to be done here!

I will re-visit the unsubmitted log stuff later after running more tests. I can just setup a new sub-directory here and use the new entries.txt file, and substitute a new main.pl file.

Thanks much!

- Stuckinarut


(This post was edited by stuckinarut on Apr 12, 2015, 7:32 AM)


stuckinarut
User

Apr 12, 2015, 11:39 AM


Views: 9744
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Chris:

Amazing things happen when one gets a bit more sleep.

REGARDING my previous posting:


In Reply To
To get a better perspective of things and run more tests, the one thing that definitely would help right now is to at least have a NOLOG Column added to the Error log - to the right of the WTF Column, so that *any* Error where the CALLWKD callsign is a Non-Log Submitter is flagged NOLOG. This will also help me to refine the anticipated manual decision/processing to be done here!


Due to the issues and weights involving Busted calls/Non-Submitted Logs, here is what makes MUCH MORE SENSE and will be much cleaner:


Code
Instead of NOLOGS, the column should be LOGS and if the CALLWKD Callsign on any line of the Error Log is for any LOGCALL (i.e., a log was submitted), then what gets entered in that column is "LOG".


This will be a quick visual aid in my MERP (Manual Error Research Process).

Thanks !!!

- Stuckinarut


stuckinarut
User

Apr 12, 2015, 12:15 PM


Views: 9737
Re: [Zhris] HASH-O-RAMA Data Processing Problem

A bit of explanation about what I'm now calling MERP.


In Reply To
This will be a quick visual aid in my MERP (Manual Error Research Process).


By having the new LOG column 'flagged', I can group my manual research tasks.

1. If a (submitted) LOG related Error, I will do those as a group and can quickly go through the actual submitted logs based on the sorting order (which was a reason for it :^)

2. If NOT flagged as a (submitted LOG related Error), I will do those as another group. For this group which includes possible email verification steps, I will use:

A. The CNQ weights.txt list which shows multiple CNQ combination errors for the same Callsign

B. An online HAM specific database with Callsign (owner & QTH) info. When a nickname is used, only the QTH info would be valid but a help.

C. For all USA callsigns (the majority in the logs), there is also the FCC Universal Licensing System online (Government) database.

99.9% of the time, if a callsign does NOT show in #B above, it's a plain old bad/busted callsign. Example: As of this posting, when I type in N4AFY, here is what happens:


Code
The search for "N4AFY" produced no results.


"BINGO!!!"

Likely the person's fingers slipped on the keys and added a 'Y' to N4AF's callsign. Of course, there are CNQ combo multiple entries in the weights.txt file for N4AF which is another issue to deal with {SIGH}.

I hope this helps better understand how I plan to use the Error log details schema in my 'MERP'.

FYI,

- Stuckinarut


Zhris
Enthusiast

Apr 12, 2015, 12:35 PM


Views: 9734
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

As per your requests here is another update.

Notes:

- Added a LOG column to the errors log, but I thought I would make it a bit more intuitive. Instead of just outputting LOG where the call is valid and has an associated submitted log, I have gone with 3 statuses. Let me know if this doesn't satisfy you.
----- '1' = callsign is valid and its associated log was submitted.
----- '0' = callsign is valid but its associated log was not submitted.
----- '-' = callsign is invalid therefore no log available.

- Updated entries.txt

- Included names and qths in unsubmitted log as this may be helpful.
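The three statuses might be derived along these lines (a sketch with invented sample calls and lookup hashes, not the program's actual state):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the 3-status LOG column: '1' valid + submitted,
# '0' valid + unsubmitted, '-' invalid call (no log possible).
# The sample calls below are invented for illustration.

my %valid     = map { $_ => 1 } qw( W7WHY VE3KI );    # passed the wtf threshold
my %submitted = map { $_ => 1 } qw( W7WHY );          # log on file

sub log_status {
    my ($call) = @_;
    return '-' unless $valid{$call};        # invalid call: no log available
    return $submitted{$call} ? '1' : '0';   # valid: submitted or not
}

print "$_ => ", log_status($_), "\n" for qw( W7WHY VE3KI N4AFY );
# W7WHY => 1, VE3KI => 0, N4AFY => -
```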

Chris
Attachments: contestcrosschecker.zip (26.8 KB)


stuckinarut
User

Apr 12, 2015, 8:56 PM


Views: 9714
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hello, Chris:

I've had it with the tax work for today/tonight, so I just installed and took a 'Test Drive' of the latest version.


In Reply To
----- '1' = callsign is valid and its associated log was submitted.
----- '0' = callsign is valid but its associated log was not submitted.
----- '-' = callsign is invalid therefore no log available.


Interesting possibility - will let you know after more use.


In Reply To
- Included names and qths in unsubmitted log as this may be helpful.


Using K2QBN as an example, it shows 'EVAN' as the name, but I remember there was also a CNQ with 'VAN' in the Weights file. Not sure yet how this will be used compared to the Error log which I believe is the Mega Important tool...along with the weights.txt file list which shows me all the CNQ combinations. Yes, the latter is also Mega Important in my tedious MERP tasks.

But let me tell you what is really E-X-C-I-T-I-N-G ... and I almost soiled myself when I realized how COOL (and a productive time-saver). Can you guess?

By putting the QID numbers next to each line in the Error log, I no longer have to go searching through multiple log entries (even in band & time sequence) by LOGCALL. I can now go DIRECTLY to the specific Error log related QSO line I need to scrutinize. Talk about EFFICIENCY !!! Chris, this is BEYOND AWESOME !!!

I don't even have to type the Error log QID... just copy it and, with the entries.txt file open, quickly do a CTRL + F followed by a CTRL + V and then pop the ENTER button. "BINGO!!!" ... almost like Magic, I am whisked right to what I need.

I can't tell you how many extra hours over the years were spent flipping through an unwieldy stack of printed-out log sheets in the Nightmare log-checking process. Even later, using individual .txt files on the Confuzzzer was a real pain, having to close and open separate files. This is going to save a LOT of time. Dunno why I didn't think to consolidate all the logs into one 'Master' log way back when (DUH!).

Thanks sooooooo much for adding the QID's to both the Error Log and the Phase 0 assignment of QID's to each of the QSO lines in the entries.txt file !!!

- Stuckinarut


Zhris
Enthusiast

Apr 12, 2015, 11:18 PM


Views: 9703
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

No problem at all.

I have actually changed the LOG codes to suitable labels instead:
- 1 is now submitted
- 0 is now unsubmitted
- - is now invalid


Quote
Using K2QBN as an example, it shows 'EVAN' as the name, but I remember there was also a CNQ with 'VAN' in the Weights file.


Remember, at phase2, after the weights log was generated at phase1, only names and qths that are >= the wtf threshold will be included (the same information as the scores log). VAN has a wtf of just 1.

phase2, being the core automation phase, ignores anything it deciphered to be invalid; after all, that's automation for you. You should always keep this in mind: the logs generated by phase2 are automated outcomes, so don't expect anything the algorithm deciphered to be "useless" to be included.

The log that should be most useful to you is therefore the weights log, since this allows you to fine tune the outcome of phase2. Perhaps we do need to consider extending the info in the weights log and/or breaking it up into groups and/or generating other consolidation logs at phase1, enabling you to make the right decisions before phase2. We have already discussed this in some detail, but I don't think we really finalised any ideas.

Also, the QID's do indeed make lookups super quick; I used to look at an error call sign, then forget it by the time I opened up the entries, or at least confuse it with one similar.

One other thing I was thinking: in the scores log under the scores column, it might be nice to do "score/max score", where max score would be the score they would have gotten if they made no errors whatsoever, then perhaps a percentage column, i.e. ((100 / max_score) * score). I'm not sure if this would be useful to you, but a lower percentage would be a good indication of who is the most error prone, while those with a 100% percentage deserve a reward, perhaps a billion bonus points ;-). But of course those with a score of 0/0 would get 100%, and those who didn't submit many entries might be notably less error prone.
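The percentage idea might look like this (a sketch only; the sample figures are invented, and the 0/0 case is counted as 100% as described):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the proposed accuracy column: (100 / max_score) * score,
# guarding the 0/0 case. Sample figures are invented for illustration.

sub accuracy {
    my ( $score, $max_score ) = @_;
    return 100 if $max_score == 0;    # 0/0: treat as 100%, as noted above
    return ( 100 / $max_score ) * $score;
}

printf "%.0f%%\n", accuracy( 11000, 12000 );    # prints "92%"
```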

Regards,

Chris


(This post was edited by Zhris on Apr 12, 2015, 11:41 PM)


stuckinarut
User

Apr 13, 2015, 12:03 AM


Views: 9684
Re: [Zhris] HASH-O-RAMA Data Processing Problem


In Reply To
Also, the QID's do indeed make lookups super quick; I used to look at an error call sign, then forget it by the time I opened up the entries, or at least confuse it with one similar.


Yes, 'been there, done that - still doing that', but in my excitement, it was not until I took a needed R 'n R break that I realized the actual QSO line from the *other* (CALLWKD) station is also what needs checking. And at this point of 'Beta Testing', having the ability to do a rapid copy & paste into the FIND dialog box on the entries.txt page to zip right to the CALLWKD station QSO details would be great. Especially, for investigating the 'NIL' Errors.

So another idea came to mind: QID2 (or 'Son of QID' :^) If we added one more CWID (Call Worked ID) column to the far right of the Error log, this could accomplish the mission like the QID does. Of course, a CWID number would only display IF the Error line involved a CALLWKD by another log submitter. Yes, this would be Awesome(2) and save even more time.

FYI, when the logs are originally submitted, there is a "Claimed Score". If you look at that PHP Form URL again I PM'd you, I had already planned to use this data as part of (manually) doing more tests... to make sure nothing fell through the cracks. Of course, the submitter's math skills (or lack thereof) have sometimes not been accurate. Not to be 'the pot calling the kettle black' since I've made more than my share lately by rushing too quickly.

I had actually considered including the 'Claimed Score' as another column in the categories.txt file. Ohhhh... now I'm getting another idea for something very useful, but will chew on it a bit more.

Based partly on the K2QBN example of CNQ weights of 3 and 1, I'm pretty certain already that a WTF of 3 would be the minimum. I also want to run tests at 4 and 5. What is also coming out of this is likely to be a tweak to next year's Rules.

The VE3|ON, VE4|MB and XE|DX situation is something I am still mulling over in consideration of the objectives of helping others also increase their on-air *and* logging accuracy vs. blanketly Auto-Mulliganizing any QSOs. This is a tricky one.

- Stuckinarut


stuckinarut
User

Apr 13, 2015, 12:23 AM


Views: 9678
Re: [Zhris] HASH-O-RAMA Data Processing Problem


In Reply To
I have actually changed the LOG codes to suitable labels instead:
- 1 is now submitted
- 0 is now unsubmitted
- - is now invalid


This is still baffling me a bit. I understand 1 and 0, but for example:


Quote
427 K6NV VE3KI 80M 0242 RICH ON CNQ 1<3 -


VE3KI did not submit a log, but a determination cannot be made as to whether the QSO is actually 'invalid' or not until investigated.

Checking the primary online HAM database, VE3KI is 'Richard' in ON(tario). Most likely he uses 'Rich' on the air. This type of QSO may have to be treated as a 'Unique' vs. an invalid CNQ, and I'm continuing to chew on these situations in terms of disposition.

Hmmm... it might be more helpful to only flag the '1' lines as LOG (much clearer than trying to remember pseudo-codes).

Looking at this one flagged '0' ...


Quote
385 K6DGW W1NN 40M 0213 HAL SC CNQ 1<3 0


The MERP will be the same as for those currently flagged '1' as compared with a rapid QID/CWID review of actual submitted QSO lines and both these could be left blank. The only entries which would show in the CWID column would be LOG ('1' status), and a corresponding CWID column entry of the ... hmmm... it would have to be the actual QID of the CALLWKD station now that I think about it, to be able to go directly there to check, right?

Shutting down for the night now at 0030 local.

Thanks!

- Stuckinarut


(This post was edited by stuckinarut on Apr 13, 2015, 12:25 AM)


Zhris
Enthusiast

Apr 13, 2015, 1:21 AM


Views: 9671
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

FYI, I chucked in 3 new columns in the scores log just for my own interest: SCOREMAX and ACCURACY as per my last post, and BONUSES, a count of the number of bonus stations worked. It's interesting to see the accuracy trend. I will remove them if they're of no use to you, but you might be interested in the result for now.


Quote
So another idea came to mind: QID2 (or 'Son of QID' :^) If we added one more CWID (Call Worked ID) column to the far right of the Error log, this could accomplish the mission like the QID does. Of course, a CWID number would only display IF the Error line involved a CALLWKD by another log submitter. Yes, this would be Awesome(2) and save even more time.


I think I understand what you mean: currently the qid represents which line the error triggered on, and a cwid would be the qid of the call worked. If so, this wouldn't work how you think, since a call might have multiple qso entries for the sign, name and qth combination, therefore a single qid couldn't be derived.


Quote
This is still baffling me a bit. I understand 1 and 0, but for example:

Quote
427 K6NV VE3KI 80M 0242 RICH ON CNQ 1<3 -

VE3KI did not submit a log, but a determination cannot be made as to whether the QSO is actually 'invalid' or not until investigated.


phase2, the automation phase, decided it was invalid. If you had adjusted its weight appropriately before phase2, then the outcome could have been different. OR, you could have run phase2, analysed the automated outcome, adjusted its wtf, then re-run phase2. Could you confirm that this makes sense; it's important you understand the purposes of phase1 and phase2. This brings me back to the point of perhaps constructing more detailed logs at phase1 if the weights log doesn't provide what you need as of yet. Fundamentally though, you should have used the weights log to decipher that "VE3KI RICH ON" is valid; you can't rely on the outcome of phase2 until the phase1 weights log is acceptable, or appropriate adjustments in the adjustments log have been made.

Finally, I did mention I changed 1, 0 and - to the labels submitted, unsubmitted and invalid, but I have now replaced those with LOG for 1 / submitted only; the other "statuses" are left blank.

Code is attached.

Regards,

Chris
Attachments: contestcrosschecker.zip (27.1 KB)


stuckinarut
User

Apr 13, 2015, 9:39 AM


Views: 9622
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Chris:

Getting down to the tax deadline so will have to get back with you later or tomorrow regarding some of the items in your last posting.

I downloaded the latest version and did a real quick updated comparison of the scores. VERY interesting with the other data columns you added. Note that I put the word BEFORE in Bold Red Font on the top line :^)

http://www.xgenesis.com/hashorama/compare_lqpck5_13April2015.pdf

At a quick glance, most scores appear to now be in-the-zone, and the differences mostly stem from 'Mulligans' that were given for minor errors in 2014, and from only spot-checking logs that were not contenders for any of the category awards. FYI, Cat #11 is 'Checklog' only, and along with #12 & #13 is not eligible for any awards.

Even without any Adjustments being possibly made from scrutiny of the Error log, FORTUNATELY even at a first pass, the individual category winners remain the same {Major SIGH Of Relief}. The sequences are a bit out of order in some cases, but for quick-glance purposes, I thought you would find this update very interesting.

BTW, I discovered one more (non-critical) left-over item in the entries.txt file ... an extraneous "CA" to the right of QSO #1257 after WQ5L RAY MS ... now deleted in my file here.

Umh, 'lqpck5' is a local sub-directory for the latest version :^)

- Stuckinarut


(This post was edited by stuckinarut on Apr 13, 2015, 9:41 AM)


stuckinarut
User

Apr 13, 2015, 9:51 AM


Views: 9619
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Ohhh... I did just notice something odd with this one you might want to check out:


Code
2	N8XX	11000	12000	92%	0	JACK	MI


In my 2014 RESULTS, I showed N8XX at 13,000 points - his log had 13 QSOs (x 1000 = 13,000) "Claimed", which *should* have been the MAXimum possible score before any Errors in the scores.txt file list. Not sure why only 12,000 shows as the MAX, unless it has to do with him using the name 'IGOR' for the 1st QSO. IMHO, that should not make any difference in determining MAXimum possible.

FYI,

- Stuckinarut


(This post was edited by stuckinarut on Apr 13, 2015, 10:02 AM)


Zhris
Enthusiast

Apr 13, 2015, 8:00 PM


Views: 9505
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

Thanks for posting the comparison, it's looking better than a few comparisons ago.

I couldn't locate the lqpck5 subdirectory; could you post the full link?


Quote
In my 2014 RESULTS, I showed N8XX at 13,000 points - his log had 13 QSOs (x 1000 = 13,000) "Claimed", which *should* have been the MAXimum possible score before any Errors in the scores.txt file list. Not sure why only 12,000 shows as the MAX, unless it has to do with him using the name 'IGOR' for the 1st QSO. IMHO, that should not make any difference in determining MAXimum possible.


You should have increased the wtf in the weights log for igor to above the wtf threshold. OK, let's simplify this: let's remove one of the more extreme checks (which this issue tripped up on) and assume that the log side of things, as opposed to the call side, is somewhat valid, i.e. don't ignore logs below the wtf threshold. You can still control the validity on a per-error basis via the adjustments log.

Delete or comment out line 522, which should be:


Code
next if $log_wtf < $wtf_threshold;


This may fix numerous issues where previously you hadn't used the weights log to its full potential. Note igor will now appear in the list of names in the unsubmitted and scores logs (this can be changed if need be), but any calls to this name will continue to trigger a CNQ error because it's below the wtf threshold.
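For anyone following along, the effect of that check can be sketched like this (the %wtf weights and the surrounding loop are invented for illustration; only the quoted `next if` line itself is from the real script):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $wtf_threshold = 3;
my $check_enabled = 0;   # 0 = line 522 deleted/commented out

# Made-up weights: IGOR was only seen once, so his wtf is low.
my %wtf = ( HANK => 5, IGOR => 1 );

my @scored;
for my $name (sort keys %wtf) {
    my $log_wtf = $wtf{$name};
    # With the check enabled, low-weight names never reach scoring;
    # with it removed, they are scored, but call-side CNQ checks
    # against the threshold still apply elsewhere.
    next if $check_enabled and $log_wtf < $wtf_threshold;
    push @scored, $name;
}
print "scored: @scored\n";   # IGOR included when the check is off
```

Flipping `$check_enabled` to 1 drops IGOR from the scored list, which is the behaviour the deletion removes.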

Chris


(This post was edited by Zhris on Apr 13, 2015, 8:22 PM)


stuckinarut
User

Apr 13, 2015, 9:06 PM


Views: 9486
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hi, Chris:
The lqpck5 sub-directory is on my local machine & not accessible - I create a separate install location for each update :^)

Made the change and yes, the scores.txt file updated but not the unsubmitted.txt file. Re-ran n=1 and n=2 twice but no dice.

Still chewing on another idea here that needs to 'ferment' a bit :^)

FYI,

- Stuckinarut


Zhris
Enthusiast

Apr 13, 2015, 9:12 PM


Views: 9483
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

Ah I thought you meant you uploaded it to your site.

What difference were you expecting in the unsubmitted log? If this is with regards to n8xx, they submitted a log and therefore will not appear there.

Chris


stuckinarut
User

Apr 13, 2015, 10:04 PM


Views: 9472
Re: [Zhris] HASH-O-RAMA Data Processing Problem


In Reply To
Ah I thought you meant you uploaded it to your site.


Sorry... I should have clarified this before ;-(


In Reply To
What difference were you expecting in the unsubmitted log, if with regards to n8xx, they submitted a log therefore will not appear there.


Not sure exactly what you mean here. Was just getting ready to post something for you, so will have to re-read yours once or twice and go look at the files :^)

Here's what I've been chewing on:


Code
http://www.xgenesis.com/hashorama/QWID_Possibilities.pdf


Also mulling over exactly how I am going to treat some of the errors when I shift into 'Adjustments' mode {HUGE SIGH HERE}.

Will be offline for a while but check back in later.

FYI,

- Stuckinarut


stuckinarut
User

Apr 13, 2015, 10:16 PM


Views: 9470
Re: [Zhris] HASH-O-RAMA Data Processing Problem

One more tidbit... I pulled this info from an LCR (Log Checking Report) from a Contest I did operating from the U.S. Virgin Islands in 2004. I was young then (60) and operated 44 out of the total 48 hour contest weekend.

After being off the air for about 30 years, I only had about 2 months to get my Morse Code speed back up to 35 to 38 Words Per Minute to compete in this 'International' event. Ended up in the Top 10 :^) No way at 71 now would I attempt to do another 44 hour thing - I started hallucinating at about 38 hours, and that was 11 years ago.

With 4603 raw (Gross) QSOs, and copying exchanges faster than I could type them, I wasn't sure how things would end up.

This will give you an idea of how one of the 'Major' HAM Contest sponsors deals with log-checking. Not sure the columns will align properly.


Code
SCORE SUMMARY      160    80    40    20    15    10     All
-------------     ----  ----  ----  ----  ----  ----    ----
Raw QSOs     =     476   437   835   661  1285   909  = 4603
Dupe QSOs    =       4     2     5     6     6     7  =   30
Busted QSOs  =       3     8     8    12    11    11  =   53
Valid QSOs   =     469   427   822   643  1268   891  = 4520
Penalty QSOs =       3     8     8    12    11    11  =   53
Final QSOs   =     466   419   814   631  1257   880  = 4467
Multipliers  =      52    57    58    53    58    55  =  333

Total QSO Points = 13401
Final score = 4462533
Error rate = 1.2% (100 X (Busted QSOs / Duped QSO total))

There were 30 dupes found. You might have marked these dupes. For
electronic logs, all dupes are just removed with no penalties.

Total incorrect calls = 35. These will be removed from your log
with an additional penalty of 35 QSOs.

There were 18 bad cross check QSOs removed.
You had 18 NILs. A penalty of 18 QSOs will be assessed.

You had 78 unique calls. These QSOs are not removed.

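As a sanity check, the report's bottom-line figures hang together arithmetically; a short sketch reproduces them (the exact denominator of the sponsor's error-rate formula isn't spelled out, so raw-minus-dupes is assumed below because it rounds to the quoted 1.2%):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Figures taken from the "All" column of the score summary above.
my $raw     = 4603;
my $dupes   = 30;
my $busted  = 53;
my $penalty = 53;
my $points  = 13401;   # total QSO points after penalties
my $mults   = 333;

my $valid       = $raw - $dupes - $busted;    # 4520
my $final_qsos  = $valid - $penalty;          # 4467
my $final_score = $points * $mults;           # 4462533

# Assumed denominator: raw QSOs with dupes removed.
my $err_rate = 100 * $busted / ($raw - $dupes);

printf "Valid=%d Final=%d Score=%d ErrRate=%.1f%%\n",
    $valid, $final_qsos, $final_score, $err_rate;
```

The 53-QSO penalty matching the 53 busted QSOs reflects the "additional penalty" policy described in the report text.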

I freaked out when I saw the 1.2% Error Rate, until some of my fellow Contest Club members said that was actually very good (3% to 5% Errors was apparently fairly common).

The proverbial $64,000 question now is how strict (or lenient) I am going to be with my own event log-checking which was pretty 'hang-loose' before :^)

FYI,

- Stuckinarut


Zhris
Enthusiast

Apr 13, 2015, 11:22 PM


Views: 9457
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,


Quote
Made the change and yes, the scores.txt file updated but not the unsubmitted.txt file. Re-ran n=1 and n=2 twice but no dice.



In Reply To
What difference were you expecting in the unsubmitted log, if with regards to n8xx, they submitted a log therefore will not appear there.

Not sure exactly what you mean here. Was just getting ready to post something for you, so will have to re-read yours once or twice and go look at the files :^)


Presumably you thought the unsubmitted log should have changed when you deleted the line of code. I was wondering what change you were expecting. I didn't expect it to change, since this code deletion is separate from the algorithm that determines unsubmitters.

I see that you are aware the qwid cannot be straightforwardly looked up for nil and qth errors. The time thing might work out: you could have a configurable range and then provide the range of qids, i.e. cwid = 1001-1007, at least giving you a rough position. You mentioned both sides of an error transaction should show up in the error log; I can't be certain exactly what you mean, but remember that if someone didn't return the call, then that person won't have an error marked, since they didn't log it, but as a result won't get any points for it anyway.
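The configurable time-range idea might look roughly like this (the qids, times and window are hypothetical; the real script's data structures differ):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up example: map each qid to its HHMM time, then report the
# range of qids whose time falls within +/- $window minutes of the
# time on the error's QSO line.
my %time_of    = ( 1001 => 231, 1003 => 233, 1005 => 240, 1007 => 255 );
my $error_time = 235;    # HHMM from the errored QSO line
my $window     = 10;     # configurable range, in minutes

sub to_minutes { my $t = shift; int($t / 100) * 60 + $t % 100 }

my @near = grep {
    abs(to_minutes($time_of{$_}) - to_minutes($error_time)) <= $window
} sort { $a <=> $b } keys %time_of;

print "cwid ~ $near[0]-$near[-1]\n" if @near;   # cwid ~ 1001-1005
```

This gives only a rough position, as noted: 1007 falls outside the window, so the reported range narrows to the plausible candidates.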

Cool to see how the results were laid out for a contest you took part in. I like the summary sheet they provided you with. We could create a directory called summaries, and create a summary file for each log sign, with the same sort of layout but could extend it to display their entries from the entries log and errors from the errors log, maybe even their weights too. This is probably of low priority and something we can think about later.

I've gradually become more concerned that parts of what I've implemented aren't exactly what you thought I implemented (mostly with regards to weighting). Inevitably, as and when I wrote sections of the code, I used my own initiative to fill in the gaps. I've tried to explain some of these, but ideally you would be able to read back and understand the code. Basically, there are quite a few issues you raise that from my point of view are working as expected. We could go back to using a fresh entries.txt file which has a much smaller sample of data that reflects every possible scenario, with a manually written expected output, making sure we are both on the same track. Maybe I am in actual fact overthinking this and you fully understand and are perfectly happy with the way it's going so far. It's been difficult for both of us to account for all the discrepancies along the way; this is not a straightforward task, and it's been good practice!

Regards,

Chris


(This post was edited by Zhris on Apr 13, 2015, 11:25 PM)


stuckinarut
User

Apr 16, 2015, 6:12 PM


Views: 9163
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hi, Chris:

Sorry...was intensely focused on making the tax deadline yesterday. Mission accomplished, but it left me drained. I 'crashed and burned' last night ;-(

Yes, this is not a run-of-the-mill project/task scenario, and I greatly appreciate you hanging in there with me !!!

Today I started to take a new 'Test Drive' with this year's 2015 data, but ran into a few challenges that necessitate re-downloading and re-formatting some of the Callsign/Category data (the QSO line stuff appears to be OK).

Unfortunately, I need to get all packed up to leave in the morning for an Annual Convention, so will be offline until Monday. I plan to use the (normally boring highway) 3 hour drive each way to periodically 'chew' on final decision options as to strictness or leniency in dealing with errors in the script processing. I'll have my weenie digital recorder along to take notes. Hmmm, this will likely be on the way back, as I think I need 3 hours of non-stop music on the 1st leg of the trip to further de-compress :^)

I'm actually looking forward to several days of No Confuzzzzer, No email, No Robo-Telemarking phone calls, etc. Hopefully I will return with an emptied personal mind-cache to meet the remaining 'challenges' in this matter with a fresh perspective.

Thanks again for all your help !!!

- Stuckinarut


stuckinarut
User

Apr 21, 2015, 9:37 AM


Views: 8612
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hi, Chris:

I'll be PM'ing you more specific details hopefully later today, but am uploading a quick "Bird's Eye View" here of what came to mind before leaving for the Convention and subsequent tweaking and re-tweaking done since returning.

Out of the total 101 errors using WTF=3, this can reduce the "iffy" (questionable) CNQ status count to only 35 !!! With a future increase in overall log submissions and QSO counts, this potential time savings is worth-its-WEIGHT-in-Gold ('Weight' PUN intentional :^)

The new "Discovery" on how to further reduce the manual PITA of switching between files when investigating and researching errors came about when I switched from the main Notebook Confuzzzzer's 1024x768 display to seeing things much w-I--d---e----r, without having to horizontally scroll, on the 23" 1920x1080 flat panel of a recently acquired Desktop for A/V purposes. What a difference... a whole new World!

I've embedded some brief "NOTES" in the new consolidated errorlog.csv file approach for quick import into Excel (.csv instead of a Tab/.txt file is necessary for this to work). All the other existing errors.txt, weights.txt, etc. files will still serve a useful purpose.

Oh darn... the .jpg won't upload (exceeds 250KB). Hold on... OK, you can get it here:


Code
www.xgenesis.com/hashorama/errorlog_partial_screencapture_1920px.jpg


(EDIT): I think that should be sorted DESC instead of ASC for the CNQ Weight/CNQ combinations in the Notes (for the CALLWKD LOG CNQ WEIGHTS column/field entries) ;-(

- Stuckinarut


(This post was edited by stuckinarut on Apr 21, 2015, 9:44 AM)


stuckinarut
User

Apr 21, 2015, 9:51 AM


Views: 8606
Re: [Zhris] HASH-O-RAMA Data Processing Problem

This one-line sample errorlog.csv file entry file is small enough to upload here :^)

NOTE: Importing the .csv into Excel will require a one-time adjustment of the column widths to accommodate the longer data sections, of course.

Ooops... I left out a comma separator for the empty 'LOG' field in this example.
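For what it's worth, building each row with join() keeps a comma for every field, even an empty LOG column, so that mistake can't happen (the field values and column order below are loosely based on the earlier sample line, not the real script's output code):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The empty string stands in for the blank LOG field; join() still
# emits its comma, so Excel's columns stay aligned on import.
my @row = ( 427, 'K6NV', 'VE3KI', '80M', '0242', 'RICH', 'ON', 'CNQ', '' );
my $csv_line = join ',', @row;
print "$csv_line\n";
```

A hand-typed line can silently drop a separator; generating every line from the same array makes the column count constant.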


(This post was edited by stuckinarut on Apr 21, 2015, 10:01 AM)
Attachments: errorlog.csv (0.12 KB)


Zhris
Enthusiast

Apr 21, 2015, 5:29 PM


Views: 8556
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

I've made the changes you described, but as a result I ideally need to refactor the code and then re-test. Basically, the new error sort order required an intermediate data structure, therefore I am considering putting error and score construction in their own "_input" functions.

I will have it ready at some point tomorrow.

Chris


stuckinarut
User

Apr 21, 2015, 6:11 PM


Views: 8546
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Thank you so very much, Chris. I'll be in 'Standby' mode.

- Stuckinarut


Zhris
Enthusiast

Apr 22, 2015, 3:21 PM


Views: 8422
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi Eric,

Please see PM for latest version and notes. I don't want to post it on main because I didn't have time to fully refactor it just yet.

Chris


stuckinarut
User

Apr 22, 2015, 9:22 PM


Views: 8363
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Thanks - PM'd you some 'Beta Test' Feedback details & questions.

- Stuckinarut


Zhris
Enthusiast

May 15, 2015, 1:18 PM


Views: 3111
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

100th reply!!!

Just wanted to put closure on this. We moved to discussing purely over PM to save inundating this thread. After much deliberation and many modifications the task was completed, there is plenty of room for improvement but we are happy with the result. I'll leave it to Eric if he wishes to post the final version, a whopping 1566 lines long!

Chris


(This post was edited by Zhris on May 15, 2015, 1:19 PM)


stuckinarut
User

May 17, 2015, 10:22 PM


Views: 3081
Re: [Zhris] HASH-O-RAMA Data Processing Problem


In Reply To
100th reply!!!

Just wanted to put closure on this. We moved to discussing purely over PM to save inundating this thread. After much deliberation and many modifications the task was completed, there is plenty of room for improvement but we are happy with the result. I'll leave it to Eric if he wishes to post the final version, a whopping 1566 lines long!

Chris


I can't begin to thank Chris (Zhris) enough for taking such a keen interest in this matter, and I am very grateful for all of his diligence and persistence!

My count here is actually 340 PM's exchanged between us (143 Received and 197 Sent :^) Maybe this is some kind of record?

The plan now is to hopefully do a short video to show (and explain) exactly how everything finally works in conjunction with an Excel sheet Error Log. I may also need to redact the LOGCALL (column) Callsigns to 'protect the innocent' {GRIN}.

In the interim, I'm attaching/uploading the last main file v19 .pl code 'Masterpiece' that was finalized.

Again, my sincere appreciation for everything, Chris !!!

- Stuckinarut (a/k/a Eric)


(This post was edited by stuckinarut on May 17, 2015, 10:36 PM)
Attachments: main.pl (57.8 KB)