Home: Perl Programming Help: Intermediate:
HASH-O-RAMA Data Processing Problem

 



stuckinarut
User

Feb 22, 2015, 7:35 PM

Post #1 of 102 (14548 views)
HASH-O-RAMA Data Processing Problem

For 9 years I have sponsored a small Ham Radio on-air Contest event and manually done all the tedious log checking of several thousand contacts from submitted logs. At 71 my eyesight is not the best, and I've struggled to figure out how to do the bulk of the log checking in Perl.

Since all submitted logs (QSO lines) are consolidated into one Master (listQ.txt) file in random order by the Submitter, I'm Stuck-In-A-Rut trying to figure out whatever Arrays & Loops & Code can hopefully make this all work.

Below is my current Work-In-Progress, which has led me to a Brick Wall: how to finish the actual log checking after all the data is entered into a Hash. I've included some specifics as Comments within the Script as FYI. After the script is the very short Test Data file (listQ.txt) I used.


Code
#!/usr/bin/perl 

use strict;
use warnings;

# -------------------------------------------
=begin comment

Sample listQ.txt file entry & data structure:

QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZFO BILL CA

(Etc., etc. - ALL QSO (Contact) lines for ALL participants are appended into one large listQ.txt
file for processing.)

In the above example:

$band = 3542 (actually a frequency which is converted to "80M" in the script);
$logcall = W7WHY
$logname = Tom
$logmult = OR
$callsign = N6ZFO
$callname = BILL
$callmult = CA

The "CW", (DATE) and (TIME) Columns/Fields are NOT needed for log checking.

Some of the submitted entries contain Upper/Lowercase Text which must be converted
to all UPPERCASE (unless processing is not case-sensitive).

1. Each "QSO:" line contains the Contact log information to be checked
2. Any Duplicate contacts on the *same* frequency band (40m or 80m) are NOT allowed
3. Possible Errors are:
A. Other stations logged were NOT actually worked/contacted on the frequency band(s)
indicated, which could also be due to incorrectly copied and/or logged Callsigns.
B. If the Callsign logged was correct, then an incorrect match or spelling of the NAME or
"MULT" (Abbreviations for USA States, Canadian Provinces/Territories, or missing, etc.).
4. Station Callsigns worked/contacted who did not submit a log need to be included in the
errors.csv file and "manually" dealt with in the final (Non-Perl) processing stages.
5. The desired Summary objectives are described at the end of the existing code so far.

=end comment

=cut
# -------------------------------------------

my $Q_list;
my $qso;
my $logtime;
my $band;
my $logcall;
my $logname;
my $logmult;
my $callsign;
my $callname;
my $callmult;

# IMPORTANT NOTE:
# Somehow All TEXT data needs converting to UPPER CASE before Hash entry
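
# ( Side note, not part of the original script: one simple way to do this would be to
#   uppercase each whole line right after reading it, along the lines of
#       $line = uc $line;
#   placed just after the chomp below, before the regex match. )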

open $Q_list, '<', 'listQ.txt' or die "Cannot open listQ.txt: $!";
while (my $line = <$Q_list>) {
    chomp $line;
    $line =~ s/\r//g;    # removes Windows CR characters
    $line =~ s/\s+$//;   # removes trailing white space

    # captures: $1=frequency, $2=time, $3..$8 = logcall, logname, logmult, callsign, callname, callmult
    if ( $line =~ m/^QSO.\s+([0-9]+).*\s+([\w]{4})\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)/ ) {

        if ( $1 ne '' ) {
            $band = $1;
            if ( $band =~ m/^7[\d]+/ ) {
                $band = '40M';
            } elsif ( $band =~ m/^3[\d]+/ ) {
                $band = '80M';
            }
        }

        # Not used in processing at this time (PUN!) but resolved Error messages
        $logtime = $2;

        # Print the parsed fields (to verify test Hash entries only)
        print $band;
        $logcall = $3;
        print $logcall;
        $logname = $4;
        print $logname;
        $logmult = $5;
        print $logmult;
        $callsign = $6;
        print $callsign;
        $callname = $7;
        print $callname;
        $callmult = $8;
        print $callmult;
        print "\n";
    }
}


# -------------------------------------------
=begin comment

Here are the Hash data entries from the Sample/Test listQ.txt file, which
included Errors (on purpose) in 3 entries for Log Check Error "Testing".
No log was submitted by $callsign W6NV so there are no $logcall entries
in the Master Consolidated listQ.txt file.

40MW7WHYTomORN6ZFOBILLCA
40MW7WHYTomORW9REMIKEIN
80MW7WHYTomORN6ZFBILLCA <- (Callsign should be N6ZFO)
80MW7WHYTomORW6NVOLICA
80MW7WHYTomORW9REMIKEIN
40MW9REMIKEINW7WHYTOMOr
40MW9REMIKEINN6ZFOBILLCa
80MW9REMIKEINN6ZFOBILCa <- (Name should be BILL)
80MW9REMIKEINW7WHYTOMOr
80MW9REMIKEINW6NVOLICa
40MN6ZFOBILLCAW7WHYTOMOR
40MN6ZFOBILLCAW9RRMIKEIN <- (Callsign should be W9RE)
40MN6ZFOBILLCAN2NLDAVEFL
80MN6ZFOBILLCAW9REMIKEIN
80MN6ZFOBILLCAW7WHYTOMOR
80MN6ZFOBILLCAW6NVOLICA

The Log Checking Summary objectives are to Append to the following .csv files
which will be Imported into an Excel spreadsheet for further processing.

1. logscores.csv

A. File Header: LOGCALL,QSOS,MULT,NAME
B. (DATA) QSOS is a "Count" of Verified 2-Way Logged QSOs (Contacts) WITHOUT Errors -
only submitted logs get checked.

2. logerrors.csv

A. File Header: LOGCALL,CALLSIGN,ERRORTYPES

B: "CALLSIGN" is the station "claimed" or reported in a submitter's (LOGCALL), but
was some kind of Error(s) due for one or more of the following reasons:

C. ERRORTYPES <- with the Concatenated data separated by one space
NIL (CALLSIGN) - LOGCALL did NOT appear in the log of the CALLSIGN worked or mistyped
NIL (BANDQSO) - The CALLSIGN station log did not show a QSO/Contact on the indicated band
INVALID (NAME) - NAME did not exactly match the (CALLSIGN) NAME in the Submitter's (LOGCALL) log
INVALID (MULT) - MULT did not exactly match the (CALLSIGN) MULT in the Submitter's (LOGCALL) log
DUPE (BANDQSO) - A "Same Band" Duplicate QSO (not allowed)

=end comment

=cut
# -------------------------------------------

# END OF SCRIPT IN PROGRESS
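
# ( Sketch only, not yet part of the script: the kind of CSV appending described in the
#   comment block above. The $qso_count variable here is made up for illustration. )
#
# open my $scores_fh, '>>', 'logscores.csv' or die "Cannot open logscores.csv: $!";
# print {$scores_fh} join( ',', $logcall, $qso_count, $logmult, $logname ), "\n";
# close $scores_fh;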


Test listQ.txt data lines (including intentional Errors):


Code
QSO:  7040 CW 2015-01-22 0200 W7WHY           Tom        OR  N6ZFO           BILL       CA 
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA


Any assistance would be most gratefully appreciated.

Thank you!

-Stuckinarut


FishMonger
Veteran / Moderator

Feb 23, 2015, 9:03 AM

Post #2 of 102 (14534 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to]

I don't have time right now to work up a full solution, but as a starting point, I'd drop that regex and instead use a simple split statement to extract your fields.

Example:

Code
# Note: this snippet assumes "use Data::Dumper;" at the top of the script.
while (my $line = <$Q_list>) {
    chomp $line;
    if ( $line =~ m/^QSO.\s+([0-9]+).*\s+([\w]{4})\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)/ ) {
        print Dumper($1,$2,$3,$4,$5,$6,$7,$8);

        my @fields = (split(/\s+/, $line))[1,4..10];
        print Dumper \@fields;
        last;
    }
}


Outputs:

Code
$VAR1 = '7040'; 
$VAR2 = '0200';
$VAR3 = 'W7WHY';
$VAR4 = 'Tom';
$VAR5 = 'OR';
$VAR6 = 'N6ZFO';
$VAR7 = 'BILL';
$VAR8 = 'CA';
$VAR1 = [
'7040',
'0200',
'W7WHY',
'Tom',
'OR',
'N6ZFO',
'BILL',
'CA'
];



(This post was edited by FishMonger on Feb 23, 2015, 9:04 AM)


stuckinarut
User

Feb 23, 2015, 11:21 PM

Post #3 of 102 (14507 views)
Re: [FishMonger] HASH-O-RAMA Data Processing Problem [In reply to]

Finally back online briefly - thanks for your reply & info, FishMonger.

I used the lonnnnng REGEXP to avoid having the Mode (CW), Date & Time fields plugged into the mix of things, since those are not used in this particular log checking/validation process.

Still trying to figure things out - I feel like a 'deer-in-the-headlights' although a new thought did come to mind.

Starting with the first record, then trolling through ALL of the remaining record entries in the Hash: IF the (logcall) $VAR in the record matches the (logcall) $VAR in the next record, nothing would be matched or processed. This is because it would be *that* person's own log entry, so the process would move onward until a DIFFERENT (logcall) $VAR was reached. It is the (callsign) $VAR (the 2nd 'callsign' in a record) that needs matching, along with the other noted fields, to that specific (callsign) IF it appears as a (logcall) $VAR in any subsequent records.

Ohhhh...sorry... I'm really having trouble trying to explain things. Hmmm. I'll try this.

In what you kindly posted, $VAR3 (W7WHY) is considered the (logcall). $VAR6 (N6ZFO) is the (callsign). ONLY if N6ZFO is in a subsequent record as $VAR3 will any matching/log check processing take place for that record...and so on down the line for all the (callsign/$VAR6) entries in the W7WHY/$VAR3 records based upon the established log-checking & error criteria.

So, IF $VAR3 in a record matches $VAR6 in the next record, then the other $VAR checks/matches take place. HOWEVER, I need to retain some form of this to change *any* 'frequencies' to simply either the 40M or 80M band designators:

Code
if ( $1 ne '' ) {
    $band = $1;
    if ( $band =~ m/^7[\d]+/ ) {
        $band = '40M';
    } elsif ( $band =~ m/^3[\d]+/ ) {
        $band = '80M';
    }
}
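
For what it's worth, here is my rough stab at a more compact way to keep that frequency-to-band conversion, using a lookup keyed on the leading digit (the %band_for and $freq names are just made up for illustration, and I may well have the syntax wrong):


Code
my %band_for = ( 3 => '80M', 7 => '40M' );

# $freq holds the raw frequency from the QSO line (e.g. 7040 or 3542)
my $band = $band_for{ substr( $freq, 0, 1 ) }
    // die "Unrecognized frequency: $freq\n";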

I could be wrong, but I'm thinking after each record log-check match (or no match) takes place on the $VARS that the results data would be appended/written to the .csv files before moving on to the next record.

How to pull it all together remains a Mystery and my head is spinning again ;-(

-Stuckinarut


Chris Charley
User

Feb 24, 2015, 11:48 AM

Post #4 of 102 (14482 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to]

Hi,

Just a suggestion, but couldn't you get a logger program (for free or at small cost) that is already available?


stuckinarut
User

Feb 24, 2015, 2:24 PM

Post #5 of 102 (14470 views)
Re: [Chris Charley] HASH-O-RAMA Data Processing Problem [In reply to]

Hi, Chris:

We all use 'logger' programs. It is the special 'Cabrillo' (format) file output from everyone's loggers that gets submitted, which the Perl REGEXP parses into the data needed for the 'log-checking' nightmare part of things. In other words, 'validating' the logged (and typed) contact information. From that point, I'm still 'Stuck-In-A-Rut' {SIGH}.

-Stuckinarut


(This post was edited by stuckinarut on Feb 24, 2015, 2:25 PM)


Zhris
Enthusiast

Feb 24, 2015, 8:48 PM

Post #6 of 102 (14447 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to]

Hi,

I have had a look through your requirements and done a little research on contesting. At this stage I wanted to ask how you propose to decipher which data is invalid if, for example, someone mistyped their logcall. Do you have a log of all participants containing their logcall, logname and logmult to check against?

Chris


stuckinarut
User

Feb 25, 2015, 12:36 AM

Post #7 of 102 (14439 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to]

Zhris:

In doing the tedious manual processing, I used to give everyone at least one 'Mulligan' for what we term a 'Busted Call' or for a misssspellllud 'Name' or 'Mult' (location). Now the focus includes something called 'Accuracy' :^)

So if the (logcall field) station works a (callsign field) station, when checking the latter's callsign in his or her log, IF the (logcall) station's log being validated yields a mistyped ('Busted') callsign, that contact will not count.

Dealing with any missing (but legit) callsigns accurately typed but for stations who did NOT submit a log, well, that's where I still have to cut a bit of slack and also do additional 'Manual' processing to include checking against a list of 'Unique' callsigns worked/contacted from ALL log submitters. Normally these few non-log submitter callsigns do show up in multiple logs and are legit.

I didn't want to further complicate the basic log-checking needs of a single Perl script which is what will eliminate the majority of the many hours of manual labor.

A new thought came to mind to try and chart out some Pseudo-Code for what I now think *might* work as a flow of processing.

-Stuckinarut


Zhris
Enthusiast

Feb 25, 2015, 9:03 AM

Post #8 of 102 (14413 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to]

Hi,

I have thrown together a rough script which constructs an appropriate data structure then extends it while cross checking.

Please note the following:
- It is not a complete solution, but potentially the base of one.
- I have assumed that the logcall, logname and logmult are always valid since they are the operator's own details; in actual fact it is the first-seen entry's logname and logmult that are checked against throughout. This assumption makes it a lot easier when checking for invalid callsigns, callnames and callmults.
- Upon reading your comments, I wasn't entirely clear on each error, particularly the NIL ones, therefore they may not be correct.
- For now / for simplicity, only one possible error is marked against each entry.
- For now, in order to run the script standalone, I have put your input data in the DATA block at the end of the script and dumped the resultant data structure to stdout instead of writing to the relevant output CSVs.
- I have included a potential non submitters hash. The idea is that every time an invalid callsign is discovered, it is incremented in the hash by 1. Those with the highest count at the end are more likely to be non submitters as opposed to invalid.

Apologies if it doesn't work quite as expected, I was unable to put much time in it for now, but I'm sure you will be able to describe any issues.


Code
use strict; 
use warnings;
use Data::Dumper;

my $configuration =
{
casesensitive => 0,
input_path => 'listQ.txt',
output_scores_path => 'logscores.csv',
output_errors_path => 'logerrors.csv',
};

my $band_lookup =
{
3 => '80M',
7 => '40M',
};

my $contestants =
{

};

my $potential_non_submitters =
{

};

while ( my $line = <DATA> )
{
$line =~ s/\s+$//;

my ( $freq, $logcall, $logname, $logmult, $callsign, $callname, $callmult ) =
map { uc $_ unless $configuration->{casesensitive} }
(split( ' ', $line ))[1,5..10];

# lookup band via frequency.
my $band = $band_lookup->{substr( $freq, 0, 1 )}; # todo // error.

# add a new contestant to contestants if we haven't seen them before.
unless ( defined $contestants->{$logcall} )
{
$contestants->{$logcall} =
{
logname => $logname,
logmult => $logmult,
};
}

# add this entry to the contestants entries.
push @{$contestants->{$logcall}->{bands}->{$band}},
{
callsign => $callsign,
callname => $callname,
callmult => $callmult,
};
}

while ( my ( $logcall, $contestant ) = each ( %$contestants ) )
{
while ( my ( $band, $entries ) = each ( %{$contestant->{bands}} ) )
{
for my $entry ( @$entries )
{
# mark as seen. Used when checking for duplicate entries.
$entry->{seen} = 1;

# validate entry.
do { $entry->{errors}->{'NIL (BANDQSO)'} = 1; $potential_non_submitters->{$entry->{callsign}}++; next } unless ( defined $contestants->{$entry->{callsign}} ); # invalid callsign.
do { $entry->{errors}->{'DUPE (BANDQSO)'} = 1; next } if ( ( grep { $_->{seen} and $_->{callsign} eq $entry->{callsign} } @$entries ) > 1 ); # duplicate entry.
do { $entry->{errors}->{'INVALID (NAME)'} = 1; next } unless ( $entry->{callname} eq $contestants->{$entry->{callsign}}->{logname} ); # invalid callname.
do { $entry->{errors}->{'INVALID (MULT)'} = 1; next } unless ( $entry->{callmult} eq $contestants->{$entry->{callsign}}->{logmult} ); # invalid callmult.
do { $entry->{errors}->{'NIL (CALLSIGN)'} = 1; next } unless ( grep { $_->{callsign} eq $logcall } @{$contestants->{$entry->{callsign}}->{bands}->{$band}} ); # no return entry.
}
}
}

print Dumper $contestants, $potential_non_submitters;

__DATA__
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA


Output:

Code
$VAR1 = { 
'N6ZFO' => {
'logname' => 'BILL',
'logmult' => 'CA',
'bands' => {
'80M' => [
{
'seen' => 1,
'callmult' => 'IN',
'callsign' => 'W9RE',
'callname' => 'MIKE'
},
{
'seen' => 1,
'callmult' => 'OR',
'errors' => {
'NIL (CALLSIGN)' => 1
},
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL (BANDQSO)' => 1
},
'callsign' => 'W6NV',
'callname' => 'OLI'
}
],
'40M' => [
{
'seen' => 1,
'callmult' => 'OR',
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'IN',
'errors' => {
'NIL (BANDQSO)' => 1
},
'callsign' => 'W9RR',
'callname' => 'MIKE'
},
{
'seen' => 1,
'callmult' => 'FL',
'errors' => {
'NIL (BANDQSO)' => 1
},
'callsign' => 'N2NL',
'callname' => 'DAVE'
}
]
}
},
'W7WHY' => {
'logname' => 'TOM',
'logmult' => 'OR',
'bands' => {
'80M' => [
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL (BANDQSO)' => 1
},
'callsign' => 'N6ZF',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL (BANDQSO)' => 1
},
'callsign' => 'W6NV',
'callname' => 'OLI'
},
{
'seen' => 1,
'callmult' => 'IN',
'callsign' => 'W9RE',
'callname' => 'MIKE'
}
],
'40M' => [
{
'seen' => 1,
'callmult' => 'CA',
'callsign' => 'N6ZFO',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'DUPE (BANDQSO)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'DUPE (BANDQSO)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'IN',
'callsign' => 'W9RE',
'callname' => 'MIKE'
}
]
}
},
'W9RE' => {
'logname' => 'MIKE',
'logmult' => 'IN',
'bands' => {
'80M' => [
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'INVALID (NAME)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BIL'
},
{
'seen' => 1,
'callmult' => 'OR',
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL (BANDQSO)' => 1
},
'callsign' => 'W6NV',
'callname' => 'OLI'
}
],
'40M' => [
{
'seen' => 1,
'callmult' => 'OR',
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL (CALLSIGN)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BILL'
}
]
}
}
};
$VAR2 = {
'N6ZF' => 1,
'W9RR' => 1,
'N2NL' => 1,
'W6NV' => 3
};


Regards,

Chris


(This post was edited by Zhris on Feb 25, 2015, 9:24 AM)


stuckinarut
User

Feb 25, 2015, 10:24 AM

Post #9 of 102 (14392 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to]

Zhris:

WOW... this looks promising and I appreciate your help! Must leave for a good part of the day, but will jump back into the code later today/tonight and take a 'Test Drive'.

Thanks much!

-Stuckinarut


Zhris
Enthusiast

Feb 25, 2015, 11:36 AM

Post #10 of 102 (14385 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to]

Hi,

I've had a little more time to work on this. I have adjusted the script to make it a little more readable and have roughly generated the desired score and error outputs ( strings for now ). I haven't "fixed" anything I envisage being incorrect; instead I am awaiting the feedback from your test drive ;):


Code
use strict; 
use warnings;
use Data::Dumper;

local $, = "\t";
local $\ = "\n";

#####

# fh begin.

my $output_scores_str = '';
my $output_errors_str = '';

my $input_fh = \*DATA; # 'listQ.txt'
open my $output_scores_fh, '>', \$output_scores_str; # 'logscores.csv'
open my $output_errors_fh, '>', \$output_errors_str; # 'logerrors.csv'

#####

# init.

my $case_sensitive = 0;

my $band_lookup =
{
3 => '80M',
7 => '40M',
};

my $contestants = { };

my $potential_non_submitters = { };

#####

# first sweep ( load input data into hash ).

while ( my $line = <$input_fh> )
{
$line =~ s/\s+$//;

my ( $freq, $logcall, $logname, $logmult, $callsign, $callname, $callmult ) =
map { uc $_ unless $case_sensitive }
(split( ' ', $line ))[1,5..10];

# lookup band via frequency.
my $band = $band_lookup->{substr( $freq, 0, 1 )}; # todo: // error.

# add a new contestant to contestants if we haven't seen them before.
unless ( defined $contestants->{$logcall} )
{
$contestants->{$logcall} =
{
logname => $logname,
logmult => $logmult,
};
}

# add this entry to the contestants entries.
push @{$contestants->{$logcall}->{bands}->{$band}},
{
callsign => $callsign,
callname => $callname,
callmult => $callmult,
};
}

#####

# second sweep ( process hash, generate logs ).

print $output_scores_fh 'LOGCALL', 'QSOS', 'MULT', 'NAME';
print $output_errors_fh 'LOGCALL', 'CALLSIGN', 'ERRORTYPES';

while ( my ( $logcall, $contestant ) = each ( %$contestants ) )
{
my $verified = 0;

while ( my ( $band, $entries ) = each ( %{$contestant->{bands}} ) )
{
for my $entry ( @$entries )
{
# mark as seen. Used when checking for duplicate entries.
$entry->{seen} = 1;

# validate entry.
if ( not defined $contestants->{$entry->{callsign}} ) # invalid callsign.
{
$entry->{errors}->{'NIL(BANDQSO)'} = 1;

$potential_non_submitters->{$entry->{callsign}}++;
}
elsif ( ( grep { $_->{seen} and $_->{callsign} eq $entry->{callsign} } @$entries ) > 1 ) # duplicate entry.
{
$entry->{errors}->{'DUPE(BANDQSO)'} = 1;
}
elsif ( $entry->{callname} ne $contestants->{$entry->{callsign}}->{logname} ) # invalid callname.
{
$entry->{errors}->{'INVALID(NAME)'} = 1;
}
elsif ( $entry->{callmult} ne $contestants->{$entry->{callsign}}->{logmult} ) # invalid callmult.
{
$entry->{errors}->{'INVALID(MULT)'} = 1;
}
elsif ( not grep { $_->{callsign} eq $logcall } @{$contestants->{$entry->{callsign}}->{bands}->{$band}} ) # no return entry.
{
$entry->{errors}->{'NIL(CALLSIGN)'} = 1;
}

# log errors if any, or increment verified count.
if ( keys %{$entry->{errors}} )
{
print $output_errors_fh $logcall, $entry->{callsign}, keys %{$entry->{errors}}; # todo: errors better as list not hash.
}
else
{
$verified++;
}
}
}

# log score.
print $output_scores_fh $logcall, $verified, $contestant->{logmult}, $contestant->{logname};
}

#####

# dump.

{
local $, = "\n";
print Dumper $contestants, $potential_non_submitters;
}

#####

# fh end.

close $input_fh;
close $output_scores_fh;
close $output_errors_fh;

print $output_scores_str;
print $output_errors_str;

#####

__DATA__
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA


Output:

Code
$VAR1 = { 
'N6ZFO' => {
'logname' => 'BILL',
'logmult' => 'CA',
'bands' => {
'80M' => [
{
'seen' => 1,
'callmult' => 'IN',
'errors' => {},
'callsign' => 'W9RE',
'callname' => 'MIKE'
},
{
'seen' => 1,
'callmult' => 'OR',
'errors' => {
'NIL(CALLSIGN)' => 1
},
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL(BANDQSO)' => 1
},
'callsign' => 'W6NV',
'callname' => 'OLI'
}
],
'40M' => [
{
'seen' => 1,
'callmult' => 'OR',
'errors' => {},
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'IN',
'errors' => {
'NIL(BANDQSO)' => 1
},
'callsign' => 'W9RR',
'callname' => 'MIKE'
},
{
'seen' => 1,
'callmult' => 'FL',
'errors' => {
'NIL(BANDQSO)' => 1
},
'callsign' => 'N2NL',
'callname' => 'DAVE'
}
]
}
},
'W7WHY' => {
'logname' => 'TOM',
'logmult' => 'OR',
'bands' => {
'80M' => [
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL(BANDQSO)' => 1
},
'callsign' => 'N6ZF',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL(BANDQSO)' => 1
},
'callsign' => 'W6NV',
'callname' => 'OLI'
},
{
'seen' => 1,
'callmult' => 'IN',
'errors' => {},
'callsign' => 'W9RE',
'callname' => 'MIKE'
}
],
'40M' => [
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {},
'callsign' => 'N6ZFO',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'DUPE(BANDQSO)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'DUPE(BANDQSO)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BILL'
},
{
'seen' => 1,
'callmult' => 'IN',
'errors' => {},
'callsign' => 'W9RE',
'callname' => 'MIKE'
}
]
}
},
'W9RE' => {
'logname' => 'MIKE',
'logmult' => 'IN',
'bands' => {
'80M' => [
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'INVALID(NAME)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BIL'
},
{
'seen' => 1,
'callmult' => 'OR',
'errors' => {},
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL(BANDQSO)' => 1
},
'callsign' => 'W6NV',
'callname' => 'OLI'
}
],
'40M' => [
{
'seen' => 1,
'callmult' => 'OR',
'errors' => {},
'callsign' => 'W7WHY',
'callname' => 'TOM'
},
{
'seen' => 1,
'callmult' => 'CA',
'errors' => {
'NIL(CALLSIGN)' => 1
},
'callsign' => 'N6ZFO',
'callname' => 'BILL'
}
]
}
}
};

$VAR2 = {
'N6ZF' => 1,
'W9RR' => 1,
'N2NL' => 1,
'W6NV' => 3
};

LOGCALL QSOS MULT NAME
N6ZFO 2 CA BILL
W7WHY 3 OR TOM
W9RE 2 IN MIKE

LOGCALL CALLSIGN ERRORTYPES
N6ZFO W7WHY NIL(CALLSIGN)
N6ZFO W6NV NIL(BANDQSO)
N6ZFO W9RR NIL(BANDQSO)
N6ZFO N2NL NIL(BANDQSO)
W7WHY N6ZF NIL(BANDQSO)
W7WHY W6NV NIL(BANDQSO)
W7WHY N6ZFO DUPE(BANDQSO)
W7WHY N6ZFO DUPE(BANDQSO)
W9RE N6ZFO INVALID(NAME)
W9RE W6NV NIL(BANDQSO)
W9RE N6ZFO NIL(CALLSIGN)


Regards,

Chris


(This post was edited by Zhris on Feb 25, 2015, 11:58 AM)


stuckinarut
User

Feb 25, 2015, 7:23 PM

Post #11 of 102 (14354 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to]

Zhris (Chris):

AWESOME... I think this is pretty close to working!

I realized that I forgot to make an Error in spelling for one of the mults, and noticed the Name & Mult were missing for W6NV in one of the __DATA__ lines, so re-tweaked and ran the script again. The Mult Error checking worked!


Code
#####  

# STUCKINARUT CHANGED W9RE MULT TO "IL" FOR W7WHY 40M QSO AT 0201 TO TEST MULT SPELLING CHECK :^)
# STUCKINARUT ADDED NAME=OLI AND MULT=CA FOR W6NV 80M QSO WITH N6ZFO AT 0235 FOR COMPLETE LOG ENTRY

__DATA__
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IL
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA


I printed out the Summaries and started manually checking each (DATA) log entry to verify the actual LOGCALL QSOs and go through the ERRORS list. Partway through I realized adding 2 more fields to the ERRORS list would be of great help in this process to check things against the (Master) __DATA__ list. And especially since the total number of actual QSOs to be checked will be about 2,500 and a likely high number of Error items to Manually check/research after-the-fact.

For the Errors list if you could tweak the fields to be as follows that would be very, very helpful:

LOGCALL CALLSIGN BAND TIME ERRORTYPES

So actually the TIME field will indeed play an important part after all {SIGH} and help go line by line to verify and validate the 'Test' data. If all is well, I'll add more __DATA__ records including intentional Errors and run more tests.

Also, writing the $VAR2 list with just the (Callsigns) to a .txt file would be helpful for the manual checking process. Sorting the first (Logcall) field Alpha-Numerically in all 3 files would, I now see, also facilitate faster checking and save time.
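
(Something like this little snippet is what I picture for the callsigns-only .txt file, if I have the syntax anywhere near right - the 'uniques.txt' file name is just my made-up example:)


Code
open my $uniques_fh, '>', 'uniques.txt' or die "Cannot open uniques.txt: $!";
print {$uniques_fh} "$_\n" for sort keys %$potential_non_submitters;
close $uniques_fh;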

From a quick check in the Master EXCEL file of logs & data submitted, the Total Unique 'Logcalls' (submitted log callsigns) is about 32, but another 26 'Callsigns' actually worked in the event (but they did not submit logs). Some people just 'show up for the fun' but are not interested in doing paperwork :^)

Looking toward the future of hopefully 70 logs and maybe 4,000 contacts/log entries in this one-hour event, hopefully the processing approach here will handle that amount of data?

My original plan to roll through the submitted log entries one by one overlooked the fact that log entries further down the food chain would also have to check every entry starting from the first forward. DUH! (on me).

Thanks very much for your help, Chris !!!

-Stuckinarut


Zhris
Enthusiast

Feb 25, 2015, 10:03 PM

Post #12 of 102 (14344 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to]

Hi,

Your feedback has been valuable in helping me understand your thoughts. It would be cool to develop this as much as possible in order to eliminate as much manual checking as possible. Inevitably, if you had a list of contestants, the entire process could probably be automated.

As you have come to realize, in order to be able to check an entry against any other possible entry, we had to load all the data into a hash during the "first phase". The amount of memory this hash uses shouldn't be of concern unless it contained millions of entries. If it ever became a problem, there are tweaks that could be made to improve memory consumption.

Make certain that you try to cover every possible scenario in your sample input data, just in case we have missed something!

Also, if there's anything you don't understand from reading the code, feel free to raise your concerns; it may be necessary for you to one day make adjustments.

As per your post I have made the following changes:
- Replaced the data block with the adjusted data block you provided.
- Implemented sorting in numerous places.
- Added band and time fields to error log.
- Generated potential non submitters log, including weight field as this is important.

As per my own initiative, I have also made the following changes:
- Implemented an ignore hash. Any callsigns in this hash will be ignored. Once you have deciphered which logcalls belong to non submitters, you could insert them into this hash, then rerun the script.
- Fixed case sensitive map bug.

Perhaps still to do:
- If the ignore hash is handy, this could be constructed from another input file.
- At this time, only one possible error is logged against each entry, in precedence as per the if / elsif conditions. You may wish to allow multiple possible errors, although there are discrepancies in doing so.
- Instead of an error log, you may wish to incorporate errors / mulligans into the final score ( QSOS ) for each logcall.
- Minor improvements to code, i.e. replace greps with List::*Util functions etc. ( rough sketch below ).
- Replace development filehandles with those pointing to your input and output logs.
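
For example, the "no return entry" grep could be written with List::Util's any - just a rough sketch against the same data structure, not something I have applied to the script yet:


Code
use List::Util qw( any );

# same check as the existing grep: does the other station's log for this band
# contain at least one entry back to $logcall?
my $has_return_entry =
    any { $_->{callsign} eq $logcall }
        @{ $contestants->{$entry->{callsign}}->{bands}->{$band} // [] };

push @{$entry->{errors}}, 'NIL(CALLSIGN)' unless $has_return_entry;


Anyway, here is the updated script: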


Code
use strict; 
use warnings;
use Data::Dumper;

#local $/ = "\r\n";
local $, = "\t";
local $\ = "\n";

#####

# init.

my $case_sensitive = 0;

my $band_lookup =
{
3 => '80M',
7 => '40M',
};

my $ignore =
{
#W6NV => 1,
};

my $contestants = { };

my $potential_non_submitters = { };

#####

# fh open.

my $output_scores_str = '';
my $output_errors_str = '';
my $output_nonsub_str = '';

my $input_fh = \*DATA; # open my $input_fh, '<', 'listQ.txt' or die "cannot open 'listQ.txt': $!";
open my $output_scores_fh, '>', \$output_scores_str; # open my $output_scores_fh, '>', 'logscores.csv' or die "cannot open 'logscores.csv': $!";
open my $output_errors_fh, '>', \$output_errors_str; # open my $output_errors_fh, '>', 'logerrors.csv' or die "cannot open 'logerrors.csv': $!";
open my $output_nonsub_fh, '>', \$output_nonsub_str; # open my $output_nonsub_fh, '>', 'lognonsub.csv' or die "cannot open 'lognonsub.csv': $!";

# headings.
print $output_scores_fh 'LOGCALL', 'QSOS', 'MULT', 'NAME';
print $output_errors_fh 'LOGCALL', 'CALLSIGN', 'BAND', 'TIME', 'ERRORTYPES';
print $output_nonsub_fh 'LOGCALL', 'WEIGHT';

#####

# first phase ( load input data into hash ).

while ( my $line = <$input_fh> )
{
# remove any whitespace on the end of the line ( spaces, carriage return, newline ).
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $freq, $time, $logcall, $logname, $logmult, $callsign, $callname, $callmult ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
( split( ' ', $line ) )[1,4..10];

# lookup band via frequency.
my $band = $band_lookup->{substr( $freq, 0, 1 )}; # todo: // error.

# add a new contestant to contestants if we haven't seen them before.
unless ( defined $contestants->{$logcall} )
{
$contestants->{$logcall} =
{
logname => $logname,
logmult => $logmult,
bands => { },
};
}

# add this entry to the contestants entries.
push @{$contestants->{$logcall}->{bands}->{$band}},
{
time => $time,
callsign => $callsign,
callname => $callname,
callmult => $callmult,
seen => 0,
errors => [ ],
};
}

#####

# second phase ( process hash, generate logs ).

for my $logcall ( sort keys %$contestants )
{
my $contestant = $contestants->{$logcall};

# instead of verified counter, could count number of error free entries before logging.
my $verified = 0;

for my $band ( sort keys %{$contestant->{bands}} )
{
my $entries = $contestant->{bands}->{$band};

for my $entry ( sort { $a->{callsign} cmp $b->{callsign} } @$entries )
{
# skip if in ignore list.
next if exists $ignore->{$entry->{callsign}};

# mark as seen. Used when checking for duplicate entries.
$entry->{seen} = 1;

# verify entry.
if ( not defined $contestants->{$entry->{callsign}} ) # invalid callsign.
{
push @{$entry->{errors}}, 'NIL(BANDQSO)';

$potential_non_submitters->{$entry->{callsign}}->{$logcall}++;
}
elsif ( ( grep { $_->{seen} and $_->{callsign} eq $entry->{callsign} } @$entries ) > 1 ) # duplicate entry.
{
push @{$entry->{errors}}, 'DUPE(BANDQSO)';
}
elsif ( $entry->{callname} ne $contestants->{$entry->{callsign}}->{logname} ) # invalid callname.
{
push @{$entry->{errors}}, 'INVALID(NAME)';
}
elsif ( $entry->{callmult} ne $contestants->{$entry->{callsign}}->{logmult} ) # invalid callmult.
{
push @{$entry->{errors}}, 'INVALID(MULT)';
}
elsif ( not grep { $_->{callsign} eq $logcall } @{$contestants->{$entry->{callsign}}->{bands}->{$band}} ) # no return entry.
{
push @{$entry->{errors}}, 'NIL(CALLSIGN)';

# todo: set "yes return entry" in relevant entry in $contestants->{$entry->{callsign}}->{bands}->{$band}, then don't have to pointlessly reverse check later.
}

# log errors if any, or increment verified count.
if ( @{$entry->{errors}} )
{
print $output_errors_fh $logcall, $entry->{callsign}, $band, $entry->{time}, @{$entry->{errors}};
}
else
{
$verified++;
}
}
}

# log score.
print $output_scores_fh $logcall, $verified, $contestant->{logmult}, $contestant->{logname};
}

#####

# third phase ( process potential non submitters hash, generate potential non submitters log ).

# reformat potential non submitters hash into callsign => count / weight. We incorporated contestants own logcall to ensure they can't skew
# the result if they consistantly use an invalid callsign via different bands or duplicate entries i.e. equal weighting / one logcall per callsign reported.
$_ = keys %$_ for ( values %$potential_non_submitters );

for my $callsign ( sort { $potential_non_submitters->{$b} <=> $potential_non_submitters->{$a} } keys %$potential_non_submitters )
{
my $weight = $potential_non_submitters->{$callsign};

# log potential non submitter.
print $output_nonsub_fh $callsign, $weight;
}

#####

# fh close.

close $input_fh;
close $output_scores_fh;
close $output_errors_fh;
close $output_nonsub_fh;

# print
#print Dumper $contestants;
#print Dumper $potential_non_submitters;
print $output_scores_str;
print $output_errors_str;
print $output_nonsub_str;

#####

__DATA__
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IL
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA


Output:

Code
LOGCALL	QSOS	MULT	NAME 
N6ZFO 2 CA BILL
W7WHY 2 OR TOM
W9RE 2 IN MIKE

LOGCALL CALLSIGN BAND TIME ERRORTYPES
N6ZFO N2NL 40M 0222 NIL(BANDQSO)
N6ZFO W9RR 40M 0221 NIL(BANDQSO)
N6ZFO W6NV 80M 0235 NIL(BANDQSO)
N6ZFO W7WHY 80M 0231 NIL(CALLSIGN)
W7WHY N6ZFO 40M 0200 DUPE(BANDQSO)
W7WHY N6ZFO 40M 0200 DUPE(BANDQSO)
W7WHY W9RE 40M 0201 INVALID(MULT)
W7WHY N6ZF 80M 0231 NIL(BANDQSO)
W7WHY W6NV 80M 0232 NIL(BANDQSO)
W9RE N6ZFO 40M 0221 NIL(CALLSIGN)
W9RE N6ZFO 80M 0231 INVALID(NAME)
W9RE W6NV 80M 0249 NIL(BANDQSO)

LOGCALL WEIGHT
W6NV 3
N6ZF 1
W9RR 1
N2NL 1


Regards,

Chris


(This post was edited by Zhris on Feb 25, 2015, 10:47 PM)


stuckinarut
User

Feb 25, 2015, 10:59 PM

Post #13 of 102 (14323 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to]

Chris:

I am in AWE of what you have come up with so quickly. Indeed another humbling experience to realize how much I have yet to learn (or even attempt to).

I've printed the revised output to whack away at. Before my previous post, I was actually chewing on an idea similar to your 'ignore hash' method to (hopefully) eliminate some of the further tedious manual work. I can't thank you enough for your interest and willingness to also do some possible 'embellishments'. I'm thinking you must be related to 'Santa Claus' ???

As I chomp through the latest output checks, I'll jot down any thoughts about additional 'automation' possibilities and come back in a day or two.

In the interim, here's a bit more information about the particular event (a short on-air 'Contest' - a/k/a 'RadioSport' event, in the form of a 'QSO Party'). Most of the log submitters are 'Regulars' in what we call 'Thursday Night Contesting' ... a real rip-snortin' high-speed Morse Code venue a la 35 to 40WPM (words-per-minute) in speed. I mean, we are talking 'Lightning Fast' stuff here. The real crazy thing is that most of us are real 'old men' --- mostly 60 and above (I just turned 71).

My little event started as a fun on-air way to celebrate my 50th Anniversary Year as a licensed 'Ham' operator. And, as a way to 'give back' to this group of highly skilled CW (Morse Code) RadioSport guys - a 'niche' part of Ham Radio, I give subscriptions and/or renewals to the main RadioSport magazine as Awards for the 10 different entry categories. These are mostly by Geographic area, however there is a separate category for 'NOOBS' (1st-timers to either my Shindig or the Thursday Night Contesting events). A 'Green Power' category is for those running their rigs strictly off battery, solar and/or wind power.

BUT WAIT...THERE'S MORE...

To try and 'level-the-playing-field' and give all the 'Little Pistol' station folks an opportunity to compete based upon 'SKILL' and not by size and power/antennas of stations, those with BIG antennas must reduce their output power based on a list of criteria by antenna (gain). It's been pretty amazing to hear some of the Big Gun TNC 'Regulars' show up with much weaker signals (these are Honest guys). The 'Rules' are kinda complicated. Everyone probably deserves some kind of award for just reading them :^)

Before first posting my outreach for assistance, I had several added 'automation' desires in the mix, but decided to try and simplify things to the bulk of the manual labor part so didn't include them. Your kind willingness to explore some enhancements is muchly appreciated. I've been trying to figure out a way to use Perl to help with this annual 'Nightmare' task, but just could not pull it together myself {OBVIOUSLY}. Although the on-air event is a real blast, I always DREAD the after-the-party manual log checking of several thousand contacts ;-(

Your help here so far has been like a 10,000 pound Gorilla being lifted off my back (and head) !!!

For now I'll leave you with one enhancement that would definitely save some more time and frustration.

FYI, each valid QSO (Contact) is worth 1,000 Points. In the 2nd year, to help encourage folks to contact EVERYONE possible, I introduced 3 (Volunteer/Secret) stations that would yield an additional 5,000 'Bonus Points' if contacted (well, make that 2 ... everyone also gets 5,000 BP's for working me :^) We all get to run higher power in order to give the best chance at success. Nobody knows who the other 2 are until the event starts - they and I use one of my 'nicknames' in our exchanges. But here's the catch... even if all 3 of us are worked on both bands (40M & 80M), the Bonus Points only apply to ONE QSO (Contact) for each of us. In other words, the MAXIMUM number of Bonus Points is 15,000.

My initial thought was to have a separate list to input into the mix with the 3 callsigns, but then things got foggy as I tried to figure out how to credit the Bonus Points only ONCE for each of the 3 of us. Since 2 of the Bonus Station callsigns are different every year, being able to use a simple .txt file with the 3 callsigns (all of which have log entries submitted in the Master list), and also automate this additional manual labor step, would be extremely helpful. The final step in 'after-the-party-is-over' work here is a final recap write-up of the event and a listing of all log submitter scores and category winners. I do a *LOT* of copy-and-pasting {SIGH}.
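
Here is roughly what I picture for that, though I'm probably mangling the syntax (the bonus_stations.txt file name and the variable names are just my made-up examples):


Code
# read the 3 Bonus Station callsigns from a simple text file, one per line
open my $bonus_fh, '<', 'bonus_stations.txt' or die "Cannot open bonus_stations.txt: $!";
chomp( my @bonus_calls = <$bonus_fh> );
close $bonus_fh;

my %is_bonus;
$is_bonus{ uc $_ } = 1 for @bonus_calls;

# then, while scoring a contestant, credit each Bonus Station at most ONCE:
# my %bonus_credited;
# ... inside the per-QSO loop ...
# $score += 5000 if $is_bonus{$callsign} and not $bonus_credited{$callsign}++;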

OK... sorry for the 'Novel' here, but hopefully additional insights into the 'WHY' of the 'HASH-O-RAMA' plea for help.

Will return in a day or two.

Thanks again, Chris !!!

-Stuckinarut


stuckinarut
User

Feb 25, 2015, 11:17 PM

Post #14 of 102 (14320 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to]

Oh, Chris, I forgot to add a thanks for your 'Brilliant' initiative:

> Generated potential non submitters log, including weight field as this is important.

This will be EXTREMELY useful in distinguishing at-a-glance any single 'One-Off' busted callsigns vs. the Non-Log-Submitter callsigns who will likely show up in multiple logs & QSO lines. Those I will need to add back in to individual scores as 'Valid' (Mulligan-like) Contacts. The 'Honor System' factors into things in this area.

-Stuckinarut


stuckinarut
User

Feb 26, 2015, 8:08 AM

Post #15 of 102 (14290 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to]

Hey, Chris...

RE:
> It would be cool to develop this as much as possible in order to eliminate as much manual checking as possible

After about one hour of sleep, I woke up with a bunch of thoughts racing through my mind for further automation & manual labor reduction. But I was too groggy to type on a keyboard so took a small digital recorder back to bed with me to dictate into. I'd turn it off and try to go back to sleep, and then a few minutes later grab it and start recording another thought. It's a good thing I'm presently 'in-between wives & dogs', or I would have been kicked out of the sack and sent to a real 'Dog House'. I mean, who wants some guy babbling into a recorder in the dark next to you at 2AM when you're trying to sleep? {GRIN}.

Now I'm up just starting to transcribe the info. Cudda-Shudda installed the copy of DRAGON several months ago when I bought it to partially 'Automate' this particular process ;-(

BTW, thanks for clarifying that a lot of stuff can go into the Hash. Clever that you inserted several same-band 'DUPE' records into the test __DATA__ to check that very important function.

Nice indeed!

RE:
>Inevitably, if you had a list of contestants, the entire process could probably be automated.

Unfortunately, unless everyone who planned to participate would 'Register' in advance, there is really no way to have a simple 100% accurate list. Actually, formal log submissions were a bit down this year due to time-schedule conflicts. 4 or 5 of the Annual 'Regular' participants sent me emails in advance apologizing that they would have to miss the event. That was nice of them.

So here was a previous thought I had about this matter. Now that I've learned you can 'sweep' and 're-sweep' the Hash:

1. Once the Master Log data is read into the Hash, the next step would be to:

A. Do a 'sweep' of ALL the multiple (logcall field) callsigns and produce a cleaned list of only the 'Uniques'.

B. Do another 'sweep' of ALL the multiple (callsign field) callsigns and produce a cleaned list of only the 'Uniques'.

C. Compare the two lists and, if my logic is correct, this could yield (similar to subtracting List A *from* List B) a 'Total Uniques' list of not only ALL the actual on-air participants worked who did NOT submit logs, but also any/all 'Busted' Callsigns/Errors (see my rough sketch below). This will actually serve a valuable purpose along with your Brilliant idea to include a 'Weight' factor.
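
Here is my rough stab at what those two 'sweeps' and the A-from-B subtraction might look like against your $contestants hash (I'm sure the syntax needs fixing, but hopefully the idea comes across):


Code
my %logcalls;                                  # List A: everyone who submitted a log
$logcalls{$_} = 1 for keys %$contestants;

my %callsigns;                                 # List B: every callsign claimed as worked
for my $contestant ( values %$contestants )
{
    for my $entries ( values %{$contestant->{bands}} )
    {
        $callsigns{ $_->{callsign} } = 1 for @$entries;
    }
}

# "List B minus List A": worked but no log submitted ( or busted callsigns )
my @total_uniques = grep { not $logcalls{$_} } sort keys %callsigns;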

Hmmm...in retrospect it might have been less confusing to designate the two QSO line 'callsign' fields as 'logcall' and 'callworked' ?

Anyway, IF this new list of both "One-Off" Errors *and* Non-Log-Submitted valid/invalid 'callsign' Uniques also includes the BAND, TIME, NAME & MULT data, some additional log-checking 'automation' benefits can be realized, subject to a possible 'Mulligan' factor in the mix.

Assuming that any callsign entry on this list that appears at least 2 or maybe 3 times would NOT be a "One-Off" Error, but rather an actual legitimate callsign/station QSO (contact), this could further reduce the manual labor process. What would really be EL SLICK-O is to be able to change what I'll term a 'threshold' level or value of the 'Weight' for its use in log checking tests. I'll try and explain.

IF one of these callsigns has a weight of 2 (occurrences in the overall scheme of things), then any (logcall) QSO claims for the (callworked) would process the same as if it had been one of the formal (logcall) submission QSO data lines. Adding the BAND, NAME & MULT data would make this possible. The TIME field would be for use in 'Manual' check/reconciliations. Only those with a weight of 1 would end up on the Error list. Being able to switch the 'threshold' use value between 1, 2 and 3 would be very valuable in 'Beta Testing' the output! A value of 3 would likely be the most accurate, but if 2 works and reduces the list of Errors to manually check, that would be like 'BINGO' :^) Not sure if I've explained this well.

NOTE: The TIME field is NOT used in actual log checking, because not everyone's Confuzzzer clocks have been set to the same precise WWV time. There could be a range of + or - several minutes, but since the event operating time is structured for 30 minutes on the 40M Band first, then 30 minutes on 80M, TIME is really a Moot issue - except to make the Manual Error checking stuff go faster when visually looking through a submitted log.

RE:

>LOGCALL WEIGHT
>W6NV 3
>N6ZF 1
>W9RR 1
>N2NL 1

Maybe use 'NOLOG' instead of 'LOGCALL' ?


Ahhhh... here's another idea...

Something like a 'Control Panel' or 'CONFIG OPTIONS' section near the top of the script to quickly change any option values or turn stuff ON or OFF (i.e., = true {or} = false), whether involving either the Hash data or other .txt file data to be read in or not. Here's what came to mind for this item:

# ############################################
# ############################################
# ############## CONFIG OPTIONS ##############

# Threshold for Uniques list in log-checking
# (NO Log Submitted -or- Busted Callsigns)
# Values: 1, 2 or 3 (occurrences) on the list

nologthreshold = 2

# Next CONFIG OPTION - Blah Blah Blah

etc., etc.

# ############################################
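
Maybe in actual Perl that CONFIG OPTIONS section would look something like this (just my guess - the option names are made up):


Code
my $config =
{
    nologthreshold => 2,            # 1, 2 or 3 occurrences, as described above
    case_sensitive => 0,            # 1 = ON, 0 = OFF
    input_path     => 'listQ.txt',
};

# used later on, e.g.:
# ... if $weight >= $config->{nologthreshold};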

OK, this is pretty lengthy so I'll post the further 'Automation' thoughts separately later after transcribing everything.

Thanks!

- Stuckinarut


stuckinarut
User

Feb 26, 2015, 8:22 AM

Post #16 of 102 (14289 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to]

OOOPS... sorry...

RE:


Quote
# Threshold for Uniques list in log-checking
# (NO Log Submitted -or- Busted Callsigns)
# Values: 1, 2 or 3 (occurrences) on the list

nologthreshold = 2

A value of 1 would be ONLY a firm/fixed 1 occurrence.

A value of 2 would be at least 2 (or more) occurrences.

A value of 3 would be at least 3 (or more) occurrences.
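
Or, in code form, my understanding of the threshold test would be something like this (just illustrating the idea - $nologthreshold and $weight stand for the values described above):


Code
# a value of 1 means exactly one occurrence; 2 and 3 mean "that many or more"
my $meets_threshold = ( $nologthreshold == 1 )
    ? ( $weight == 1 )
    : ( $weight >= $nologthreshold );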


What do you think ???

Thanks!

-Stuckinarut


(This post was edited by stuckinarut on Feb 26, 2015, 8:34 AM)


Zhris
Enthusiast

Feb 26, 2015, 8:59 AM

Post #17 of 102 (14280 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to]

Hi,

Just letting you know that I'm out all day today and will look through this when I get back later.

Best regards,

Chris


stuckinarut
User

Feb 26, 2015, 10:31 AM

Post #18 of 102 (14271 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to]

Hey, Chris...

>Just letting you know that I'm out all day today and will look through this when I get back later.

Thanks - I'll be gone part of today myself. In the interim, I'll post more info for 'Automation' mix of things.

I chugged down more Caffeine and think I can explain this 'nologthreshold' function a bit better (as I now see it). It is a bit more tricky than I thought and needs some tweaking.

RE: (I changed the first field header :^)


Code
NOLOG	WEIGHT  
W6NV 3
N6ZF 1
W9RR 1
N2NL 1


I'll use just W6NV in different scenario examples and 'occurrences' (the weight #), which would need to be based on not just the (callsign) but the IDENTICAL NAME & MULT data. So I think these two fields would need to be added to the primary ERROR output as well:


Code
RE: 
LOGCALL CALLSIGN BAND TIME ERRORTYPES
N6ZFO W6NV 80M 0235 NIL(BANDQSO)
W7WHY W6NV 80M 0232 NIL(BANDQSO)
W9RE W6NV 80M 0249 NIL(BANDQSO)


I'm now thinking maybe NOLOG might be a better simple ERRORTYPE if one of these (callsigns) can NOT be matched to a (submitted) log.

As things stand now, the weight for W6NV is 3.


Code
NOLOG	WEIGHT 
W6NV 3


If the ERROR output is tweaked to flag a 'NOLOG' (submitted) attempted match with an Error code of just 'NOLOG', we would now see this:


Code
LOGCALL	CALLSIGN	BAND	TIME 	NAME	MULT	ERRORTYPES  
N6ZFO W6NV 80M 0235 OLI CA NOLOG
W7WHY W6NV 80M 0232 OLI CA NOLOG
W9RE W6NV 80M 0249 OLI CA NOLOG


So 'WHAT IF' there were actual (accuracy) Errors (NAME or MULT) that legitimately should invalidate a QSO with a NOLOG Submitted station and be considered in the 'Weight' for a 'Threshold' level value?

Consider this possibility:


Code
LOGCALL	CALLSIGN	BAND	TIME 	NAME	MULT	ERRORTYPES  
N6ZFO W6NV 80M 0235 OLI CA NOLOG
W7WHY W6NV 80M 0232 OLI CA NOLOG
W9RE W6NV 80M 0249 ORI CA NOLOG


Combining the (callsign) with the NAME & MULT, would yield a different (tweaked) result:


Code
NOLOG	WEIGHT	NAME	MULT 
W6NV 2 OLI CA
W6NV 1 ORI CA


HA!!! If the 'nologthreshold' value were set as 2 (which would be at least 2 'or more' occurrences in the actual log checking), this would *reasonably* suggest that any QSOs with W6NV and OLI CA are most likely valid and should be counted as Valid QSOs and NOT show up in the Error output (somewhat of a 'Mulligan' approach, but it would save considerable manual labor).

The W6NV and ORI CA entry was definitely an error and therefore an invalid QSO. If the 'nologthreshold' value were set as 3 ('or more' occurrences), then all 3 of the W6NV QSOs would continue to show in the ERROR output and not be (automatically) validated in the log checking.

Another scenario:


Code
LOGCALL	CALLSIGN	BAND	TIME 	NAME	MULT	ERRORTYPES  
N6ZFO W6NV 80M 0235 OXI CA NOLOG
W7WHY W6NV 80M 0232 OLI CA NOLOG
W9RE W6NV 80M 0249 ORI CA NOLOG


HA! Now we would have:


Code
NOLOG	WEIGHT	NAME	MULT 
W6NV 1 OLI CA
W6NV 1 OXI CA
W6NV 1 ORI CA


So if the 'nologthreshold' value is 2 (or more) -or- 3 (or more) occurrences, then ALL of the W6NV's QSOs would bounce to the ERROR output.

A NEW THOUGHT JUST FLOATED IN... to first calculate all 'weights' for the 'nolog' callsigns based on (callsign)+NAME+MULT, to be able to include a 'weight' figure in the ERROR output data:


Code
LOGCALL	CALLSIGN	WEIGHT	BAND	TIME 	NAME	MULT	ERRORTYPES  
N6ZFO W6NV 1 80M 0235 OXI CA NOLOG
W7WHY W6NV 1 80M 0232 OLI CA NOLOG
W9RE W6NV 1 80M 0249 ORI CA NOLOG


For testing purposes at different 'nologthreshold' values, this would give an INSTANT bird's eye (or eagle's eye) view of things as well as 'Mulliganizing' decisions :^)

I'll keep chewing on this more. Chances are at least 10 QSOs will be made by the majority of NOLOG (callsigns). One fast way to get a handle on a possible Accuracy/Error factor for the NAME & MULT fields would be to do a Count of QSO entries for the list of NOLOG (callsigns) and sort Descending by Quantity of QSOs made... based on including the NAME & MULT fields. Yes... I think that would give a quick picture of the reality of things.
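
In code terms I picture the 'weight' being keyed on the full claimed exchange rather than just the callsign - something like this (the %nolog_weight name is just my made-up example, and the syntax probably needs fixing):


Code
my %nolog_weight;   # made-up name: weight keyed on the full claimed exchange

# count occurrences of the exact callsign + NAME + MULT combination claimed
my $key = join '|', $entry->{callsign}, $entry->{callname}, $entry->{callmult};
$nolog_weight{$key}++;

# later, a claimed QSO with a NOLOG station would only auto-validate when that
# exact combination meets the threshold:
# ... if $nolog_weight{$key} >= $nologthreshold;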

I need to finish re-tweaking the Master Logs file because some folks used a logging software module that included QSO Serial Number column fields (used in many other RadioSport events). So once this is completed, maybe it would be helpful to get the actual file to you (somehow) so you will have the complete real nitty gritty to work with?

A looming new question at this point is IF there are 'manual' adjustments made, what creative way might there be to then re-run Summary output to include the changes {HUGE SIGH}.

Thanks, Chris.

-Stuckinarut


Zhris
Enthusiast

Feb 26, 2015, 1:35 PM

Post #19 of 102 (14258 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to]

Hi,

Firstly, thank you for the background information on amateur radio sports. It helps me to understand the purpose of this task and enables me to provide more valuable suggestions. I have also looked at your website and familiarized myself with the rules ( I look forward to that reward you "promised" just for reading them! ).

When the time comes, we certainly can adjust the code to incorporate bonus points for contacting up to three volunteer secret stations across both bands. It has also become apparent that you have to do a lot of pre- and post-processing, i.e. pre-processing includes adjusting contestants' logs into the Cabrillo format, and post-processing includes using Excel to decipher the top scorers under each category. Perhaps this could be handled by our Perl script too.

With regards to naming, the whole logcall and callsign business can be pretty baffling; after all they are the same thing, just distinguishing whether the station is the transmitter or the receiver. I would have thought names like transmittercall for logcall and receivercall for callsign might be more appropriate, but what do I know.

Like a lot of problems, the more you think about them, the more you realise their complexity. Before I read your reply, I had realised that we hadn't incorporated name and mult into the mix when attempting to automate the process of deciphering valid but unsubmitted logcalls. One example of many scenarios that must be considered: if two contestants contacted an unsubmitted callsign but logged conflicting callnames and/or callmults, we have no idea which is valid. I could also imagine a group of cheaters agreeing to log non-existent callsigns in order to increase their weighting. Etc etc etc.

I propose we break the problem up into two phases.

- The first phase produces a log of all valid contestants' callsigns, callnames and callmults, whether they submitted their entries or not. Based on your ideas and my own, this can be automated to some level of accuracy, but an option to manually check and adjust this log is necessary.
----- The automated process will generate the log of all contestants using configurable threshold(s).
----- The manual process will generate a log of confirmed contestants, and unconfirmed contestants with relevant information in order to decide to keep or eliminate by eye. The resultant log will need to be in the same format as the one generated by the automated process.
- Now that we have the log generated in the first phase, the second phase of actually scoring each contestant will be a piece of cake.

This two-phase approach also simplifies things and eliminates the need for the ignore and potential non-submitters hashes.

I really need to get my head round phase one and re-read your concepts / ideas. For now I have thrown together code which handles confirming easily confirmable contestants, then lays out the unconfirmed contestants in a potentially suitable structure ready to be processed automatically or logged for manual processing. I will try to figure out the automation side of things.


Code
use strict; 
use warnings;
use Data::Dumper;

#local $/ = "\r\n";
local $, = "\t";
local $\ = "\n";

#####

# fh open.

my $string_output_scores = '';
my $string_output_errors = '';

my $handle_input_entries = \*DATA;
open my $handle_output_scores, '>', \$string_output_scores;
open my $handle_output_errors, '>', \$string_output_errors;

# headings.
print $handle_output_scores 'LOGCALL', 'QSOS', 'MULT', 'NAME';
print $handle_output_errors 'LOGCALL', 'CALLSIGN', 'BAND', 'TIME', 'ERRORTYPES';

#####

# init.

my $configuration =
{
handle_input_entries => $handle_input_entries,
#handle_input_contestants => $handle_input_contestants,
#handle_output_contestants => $handle_output_contestants,
handle_output_scores => $handle_output_scores,
handle_output_errors => $handle_output_errors,
case_sensitive => 0,
band_lookup => { 3 => '80M', 7 => '40M' },
automate_unconfirmed => 1,
automate_threshold => 2,
};

my $phase_dispatch =
{
1 => \&phase1,
2 => \&phase2,
};

#####

# phase.

my $phase = 1; # $ARGV[0];

$phase_dispatch->{$phase}->( $configuration );

#####

# fh close.

close $handle_input_entries;
close $handle_output_scores;
close $handle_output_errors;

# print.
#print $string_output_scores;
#print $string_output_errors;

#####

# functions.

sub phase1
{
my ( $configuration ) = @_;

my $handle_input_entries = $configuration->{handle_input_entries};
my $case_sensitive = $configuration->{case_sensitive};
#my $band_lookup = $configuration->{band_lookup};
my $automate_unconfirmed = $configuration->{automate_unconfirmed};

my $hash = { };

while ( my $line = <$handle_input_entries> )
{
# ignore blank or comment lines.
next if $line =~ m/^(\s*#|\s*$)/;

# remove any whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuitive than chomp.
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $freq, $time, $logcall, $logname, $logmult, $callsign, $callname, $callmult ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
( split( ' ', $line ) )[1,4..10];

# lookup band via frequency.
#my $band = $band_lookup->{substr( $freq, 0, 1 )}; # todo: // error.

# log* are automatically confirmed.
$hash->{confirmed}->{$logcall} = { logname => $logname, logmult => $logmult } unless exists $hash->{confirmed}->{$logcall};

# call* are not confirmed yet.
$hash->{unconfirmed}->{$callsign}->{$callname}->{$callmult}++;
}

for my $callsign ( keys %{$hash->{unconfirmed}} )
{
# if the unconfirmed callsign exists as a confirmed logcall, delete this unconfirmed entry as it is now confirmed.
if ( exists $hash->{confirmed}->{$callsign} )
{
delete $hash->{unconfirmed}->{$callsign};
}
elsif ( $automate_unconfirmed )
{
# ...
}
}

print Dumper $hash;

return 1;
}

sub phase2
{

}

#####

__DATA__
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IL
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE ON
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKEY IF
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA


Output:

Code
$VAR1 = { 
'confirmed' => {
'N6ZFO' => {
'logname' => 'BILL',
'logmult' => 'CA'
},
'W7WHY' => {
'logname' => 'TOM',
'logmult' => 'OR'
},
'W9RE' => {
'logname' => 'MIKE',
'logmult' => 'IN'
}
},
'unconfirmed' => {
'N6ZF' => {
'BILL' => {
'CA' => 1
}
},
'W9RR' => {
'MIKE' => {
'ON' => 1,
'IN' => 1
},
'MIKEY' => {
'IF' => 1
}
},
'N2NL' => {
'DAVE' => {
'FL' => 1
}
},
'W6NV' => {
'OLI' => {
'CA' => 3
}
}
}
};


Regards,

Chris


(This post was edited by Zhris on Feb 26, 2015, 1:46 PM)


Zhris
Enthusiast

Feb 27, 2015, 8:25 PM

Post #20 of 102 (14207 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

Hi,

I have worked on this a little this evening and, although I have done little testing, I have come up with the program below to automate generating the list of contestants. There is no automate on/off option because every unique contestant is inserted into the resultant log with a status of 'valid', 'manual' or 'invalid'. The algorithm is a little complex to explain in the little time I have right now, but if you test out different scenarios I'm sure you will get the idea. Briefly:

- valid:
----- the operator had logged entries (and therefore submitted their log), OR other operators had logged them more than a threshold number of times, weighted 1 unit per unique operator.
----- we discovered a single name and mult more frequently by weight than any other possibilities.
- manual:
----- the entry is probably valid, BUT we discovered multiple names and/or mults with equal frequencies by weight, therefore we couldn't decipher which one(s) were correct. This is most likely the result of contestants who didn't submit their logs, where other contestants who called them made errors.
- invalid:
----- the entry wasn't valid, AND there may be multiple names and/or mults with equal frequencies by weight.

Once this log is generated, you can go through manually and make adjustments as you see fit. Contestants marked 'manual' are basically valid, but you should adjust the name and mult values to a single name and mult (all possibilities are listed separated by a pipe), then adjust the status to 'valid', or 'invalid' if you really want. I would assume that when you run through real-world data there will be few, if any, manuals. You can leave 'invalid' contestants alone, but you may wish to double-check them following the same update process you did for 'manual'. Phase two would ignore contestants with any status other than 'valid'.
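
For example, taking a row from the sample output further down, the 'manual' entry for N6ZF could be collapsed by eye to a single name and promoted like this (purely an illustration of the edit, not program output):


Code
SIGN	NAME	MULT	STATUS
N6ZF	BILL|BIL	CA	manual     <- as generated
N6ZF	BILL	CA	valid      <- after your manual adjustment
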


Code
use strict; 
use warnings;
use List::MoreUtils qw/before/;
use Data::Dumper;

#####

# handle open.

my $strings = [ ];

my $handle_input_entries = _handle( \*DATA );
my $handle_output_contestants = _handle( $strings, '>' );

#####

# init.

my $configuration =
{
handle_input_entries => $handle_input_entries,
handle_output_contestants => $handle_output_contestants,
case_sensitive => 0,
band_lookup => { 3 => '80M', 7 => '40M' },
threshold => 2,
};

my $phases =
[
\&_phase1,
\&_phase2,
];

#####

# phase.

my $phase = 0; # $ARGV[0];

$phases->[$phase]->( $configuration );

#####

# handle close.

close $handle_input_entries;
close $handle_output_contestants;

{
local $, = "\n";

print @$strings;
}

#####

# functions.

# phase one.
sub _phase1
{
my ( $configuration ) = @_;

$configuration // die 'configuration required';

local $" = '|';
local $, = "\t";
local $\ = "\n";

my $handle_input_entries = $configuration->{handle_input_entries};
my $handle_output_contestants = $configuration->{handle_output_contestants};
my $case_sensitive = $configuration->{case_sensitive};
my $threshold = $configuration->{threshold};

my $contestants = { };

while ( my $line = <$handle_input_entries> )
{
# ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove any whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuitive than chomp.
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $freq, $time, $log_sign, $log_name, $log_mult, $call_sign, $call_name, $call_mult ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
( split( ' ', $line ) )[1,4..10];

# todo: validate line i.e. ensure each var has val.

# populate contestants hash.
#$contestants->{$log_sign }->{log }->{seen }->{$log_sign}++;
$contestants->{$log_sign }->{log }->{names}->{$log_name }->{$log_sign}++;
$contestants->{$log_sign }->{log }->{mults}->{$log_mult }->{$log_sign}++;
$contestants->{$call_sign}->{call}->{seen }->{$log_sign }++;
$contestants->{$call_sign}->{call}->{names}->{$call_name}->{$log_sign}++;
$contestants->{$call_sign}->{call}->{mults}->{$call_mult}->{$log_sign}++;
}

# print headings.
print $handle_output_contestants 'SIGN', 'NAME', 'MULT', 'STATUS';

for my $sign ( sort keys %$contestants )
{
my $contestant = $contestants->{$sign};

my $details_operator = keys %{$contestant->{log}} ? 'log' : 'call' ;

my $names = _details( $contestant->{$details_operator}->{'names'} );
my $mults = _details( $contestant->{$details_operator}->{'mults'} );

my $status = 'invalid';
if ( ( keys %{$contestant->{log}} and keys %{$contestant->{call}} ) or ( keys %{$contestant->{call}->{seen}} >= $threshold ) )
{
$status = ( @$names > 1 or @$mults > 1 ) ? 'manual' : 'valid' ;
}

# print line.
print $handle_output_contestants $sign, "@$names", "@$mults", $status;
}

return 1;
}

# phase two.
sub _phase2
{

}

# deals with multiple input / output handles in standalone programs.
sub _handle
{
my ( $expression, $mode, $divider ) = @_;

$expression // die 'expression required';
$mode //= '<';

my $handle = undef;
my $handles = [ ];

if ( ref $expression eq 'GLOB' )
{
$handle = $expression;
}
else
{
if ( ref $expression eq ref [ ] )
{
push @$expression, '';
$expression = \$expression->[-1];
}

open $handle, $mode, $expression or die "cannot open '$expression': $!";
}

if ( $mode eq '<' and defined $divider )
{
local $/ = $divider;

while ( my $block = <$handle> )
{
$block =~ s/\Q$divider\E$//;

open my $handle_b, $mode, \$block or die "cannot open '$block': $!";

push @$handles, $handle_b;
}
}
else
{
push @$handles, $handle;
}

return wantarray ? @$handles : $handles->[0] ;
}

# decipher most likely details by weight.
sub _details
{
my ( $details ) = @_;

$details // die 'details required';

my $hash = { };
$hash->{$_} += scalar( keys %{$details->{$_}} ) for ( keys %$details );

my $weight = undef;
my $list = [ before { $weight //= $hash->{$_}; $weight != $hash->{$_} } sort { $hash->{$b} <=> $hash->{$a} } keys %$hash ];

return $list;
}

#####

__DATA__
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tommy OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IL
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3542 CW 2015-01-22 0231 W779 Tom OR N6ZF BILL CA
QSO: 3542 CW 2015-01-22 0231 W770 Tom OR N6ZF BIL CA
QSO: 3542 CW 2015-01-22 0231 W771 Tom OR N6ZF BIL CA
QSO: 3542 CW 2015-01-22 0231 W772 Tom OR N6ZF BI CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
#QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE ON
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKEY IF
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
#QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
#QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JOHN UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JOHN UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JOHNNY UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JOHNNY UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JILL UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N778 PETE UK W6NV OLI CA


Output:

Code
SIGN	NAME	MULT	STATUS 
N2NL DAVE FL invalid
N6ZF BILL|BIL CA manual
N6ZFO BILL CA valid
N777 JILL|JOHN|JOHNNY UK invalid
N778 PETE UK invalid
W6NV OLI CA valid
W770 TOM OR invalid
W771 TOM OR invalid
W772 TOM OR invalid
W779 TOM OR invalid
W7WHY TOMMY|TOM OR manual
W9RE MIKE IN valid
W9RR MIKE|MIKEY IF|ON|IN invalid


Regards,

Chris


(This post was edited by Zhris on Feb 27, 2015, 9:00 PM)


stuckinarut
User

Feb 28, 2015, 12:19 PM

Post #21 of 102 (14172 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

Chris:

Sorry... I went down a 'Black Hole' here briefly with unwanted 'stuff' to deal with, like filing an in-person report with the local Sheriff over harassing phone calls ;-(

Will be back working on things later. I also need to finish transcribing my recorded notes.

Thanks!

-Stuckinarut


stuckinarut
User

Mar 1, 2015, 6:29 PM

Post #22 of 102 (14122 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

Chris:

Sorry... delayed further here due to another crisis to deal with. I did extend the original date for publishing this year's event results.

In between the other stuff here, I'm chewing on an idea I think will make all this work pretty turnkey ('Automation').

Hopefully back in a couple of days - thanks for your patience.

-Stuckinarut


stuckinarut
User

Mar 28, 2015, 1:41 AM

Post #23 of 102 (13217 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

(Chris)

FINALLY I came up with the needed solution tweaks.

To not turn this thread into a further Novel from my end, I've put the info in a .pdf and uploaded it to:

http://www.xgenesis.com/hashorama/zchris.pdf

I hope you can still help.

Thanks!

-Stuckinarut


Zhris
Enthusiast

Mar 28, 2015, 12:12 PM

Post #24 of 102 (13179 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

Hi,

I'll hopefully be able to take another look at this tomorrow.

Best regards,

Chris


Zhris
Enthusiast

Mar 30, 2015, 3:47 AM

Post #25 of 102 (12871 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

Hi,

I have read through the PDF in detail and feel I have a good understanding of your vision. At this time I don't really have anything worth adding; between the scripts above, the solution has mostly been covered. When I get time this week, I will consolidate our ideas and code to produce a complete script which can be tweaked as necessary.

I'm still confident in a two-phase system, since the complicated aspects are in building a list of valid contestant signs, names and locations; scoring from then on should be accurate with no need for adjustments, the only adjustments being to the contestant list. I like your CNQ concept, it's similar to mine but is slightly lower level and probably easier to work with. I don't believe you covered it in your PDF, but log"info" (logcall) as opposed to call"info" (callwkd) needs to be weighted differently since it is the contestant's own info; this needs a little more thought. I don't think you can rely 100% on the call"info" (callwkd) for building this list of contestants; imagine the scenario where a contestant submitted their log, but no one contacted them, or logged their contacts, or consistently made mistakes.

Regards,

Chris


(This post was edited by Zhris on Mar 30, 2015, 4:00 AM)
