Home: Perl Programming Help: Intermediate:
HASH-O-RAMA Data Processing Problem

 



stuckinarut
User

Mar 30, 2015, 7:19 AM

Post #26 of 102 (15046 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to]

Hello Chris...


Quote
I'm still confident in a two-phase system, since the complicated aspects are in building a list of valid contestant signs, names and locations; scoring from then on should be accurate with no need for adjustments, as the only adjustments would be to the contestant list. I like your CNQ concept; it's similar to mine but is slightly lower level and probably easier to work with.


Yes, definitely 'lower level', but 'much easier to work with' on my end at this point. Down the line (and if volume of participants/submitted logs increases), then a Phase 2 would have great promise.


Quote
I don't believe you covered it in your PDF, but log"info" (logcall), as opposed to call"info" (callwkd), needs to be weighted differently since it is the contestant's own info. This needs a little more thought; I don't think you can rely 100% on the call"info" (callwkd) for building this list of contestants. Imagine the scenario where a contestant submitted their log, but no one contacted them, or logged their contacts, or consistently made mistakes.


Yes, I chewed and chewed on this aspect, and considered a 2-pass possibility:

1. Run through all the QSO: lines and grab the actual (LOGCALL-NAME-QTH) Combinations.

2. Then do the (CALLWKD-NAME-QTH) Combos.

3. MERGE both of these to come up with 'The Mother of All CNQ lists'. No need to have a separate weight... just a 'Weight' for each CNQ combination in the MERGED list. BTW, the actual log submitter's (LOGCALL-NAME-QTH) CNQ is pre-programmed into the logger before the event and used for the on-air exchange that is sent (and in the actual log).
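The merge idea in steps 1-3 above can be sketched in a few lines of Perl. This is only an illustration under assumptions: the QSO records and field layout below are hypothetical, and the weighting is the simplest possible (each occurrence of a CNQ, from either side, adds 1).

```perl
use strict;
use warnings;

# Hypothetical QSO records (field layout is an assumption):
# [ log_sign, log_name, log_qth, call_sign, call_name, call_qth ]
my @qsos = (
    [ 'K6NV', 'BOB',  'CA', 'VE3KI', 'RICH', 'ON' ],
    [ 'K9YC', 'JACK', 'CA', 'VE3KI', 'RICH', 'ON' ],
);

my %weight;
for my $qso (@qsos) {
    my $log_cnq  = join '|', @$qso[ 0 .. 2 ];   # LOGCALL-NAME-QTH combination
    my $call_cnq = join '|', @$qso[ 3 .. 5 ];   # CALLWKD-NAME-QTH combination
    $weight{$log_cnq}++;
    $weight{$call_cnq}++;
}

# The merged 'Mother of All CNQ lists': one weight per unique CNQ combination.
print "$_ => $weight{$_}\n" for sort keys %weight;
```

Here VE3KI|RICH|ON ends up with weight 2 because both submitters copied the same exchange, while each submitter's own CNQ gets 1.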

FYI, in some Contests, log-checking software will ding (penalize) BOTH parties involved in a QSO (as INVALID) if only one side miscopies or mistypes the data. Of course, this requires that BOTH parties actually submit logs. I don't like that penalty method ;-(

If ALL participants submitted their logs, then the log checking process could be MUCH easier - really a Slam-Dunk with no need for a CNQ or 'Weight' factor or 'WTF'. Each QSO 2-way exchange would be validated (or invalidated) by the actual log data from BOTH parties. Unfortunately that is not the case, which is where the 'CNQ' and your brilliant 'Weight' factor idea come into play.

The problem is getting everyone to submit their logs. Some folks just show up for part of the event to hand out some QSOs (which is appreciated). In 9 years, the only time a log was submitted with no contacts (actually just some 'partial' Header Info and a funny 'Comment') was this year, from a longtime Ham friend just playing around with my new log submit form {SIGH}. Since there are no QSO lines to consolidate into a 'MASTER', he won't show up :^)

I'll advise the troops that I'll be a bit more delayed in getting out this year's Results, and will sit tight for whenever you can make the adjustments/tweaks. I'm now consolidating all this year's log QSO lines, but ran across some entrants using a different logger module that included QSO serial numbers I must remove from each line ;-(
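Stripping those serial numbers could be automated. A minimal sketch, with the caveat that the column positions of the serials are an assumption here (adjust the splice indexes to match the actual logger output):

```perl
use strict;
use warnings;

# Hypothetical QSO line with sent/received serial numbers in the exchange
# (the serial column positions are assumptions; adjust to suit).
my $line = 'QSO: 3543 CW 2014-01-16 0242 K6NV 001 BOB CA VE3KI 002 RICH ON';

my @fields = split ' ', $line;
splice @fields, 6, 1;    # drop the sent serial after the sender's call
splice @fields, 9, 1;    # drop the received serial after the worked call
$line = join ' ', @fields;

print "$line\n";    # QSO: 3543 CW 2014-01-16 0242 K6NV BOB CA VE3KI RICH ON
```

Run over each line of an offending log, this would normalize it to the serial-free format the script expects.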

I really appreciate your help, Chris !!!

- Stuckinarut


Zhris
Enthusiast

Apr 1, 2015, 3:43 PM

Post #27 of 102 (14757 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to]

Hi,

Just updating you that I have worked on this for a couple of hours this evening but haven't had time to go back through all the notes to ensure everything has been covered then test the code. I will post back at some point tomorrow.

Regards,

Chris


stuckinarut
User

Apr 1, 2015, 4:12 PM

Post #28 of 102 (14749 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to]

Thanks, Chris. Will be standing by.

-Stuckinarut


Zhris
Enthusiast

Apr 3, 2015, 5:21 PM

Post #29 of 102 (14341 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to]

Hi,

Apologies for the delay. As per our discussion via PM, I realised upon testing that I had taken the wrong direction. I haven't had much time to test this most recent code, therefore please test it vigorously and report issues in detail.

Please download the attached compressed file as it contains the script and corresponding test data sets. It has been configured such that you can run main.pl with your 2014 data.

>>>>> How to use:
- All configuration is controlled via the variables and / or the phase configuration hashes near the top.
- Adjust the base and the filepaths accordingly. My advice is to create directories alongside the script that contain all the data files, then set the base to 'directory/'. This will make it easier to manage different sets of data.
- Parts of the configuration can be overridden by supplying arguments to the script. This is particularly useful if you want to quickly test different values without having to modify the script itself.
- To run phase 0:
$perl main.pl --phase_n=0
- To run phase 1:
$perl main.pl --phase_n=1
- The script is interactive and asks you to confirm your intentions throughout. If you want to run the script non-interactively, i.e. ignore confirmations, then use the --yes argument:
$perl main.pl --phase_n=0 --yes=1
- If you want to adjust the base and / or the case sensitivity and / or the wtf threshold, then use the --base, --case_sensitive and --wtf_threshold arguments respectively (these must match the option names in the GetOptions spec):
$perl main.pl --phase_n=0 --base=path/to/directory/ --case_sensitive=1 --wtf_threshold=0
- Once you have run phase 0, you should go through the weights log; duplicate entries per sign aren't a problem, i.e. in the case of IGOR vs JACK. Delete invalid entries or change their wtf to below the wtf threshold, AND ensure valid entries have a wtf above the wtf threshold, before running phase 1.

>>>>> Issues and notes:
- Configuration differs per phase, mainly for filepaths. You may wish to run phase 0, then use a differently named weights.txt for phase 1.
- Phases namespaced to 0 and 1 respectively in order to remain consistent with their index in the phases array.
- Bonus stations naturally shouldn't log calls to themselves, therefore they can only receive a maximum of 10000 bonus points.
- An undefined category defaults to '-1', since categories are only available for those who submitted logs. Alternatively, we could consider pushing category to the weights log, therefore giving you the opportunity to adjust after phase 0.
- The no return 'NORET' error is potentially inaccurate, since it is unfairly affected by mistakes and by non-submitters having no calls. You'll notice most of the errors reflect this. The no return error wasn't part of your recent notes but I have kept it just in case.
- The weights log LOGCALL heading changed to SIGN since the weights log contains a mix of log and call entries.
- Even after our discussion, I decided to weight log cnqs and call cnqs differently >:). The best way to understand how is to read the _input_weights function. There were too many potential issues I envisaged to ignore this, but it can easily be changed if need be. Fundamentally, every log cnq is given a wtf of 1, while every call cnq is given a wtf of 1 per unique log call ( in case of duplicates ). I believe however, we should also incorporate the category log into this, since it contains a list of submitted logs, therefore these are "guaranteed" to be valid. After all though, it's up to you to go through the weights log after phase 0 and make adjustments before phase 1.
- For now, if contestants used multiple names or qths, they will all be listed separated by a pipe in the scores log ( wtf dependent ).

>>>>> Todo:
- Full, vigorous testing of every possible scenario.
- Code and namespacing isn't perfect, there is plenty of room for further development.
- Perhaps a new error should be introduced in case anyone logs themselves and cheats the system.
- Debug option, handy output when monitoring script progress, useful during development.
- Optional, configurable headings across all logs.
- A lot of your work appears to be converting each contestant's log into a universal format by hand. It would be straightforward to handle this conversion via Perl.
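That last todo point could be as simple as a per-logger normaliser. A rough sketch, assuming one logger exports comma-separated fields in the same order as the universal format (both layouts here are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical converter: one logger exports comma-separated fields; the
# universal format wants a whitespace-separated 'QSO:' line.
sub to_universal {
    my ($csv_line) = @_;
    my @fields = split /\s*,\s*/, $csv_line;
    return join ' ', 'QSO:', @fields;
}

print to_universal('3543, CW, 2014-01-16, 0242, K6NV, BOB, CA, VE3KI, RICH, ON'), "\n";
```

Each supported logger format would just need its own small split/reorder rule feeding the same output shape.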


Code
use strict; 
use warnings FATAL => qw/all/;
use Getopt::Long;
use List::Util qw/sum/; # sum0
use Data::Dumper;

#####

local $/ = "\n";
local $" = '|';
local $, = "\t";
local $\ = "\n";

our $yes = 0;

my $phase_n = undef;
my $base = 'live20140116/'; # 'test1/'
my $case_sensitive = 0; # case sensitive should not vary between phases.
my $wtf_threshold = 2; # n of >= 2 is recommended.

GetOptions ( 'yes=i' => \$yes,
'phase_n=i' => \$phase_n,
'base=s' => \$base,
'case_sensitive=i' => \$case_sensitive, # case_sensitive!
'wtf_threshold=i' => \$wtf_threshold, ) or die "cannot get options";

die 'phase_n required or invalid' unless defined $phase_n and $phase_n =~ /^[01]$/;

my $phases =
[
{
handler => \&_phase0,
configuration =>
{
filepath_input_entries => "${base}entries.txt",
filepath_output_weights => "${base}weights.txt",
case_sensitive => $case_sensitive,
},
},
{
handler => \&_phase1,
configuration =>
{
filepath_input_bonuses => "${base}bonuses.txt",
filepath_input_categories => "${base}categories.txt",
filepath_input_weights => "${base}weights.txt",
filepath_input_entries => "${base}entries.txt",
filepath_output_errors => "${base}errors.txt",
filepath_output_scores => "${base}scores.txt",
case_sensitive => $case_sensitive,
bands => { 3 => '80M', 7 => '40M' },
wtf_threshold => $wtf_threshold,
points => 1000,
points_bonus => 5000,
default_wtf => -1, # ensure numeric / below wtf threshold, otherwise expect the unexpected.
default_category => -1, # ensure numeric.
},
},
];

print "begin phase $phase_n";

my $phase = $phases->[$phase_n];

_continue( Dumper( $phase->{configuration} ) . 'does the configuration look ok' );

$phase->{handler}->( $phase->{configuration} );

print "end phase $phase_n";

#####

#
sub _continue
{
my ( $message ) = @_;

return if $yes;

$message .= ', y to continue';

print $message;

chomp( my $response = <STDIN> );

exit unless $response eq 'y';

return 1;
}

#
sub _phase0
{
my ( $configuration ) = @_;

$configuration // die 'configuration required';

# assign configuration to variables.
my $filepath_input_entries = $configuration->{filepath_input_entries};
my $filepath_output_weights = $configuration->{filepath_output_weights};
my $case_sensitive = $configuration->{case_sensitive};

#
_continue( "'$filepath_output_weights' not empty, do you really want to (re)run phase0" ) if ( stat $filepath_output_weights )[7];

open my $handle_input_entries, '<', $filepath_input_entries or die "cannot open '$filepath_input_entries': $!";
my $weights = _input_weights( $handle_input_entries, $case_sensitive );
close $handle_input_entries;

open my $handle_output_weights, '>', $filepath_output_weights or die "cannot open '$filepath_output_weights': $!";
print $handle_output_weights 'SIGN', 'NAME', 'QTH', 'WEIGHT';
_output_weights( $handle_output_weights, $weights );
close $handle_output_weights;

return 1;
}

#
sub _phase1
{
my ( $configuration ) = @_;

$configuration // die 'configuration required';

# assign configuration to variables.
my $filepath_input_bonuses = $configuration->{filepath_input_bonuses};
my $filepath_input_categories = $configuration->{filepath_input_categories};
my $filepath_input_weights = $configuration->{filepath_input_weights};
my $filepath_input_entries = $configuration->{filepath_input_entries};
my $filepath_output_errors = $configuration->{filepath_output_errors};
my $filepath_output_scores = $configuration->{filepath_output_scores};
my $case_sensitive = $configuration->{case_sensitive};
my $bands = $configuration->{bands};
my $wtf_threshold = $configuration->{wtf_threshold};
my $points = $configuration->{points};
my $points_bonus = $configuration->{points_bonus};
my $default_wtf = $configuration->{default_wtf};
my $default_category = $configuration->{default_category};

#
_continue( "'$filepath_input_weights' empty, do you really want to run phase1 now" ) if ! ( stat $filepath_input_weights )[7];
_continue( "'$filepath_output_errors' not empty, do you really want to (re)run phase1" ) if ( stat $filepath_output_errors )[7];
_continue( "'$filepath_output_scores' not empty, do you really want to (re)run phase1" ) if ( stat $filepath_output_scores )[7];

open my $handle_input_bonuses, '<', $filepath_input_bonuses or die "cannot open '$filepath_input_bonuses': $!";
my $bonuses = _input_bonuses( $handle_input_bonuses, $case_sensitive );
close $handle_input_bonuses;

open my $handle_input_categories, '<', $filepath_input_categories or die "cannot open '$filepath_input_categories': $!";
my $categories = _input_categories( $handle_input_categories, $case_sensitive );
close $handle_input_categories;

open my $handle_input_weights, '<', $filepath_input_weights or die "cannot open '$filepath_input_weights': $!";
<$handle_input_weights>; # discard headings.
my $weightsb = _input_weightsb( $handle_input_weights, $case_sensitive );
close $handle_input_weights;

open my $handle_input_entries, '<', $filepath_input_entries or die "cannot open '$filepath_input_entries': $!";
my $entries = _input_entries( $handle_input_entries, $categories, $weightsb, $case_sensitive, $bands, $wtf_threshold, $default_wtf, $default_category );
close $handle_input_entries;

open my $handle_output_errors, '>', $filepath_output_errors or die "cannot open '$filepath_output_errors': $!";
print $handle_output_errors 'LOGCALL', 'CALLWKD', 'BAND', 'TIME', 'NAME', 'QTH', 'ERROR', 'WTF';
_calculate_scores_and_output_errors( $handle_output_errors, $entries, $bonuses, $wtf_threshold, $points, $points_bonus );
close $handle_output_errors;

open my $handle_output_scores, '>', $filepath_output_scores or die "cannot open '$filepath_output_scores': $!";
print $handle_output_scores 'CAT', 'LOGCALL', 'SCORE', 'NAME', 'QTH';
_output_scores( $handle_output_scores, $entries );
close $handle_output_scores;

return 1;
}

#
sub _input_weights
{
my ( $handle_input_entries, $case_sensitive ) = @_;

my $weights = { };
my $weights_ = { };

while ( my $line = <$handle_input_entries> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuitive than chomp.
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $log_sign, $log_name, $log_qth, $call_sign, $call_name, $call_qth ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
( split( ' ', $line ) )[5..10];

# construct log / call snq.
my $log_snq = join ${,}, $log_sign, $log_name, $log_qth;
my $call_snq = join ${,}, $call_sign, $call_name, $call_qth;

#
$weights->{log }->{$log_snq }->{$log_sign}++;
$weights->{call}->{$call_snq}->{$log_sign}++;
}

for my $log_snq ( keys %{$weights->{log}} )
{
my $log_wtf = sum( values %{$weights->{log}->{$log_snq}} );

$weights_->{$log_snq} = $log_wtf;
}

for my $call_snq ( keys %{$weights->{call}} )
{
my $call_wtf = scalar keys %{$weights->{call}->{$call_snq}};

$weights_->{$call_snq} += $call_wtf;
}

#print Dumper $weights, $weights_;

return $weights_;
}

#
sub _input_bonuses
{
my ( $handle_input_bonuses, $case_sensitive ) = @_;

my $bonuses = { };

while ( my $line = <$handle_input_bonuses> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuitive than chomp.
$line =~ s/\s+$//;

#
my ( $sign ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
split( ' ', $line );

warn 'duplicate' if defined $bonuses->{$sign};

#
$bonuses->{$sign} = 1;
}

#print Dumper $bonuses;

return $bonuses;
}

#
sub _input_categories
{
my ( $handle_input_categories, $case_sensitive ) = @_;

my $categories = { };

while ( my $line = <$handle_input_categories> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuitive than chomp.
$line =~ s/\s+$//;

#
my ( $sign, $category ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
split( ' ', $line );

warn 'duplicate' if defined $categories->{$sign};

#
$categories->{$sign} = $category;
}

#print Dumper $categories;

return $categories;
}

#
sub _input_weightsb
{
my ( $handle_input_weights, $case_sensitive ) = @_;

my $weightsb = { };

while ( my $line = <$handle_input_weights> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuitive than chomp.
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $sign, $name, $qth, $wtf ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
split( ' ', $line );

#
my $snq = join ${,}, $sign, $name, $qth;

#
$weightsb->{$snq} =
{
sign => $sign,
name => $name,
qth => $qth,
wtf => $wtf,
};
}

#print Dumper $weightsb;

return $weightsb;
}

#
sub _input_entries
{
my ( $handle_input_entries, $categories, $weightsb, $case_sensitive, $bands, $wtf_threshold, $default_wtf, $default_category ) = @_;

my $entries = { };

# process.
while ( my $line = <$handle_input_entries> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuitive than chomp.
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $frequency, $call_time, $log_sign, $log_name, $log_qth, $call_sign, $call_name, $call_qth ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
( split( ' ', $line ) )[1,4..10];

my $log_snq = join ${,}, $log_sign, $log_name, $log_qth;
my $log_wtf = $weightsb->{$log_snq}->{wtf} // $default_wtf;

next if $log_wtf < $wtf_threshold;

# lookup band via frequency.
my $band = $bands->{( $frequency =~ /([1-9])/ )[0]}; # todo: // 'other' / error.

my $log_category = $categories->{$log_sign} // $default_category;
my $log_calls = $entries->{$log_sign}->{bands}->{$band} //= [ ]; # use //= to allow autovivification / assign default value.

my $call_snq = join ${,}, $call_sign, $call_name, $call_qth;
my $call_wtf = $weightsb->{$call_snq}->{wtf} // $default_wtf;
my $call_duplicate = ( grep { $_->{sign} eq $call_sign } @$log_calls ) ? 1 : 0 ;
#my $call_return = undef; # cannot do yet, not until every call call has been pushed.

#
_construct_log_entry( $entries, $log_category, $log_sign, $log_name, $log_qth );

#
_construct_call_entry( $log_calls, $call_time, $call_sign, $call_name, $call_qth, $call_wtf, $call_duplicate );
}

# process remainder that have ok wtf. Technically namespace not log or call specific, but constructs log entry.
for my $log_snq ( keys %$weightsb )
{
my $log_wtf = $weightsb->{$log_snq}->{wtf} // $default_wtf;

next if $log_wtf < $wtf_threshold;

my $log_sign = $weightsb->{$log_snq}->{sign};
my $log_name = $weightsb->{$log_snq}->{name};
my $log_qth = $weightsb->{$log_snq}->{qth};
my $log_category = $categories->{$log_sign} // $default_category;

#
_construct_log_entry( $entries, $log_category, $log_sign, $log_name, $log_qth );
}

#print Dumper $entries;

return $entries;
}

#
sub _construct_log_entry
{
my ( $ref, $log_category, $log_sign, $log_name, $log_qth ) = @_;

#
$ref->{$log_sign}->{category} //= $log_category;
$ref->{$log_sign}->{names}->{$log_name} = 1;
$ref->{$log_sign}->{qths }->{$log_qth } = 1;
$ref->{$log_sign}->{bands} //= { };
$ref->{$log_sign}->{bonuses} //= { };
$ref->{$log_sign}->{score} //= 0;

return 1;
}

#
sub _construct_call_entry
{
my ( $ref, $call_time, $call_sign, $call_name, $call_qth, $call_wtf, $call_duplicate ) = @_;

#
push @$ref,
{
time => $call_time,
sign => $call_sign,
name => $call_name,
qth => $call_qth,
wtf => $call_wtf,
duplicate => $call_duplicate,
};

return 1;
}

#
sub _output_weights
{
my ( $handle_output_weights, $weights ) = @_;

for my $snq ( sort keys %$weights )
{
my $wtf = $weights->{$snq};

# print.
print $handle_output_weights $snq, $wtf; # important that snq is $, divided.
}

return 1;
}

#
sub _calculate_scores_and_output_errors
{
my ( $handle_output_errors, $entries, $bonuses, $wtf_threshold, $points, $points_bonus ) = @_;

for my $log_sign ( sort keys %$entries )
{
my $log = $entries->{$log_sign};

my $log_bands = $log->{bands};
my $log_bonuses = $log->{bonuses};

for my $band ( sort keys %$log_bands )
{
my $log_calls = $log_bands->{$band}; # // [ ];

for my $call ( sort { $a->{sign} cmp $b->{sign} || $a->{time} <=> $b->{time} } @$log_calls )
{
my $call_time = $call->{time};
my $call_sign = $call->{sign};
my $call_name = $call->{name};
my $call_qth = $call->{qth};
my $call_wtf = $call->{wtf};
my $call_duplicate = $call->{duplicate};
my $call_calls = ( exists $entries->{$call_sign} ) ? $entries->{$call_sign}->{bands}->{$band} : [ ]; # use condition to prevent autovivification.
my $call_return = ( grep { $_->{sign} eq $log_sign } @$call_calls ) ? 1 : 0 ;

# validate call.
my ( $call_error, $call_wtf_string ) = ( $call_duplicate ) ? ( 'DUPE' , $call_wtf ) :
( $call_wtf < $wtf_threshold ) ? ( 'CNQ' , "$call_wtf<$wtf_threshold" ) :
( not $call_return ) ? ( 'NORET', $call_wtf ) :
( undef , undef ) ;

# log errors or update score.
if ( defined $call_error )
{
# print.
print $handle_output_errors $log_sign, $call_sign, $band, $call_time, $call_name, $call_qth, $call_error, $call_wtf_string;
}
# todo: better if scoring handled in own function or by _output_scores.
elsif ( exists $bonuses->{$call_sign} and not exists $log_bonuses->{$call_sign} )
{
$log->{score} += $points + $points_bonus;

$log_bonuses->{$call_sign} = 1;
}
else
{
$log->{score} += $points;
}
}
}
}

return 1;
}

#
sub _output_scores
{
my ( $handle_output_scores, $entries ) = @_;

for my $log_sign ( sort { $entries->{$a}->{category} <=> $entries->{$b}->{category} || $entries->{$b}->{score} <=> $entries->{$a}->{score} } keys %$entries )
{
my $log = $entries->{$log_sign};

my $log_category = $log->{category};
my $log_names = [ keys %{$log->{names}} ];
my $log_qths = [ keys %{$log->{qths }} ];
my $log_score = $log->{score};

# print.
print $handle_output_scores $log_category, $log_sign, $log_score, "@$log_names", "@$log_qths";
}

return 1;
}


Regards,

Chris


(This post was edited by Zhris on Apr 3, 2015, 5:42 PM)
Attachments: contestcrosschecker.zip (18.9 KB)


Zhris
Enthusiast

Apr 3, 2015, 7:05 PM

Post #30 of 102 (14323 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to]

Just been reading back through the code. There are a couple of issues, but I'm not going to worry about those right now; I don't think your 2014 test data encounters them. I'm also thinking I would like to simplify the code across _input_weightsb, _input_entries and _calculate_scores_and_output_errors; I should have built an intermediate contestants structure to work off of more easily.

Chris


stuckinarut
User

Apr 3, 2015, 7:32 PM

Post #31 of 102 (14318 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to]

Hi, Chris:

Thank you so much for your continued efforts and assistance. I will do some rigorous testing over the weekend & report back.

If any questions during the process, I'll post them here.

- Stuckinarut


stuckinarut
User

Apr 4, 2015, 7:05 AM

Post #32 of 102 (14194 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to]

Hi, Chris:

Up briefly on little sleep but wanted to make a first go of things :^)

In my groggy state I was a bit confuzzzzed until I re-read the --phase_n= info again. Especially, since I have not had any past experience with an 'interactive' Perl script. Pretty cool indeed when I was finally able to take the first 'Test Drive' !!!

1. When I looked at the Error log, it was filled with 'NORET' entries, which made my eyeballs roll. I am only interested in CNQ and DUPE errors. It would be very helpful if you could please add an entry in the 'config' area, something like:

noret = [on/off] or [yes/no]

2. What did immediately pop into mind was how cool and efficient it would be to have a final column in the Error Report of 'ADJ' (for Adjustment), holding a Unique Number for each particular Error log entry. Then one could re-run the script with a different 'phase' giving the option to plug in any 'ADJ' entries that, upon Manual inspection/cross-checking of the logs, *should* be considered valid and included in a final update to the Scores list. Something like: Enter ADJ Number: with the option to enter more until finished, instead of having to re-run the script for individual adjustments.

3. In the Scores list, I am confuzzzzzed about the (negative -) numbers for some of the Callsigns. Especially, this one for W7WHY who was a log submitter:

-1 W7WHY 0 TOM OR

Yes, I'm confuzzzzzzed here (probably because I need more sleep!!!)

4. For the Weight listing, can you please explain how I can change the Weight for each entry to be equal for further comparison and examination? I think this is going to be important for purposes of 'Education and Illustration' to the log submitters in terms of an actual percentage of the problems with miscopied and/or mistyped data based upon a standardized weight factor. Yes, that would be very helpful.

5. Regarding the piped display of (one example) VE4 | MB for two of the Canadian entrants, this highlights an issue I must clearly communicate to folks in next year's 'Rules' ... because technically both VE4 & MB are correct (VE4 is the call 'prefix' for the 'mult' (QTH) of MB - Manitoba). However, if in fact VE4 was sent but MB was entered in the recipient's log, that *should* be an error. But some logging software *may* auto-convert a VE4 entry to the normally recognized MB used in scoring. Not sure if I'm conveying this properly.

Eyelids are drooping... need a few more hours of sleep before diving back in.

Thanks!

-Stuckinarut


stuckinarut
User

Apr 4, 2015, 7:14 AM

Post #33 of 102 (14188 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to]

Ohhh... two more things before I head off for ZZZZZZZZZ'sVille.

I can do this manually in Excel, but if it's not a lot of extra work, once again for 'Educational' type purposes for the troops, could you add 1 more output .txt list file and a minor mod to the weights.txt file:

1. nologs.txt ... this would be a NET-NET list of 'Unique' callsigns from ONLY the (CALLWKD) column that did NOT actually submit a log. Having a final "Total QTY" line at the bottom of the list would eliminate importing, tallying & exporting with Excel.

2. weights.txt ... just to tally the total # of CNQ's involved, which would again save manual work involving Excel.
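For what it's worth, the nologs.txt idea in point 1 boils down to a set difference plus a count. A minimal sketch, where the submitted-log signs and worked calls below are hypothetical sample data (in practice they would come from the categories file and the CALLWKD column):

```perl
use strict;
use warnings;

# Hypothetical data: signs that submitted logs, and every CALLWKD sign seen.
my %submitted = map { $_ => 1 } qw/K6NV K9YC/;
my @callwkd   = qw/VE3KI K9YC W9RE VE3KI K6NV/;

# Unique worked signs with no submitted log.
my %seen;
my @nologs = sort grep { !$submitted{$_} && !$seen{$_}++ } @callwkd;

print "$_\n" for @nologs;
printf "Total QTY: %d\n", scalar @nologs;
```

The grep keeps each non-submitter only once, and the final line gives the "Total QTY" tally requested.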

Thanks!

-Stuckinarut


Zhris
Enthusiast

Apr 4, 2015, 7:33 AM

Post #34 of 102 (14183 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to]

Hi,

I will be re working the script later and will take into account the issues you raise here. I just wanted to respond to each of your points.

1) The noret isn't very accurate at this time, as explained above. I will be improving its accuracy by ensuring it does not error on non-submitters, although this in turn has disadvantages; once I have figured it out I will provide the details.

2) This could be a good idea.

3) A negative 1 indicates there was no category as explained above. I chose to use a number for ease later when outputting the scores and doing a numerical sort on categories. In the case of W7WHY, it isn't in the categories log.

4) Not sure I fully understand, but I will think about it and get back to you.

5) I'll think about how this could be accounted for. It's another complexity that will take some thought to implement ;-). But how do we know VE4 was sent and not MB?

Regarding your second response, both 1) and 2) can be incorporated.

Sleep tight. Regards,

Chris


(This post was edited by Zhris on Apr 4, 2015, 7:39 AM)


stuckinarut
User

Apr 4, 2015, 10:48 AM

Post #35 of 102 (14097 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to]

Chris -

Back up but still running a sleep deficit here ;-(

Hmmm... I checked this original 'Category' list in the .zip upload and W7WHY was included:

LQP58_CALLSIGN-CATEGORY-LISTSORTED.txt

It appears a Gremlin is lurking about in our midst trying to cause problems ???

As I was waking up, I realized how brilliant your structuring to the sub-directory system was and to use 'understandable' .txt file names vs. listQ.txt etc. This also eliminates having to keep typing the multiple list_.txt names each time I re-run the script. For future years, all I need to do is create a different sub-directory. BRILLIANT 'Forward Thinking', Chris !!!

Will do more 'Test Drives' in-between working on taxes ;-(

- Stuckinarut


Zhris
Enthusiast

Apr 4, 2015, 11:35 AM

Post #36 of 102 (14083 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to]

I might be going crazy here, but I can't find W7WHY in live20140116/categories.txt. Am I using outdated data? Where did LQP58_CALLSIGN-CATEGORY-LISTSORTED.txt come from?

Chris


stuckinarut
User

Apr 4, 2015, 11:41 AM

Post #37 of 102 (14082 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to]

Chris:

I took another Quickie TD look at the Error log by importing it into Excel, doing a sort on the Error column, nuking the NORET entries, and then re-sorting the data based on the desired outcome in my original .pdf file.

As I took a look-see, I was just about to also nuke the CNQ entries that were NOT *below* the WTF, when I saw this one with a high CNQ value:

K6NV VE3KI 80M 0242 RICH ON CNQ 86

So I went to the main Entries list to do a find to see how many QSOs were reported with VE3KI by all log submitters, but only found this single one:

QSO: 3543 CW 2014-01-16 0242 K6NV BOB CA VE3KI RICH ON

In the Weights list, VE3KI only shows with a value of 1. I'm very curious how a CNQ of 86 was assigned to this 'transaction' (QSO) in the Error log?

As I mentioned before, I think just using (or being able to specify) a single Weight for all QSO lines ('transactions') is easiest. For the Error log Manual analysis & decision making to be done, keeping it to just CNQ-below-WTF and DUPE errors will greatly simplify testing. I can use the separate Weight list side-by-side during the Manual analysis.

Unless I messed up, here is the Error list pruned down to only CNQ's with <2 (WTF) and DUPE entries:


Code
LOGCALL	CALLWKD	BAND	TIME	NAME	QTH	ERROR	WTF 
K6DGW W9RE 40M 0226 JOHN IN CNQ 1<2
K6NV W7OM 40M 0218 ROD WA DUPE 35
K6NV K6VVA 80M 0257 RICK CA CNQ 1<2
K9YC VE3DZ 40M 0225 YURI ON DUPE 32
K9YC N3QE 80M 0255 TIM MD DUPE 39
N0AC N4JRG 80M 0253 MIKE KY DUPE 87
N0TA N5DO 80M 0256 DAVE TX DUPE 32
N3QE K9YC 80M 0255 JACK CA DUPE 15
N4AF VE3KQN 80M 0248 JIM VE3 CNQ 1<2
N4JRG N4AFY 80M 0243 JACK NC CNQ 1<2
VE4YU K0AD 80M 0231 LOCUST MD CNQ 1<2
W4AU VE3BZ 80M 0231 YURI ON CNQ 1<2
W4OC W6NV 40M 0216 JACK CA DUPE 37


My suspicions are there may be more actual Errors based on the WTF level, but will have to investigate a bit :^)

Ohhh, regarding my previous idea about an 'ADJ' column: when the QSO lines are initially imported, they *could* be auto-assigned something like a QID (QSO ID Number) for later use in making 'interactive' adjustments. Maybe that would need to be a separate script?
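That QID idea is cheap to bolt on at import time. A minimal sketch, with hypothetical QSO lines (the real ones would come from entries.txt), numbering each line so errors could later be referenced by ID:

```perl
use strict;
use warnings;

# Sketch of the QID idea: number QSO lines on import so Error log entries
# can be referenced and adjusted interactively later.
my @lines = (
    'QSO: 3543 CW 2014-01-16 0242 K6NV BOB CA VE3KI RICH ON',
    'QSO: 7038 CW 2014-01-16 0218 K6NV BOB CA W7OM ROD WA',
);

my $qid = 0;
my %by_qid = map { ++$qid => $_ } @lines;

# An 'Enter ADJ Number:' prompt could then look a QSO up by its QID.
print "$_\t$by_qid{$_}\n" for sort { $a <=> $b } keys %by_qid;
```

Whether this lives in main.pl or a separate adjustment script is an open design question, as you say.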

Just some more feedback before finally attacking the taxes nightmare.

- Stuckinarut


stuckinarut
User

Apr 4, 2015, 11:45 AM

Post #38 of 102 (14079 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post


In Reply To
I might be going crazy here, but I can't find w7why in live20140116/categories.txt. Am I using outdated data, where did LQP58_CALLSIGN-CATEGORY-LISTSORTED.txt come from?

Chris


OHHHH... sorry, my bad (I'll chalk it up to the sleep deficit). I was apparently mixing apples & oranges (2014 & 2015 data). Sincere apologies.

- Stuckinarut


stuckinarut
User

Apr 4, 2015, 12:22 PM

Post #39 of 102 (14069 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

Chris -

I decided to quickly do another run using wtf = 3 and the Error log entries increased dramatically (of course :^)

As I quickly looked at one entry (the W7OM QSO with WQ5L) that showed a CNQ Error WTF of 1<3, I thought it a bit strange. Checking the Weights list showed WQ5L = 97. Hmmm. Even using a single Weight factor throughout, which would likely have been in the 50-something range, I'm not sure why this showed as an error.

NOTE: There were actually 2 QSOs by W7OM with WQ5L but on different bands - same CNQ 1<3 Error.

Here's the WTF <3 run Error log for CNQ & DUPE entries:


Code
LOGCALL	CALLWKD	BAND	TIME	NAME	QTH	ERROR	WTF 
K0AD W6NV 80M 0242 ??? CA CNQ 1<3
K0EU N5ZO 40M 0220 MARCO CA CNQ 1<3
K0EU K6WG 40M 0221 STAN CA CNQ 2<3
K0TG N5ZO 40M 0205 MARK CA CNQ 1<3
K0TG K6BGW 40M 0217 SKIP CA CNQ 1<3
K1GU K6VVA 80M 0236 HANK CA CNQ 1<3
K1GU N8XX 80M 0238 HANK MI CNQ 1<3
K6DGW W1NN 40M 0213 HAL SC CNQ 1<3
K6DGW W9RE 40M 0226 JOHN IN CNQ 1<3
K6NV K4BAI 40M 0204 JACK GA CNQ 1<3
K6NV K5OT 40M 0215 LARRY TX CNQ 1<3
K6NV W7OM 40M 0218 ROD WA DUPE 81
K6NV W9RE 40M 0223 JACK IN CNQ 1<3
K6NV K0AC 40M 0224 BILL IA CNQ 1<3
K6NV VE3KI 80M 0242 RICH ON CNQ 1<3
K6NV K6VVA 80M 0257 RICK CA CNQ 1<3
K6SRZ N3SD 80M 0241 JOE PA CNQ 1<3
K7SS NK9G 40M 1747 RICK WI CNQ 2<3
K7SS K6WG 40M 1748 STAN CA CNQ 2<3
K9YC KM7Q 40M 0212 BOB OR CNQ 1<3
K9YC VE3DZ 40M 0225 YURI ON DUPE 83
K9YC N3QE 80M 0255 TIM MD DUPE 102
N0AC K2QBN 40M 0212 VAN FL CNQ 1<3
N0AC N4LOV 40M 0225 AL AL CNQ 1<3
N0AC N3ID 80M 0246 GREG PA CNQ 1<3
N0AC N4JRG 80M 0253 MIKE KY DUPE 52
N0TA N6DA 40M 0221 JIM CA CNQ 1<3
N0TA XE3S 40M 0227 MARKO XE CNQ 1<3
N0TA N5DO 80M 0256 DAVE TX DUPE 93
N3BB N8XX 40M 0201 IGOR MI CNQ 2<3
N3BB W4UX 40M 0202 JIM NC CNQ 1<3
N3BB N0AT 80M 0246 JOHN CA CNQ 1<3
N3QE K6DGW 40M 0229 SCIP CA CNQ 1<3
N3QE K9YC 80M 0255 JACK CA DUPE 102
N3SD N5ZO 40M 0216 KA CA CNQ 1<3
N3SD N3QE 80M 0258 TIM MN CNQ 1<3
N4AF VE3KQN 80M 0248 JIM VE3 CNQ 1<3
N4AF KG4USN 80M 0250 KEN GA CNQ 1<3
N4JRG XE2S 40M 0228 MARYO DX CNQ 2<3
N4JRG N4AFY 80M 0243 JACK NC CNQ 1<3
N4JRG N9AC 80M 0253 BILL IA CNQ 1<3
N4JRG N3SB 80M 0255 GILL PA CNQ 1<3
N5DO W1EBI 80M 0234 GEO MA CNQ 1<3
N5DO K0TA 80M 0248 JOHN CO CNQ 1<3
N5DO K0EU 80M 0259 JOHN CO CNQ 1<3
N6DA VE4EA 40M 0212 GARY MB CNQ 1<3
N6DA W4NG 40M 0218 TED TN CNQ 1<3
N6DA W9RE 40M 0226 MIKE IL CNQ 1<3
N6IP N4LOV 40M 0227 CARL AL CNQ 1<3
N6RO WH6LE 40M 0205 PETE HI CNQ 1<3
N6ZFO N5RO 40M 0216 JACK CA CNQ 1<3
N6ZFO N6DA 80M 0243 27 DON CNQ 1<3
N8XX K1GU 80M 0237 NEB TN CNQ 1<3
VE4EA N5IP 80M 0242 JACK CA CNQ 1<3
VE4EA W4VA 80M 0258 JOHN VA CNQ 1<3
VE4YU K0AD 80M 0231 LOCUST MD CNQ 1<3
W0BH NK9G 40M 0217 RICK WI CNQ 2<3
W1EBI W1NN 80M 0235 STAN OH CNQ 1<3
W4AU VE3BZ 80M 0231 YURI ON CNQ 1<3
W4AU W1NN 80M 0235 HAL MA CNQ 1<3
W4NJK W6NV 80M 0238 OLIVER CA CNQ 1<3
W4NJK W7WHY 80M 0253 JIM WA CNQ 1<3
W4OC W6NV 40M 0216 JACK CA DUPE 26
W7OM VE4EA 40M 0206 ED MB CNQ 1<3
W7OM K5OT 40M 0211 JIM TX CNQ 1<3
W7OM K6TV 40M 0213 BOB CA CNQ 1<3
W7OM W5QL 40M 0215 RAY MS CNQ 1<3
W7OM XE2S 80M 0241 MARYO DX CNQ 2<3
W7OM W5QL 80M 0253 RAY MS CNQ 1<3
W7OM VE4EA 80M 0257 ED MB CNQ 1<3
WA6URY N0AC 40M 0205 BILL CA CNQ 1<3
WA6URY N4AF 40M 0223 JACK TN CNQ 1<3
WA6URY XE2S 80M 0247 MARC DX CNQ 1<3
WQ5L N5AW 80M 0247 MARV CO CNQ 1<3
XE2S W7OM 80M 0241 RON WA CNQ 1<3


Hope this feedback helps.

- Stuckinarut


(This post was edited by stuckinarut on Apr 4, 2015, 12:24 PM)


stuckinarut
User

Apr 4, 2015, 12:47 PM

Post #40 of 102 (14061 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

I gotta get to the taxes, but can't put this thing down :^)

FYI, I checked (PUN!) the logging software I use, and son-of-a-gun ... entries for the Canadian 'Manitoba' QTH/Mult of either MB or VE4 both get accepted, and the auto-calculated score update that displays after each QSO accurately reflects either entry as valid.

Hmmm.

I thought this was pretty cool what you did in the Scores output:


Code
5	VE3DZ	49000	YURI	VE3|ON 
7 XE2S 23000 MARCO DX|XE
8 VE4EA 46000 CARY VE4|MB
8 VE4YU 22000 ED VE4|MB


These variances will only apply to NON-USA entries. However, as I mentioned previously, IF 'VE4' was sent but 'MB' entered into the log, well, hmmm... perhaps that could count as a 'Mulligan' - haven't decided yet.

In the logging software, there is a .txt type file for all Mults/QTH listings with any alternates. I'm wondering if something similar should be used to ensure validation? In the case of *some* 'DX' entries that also have another Mult/QTH designator, I might have to manually add those to a list depending on what ends up showing in the Master CNQ 'Weights' list.
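A rough sketch of that validation idea, assuming a mults file whose lines list a canonical QTH followed by its accepted alternates. The file format and the `normalize_qth` helper are my invention for illustration, not the logging software's actual format:

```perl
use strict;
use warnings;

# Each line: CANONICAL ALT1 ALT2 ...  (would come from a mults.txt file)
my %canonical;
for my $line ('MB VE4', 'ON VE3', 'DX XE') {
    my ($canon, @alts) = split ' ', $line;
    # Map the canonical form and every alternate to the canonical form.
    $canonical{$_} = $canon for $canon, @alts;
}

sub normalize_qth {
    my ($qth) = @_;
    return $canonical{ uc $qth } // $qth;   # leave unknown mults untouched
}

print normalize_qth('VE4'), "\n";   # MB
print normalize_qth('mb'),  "\n";   # MB
```

Normalizing both the sent and logged QTH through the same table would make 'VE4' vs 'MB' compare equal, so that variance never reaches the Error log in the first place.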

Just a thought. I'll be offline now for some hours.

- Stuckinarut


Zhris
Enthusiast

Apr 4, 2015, 2:13 PM

Post #41 of 102 (14040 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

Hi,


Quote
As I quickly looked at one entry (the W7OM QSO with WQ5L) that showed a CNQ Error WTF of 1<3 I thought a bit strange


Looking at the errors, W7OM actually called W5QL ( wtf = 1 ), not WQ5L ( wtf = 97 ), so it looks like a genuine CNQ error to me.


Quote
NOTE: There were actually 2 QSOs by W7OM with WQ5L but on different bands - same CNQ 1<3 Error.


With the current weighting system, it will only log a wtf of 1 per unique log sign, to ensure duplicates don't skew the wtf, and it doesn't account for band since that may seem over the top ( CNQB ;- ) ). W7OM made this same mistake twice, but it was only weighted 1. The relevant code snippets:


Code
$weights->{call}->{$call_snq}->{$log_sign}++; # $weights->{call}->{'W5QL RAY MS'}->{'W7OM'}++

...

for my $call_snq ( keys %{ $weights->{call} } )
{
    my $call_wtf = scalar keys %{ $weights->{call}->{$call_snq} }; # my $call_wtf = 1

    $weights_->{$call_snq} += $call_wtf; # $weights_->{'W5QL RAY MS'} += 1 ( += is misleading, just = is fine )
}


I've started to make a few adjustments, keep reporting potential issues as and when you have time.

Regards,

Chris


(This post was edited by Zhris on Apr 4, 2015, 2:22 PM)


stuckinarut
User

Apr 4, 2015, 4:56 PM

Post #42 of 102 (14001 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post


Quote
Looking at the errors, W7OM actually called W5QL ( wtf = 1 ) not WQ5L ( wtf = 97 ), looks like a genuine CNQ error to me.


OUCH-OUCH-OUCH... my bad again, sorry... must have been temporary Dyslexia here caused by rushing too fast ;-(

I may be a bit more scarce for the next few days with the tax stuff {SIGH}, but will find a way to play some Hookey to test.

Thanks, Chris.

- Stuckinarut


stuckinarut
User

Apr 5, 2015, 10:45 AM

Post #43 of 102 (13776 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

For thread readers wondering if anything is happening, there is, via a number of Personal Messages (to keep the thread length down). I decided to go ahead and post this new suggestion for Chris to the Forum to keep things a bit alive here.
=======

Chris:

Just as my head hit the pillow earlier, a "Flash-of-Inspiration" struck :^)

Regarding my previous suggestion about adding an 'ADJ' column to the Error log that would contain the 'QID' (QSO ID Number) for each Error log QSO entry (a/k/a 'Transaction').

Adding one more .txt file to the mix, 'adjusts.txt', would let me simply copy & paste (or type) the QIDs of whatever QSOs from the Error log are to be adjusted/validated/given credit after scrutiny (if any). This would be similar to the 'bonuses.txt' file list.

THEN, when re-running the main script, as each QSO/Transaction is checked, a piece of code would consult the (new) 'adjusts.txt' file *BEFORE* a QSO would normally be dumped to the Error log. IF there is a match on the QID involved ... 'BINGO' ... the QSO credit is given (and if a 'Bonus Points' station, that credit as well).

Example (adjusts.txt file):


Code
387 
14
1599
260
3


The new 'scores.txt' file/report would then be the FINAL (Adjusted) SCORES for integrating into the event RESULTS as desired.
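The lookup described above might look something like this; `qso_disposition` and the inlined QID list are illustrative stand-ins for reading 'adjusts.txt' inside the real scoring loop:

```perl
use strict;
use warnings;

# In practice these QIDs would be read from adjusts.txt, one per line.
my %adjusted = map { $_ => 1 } (387, 14, 1599, 260, 3);

# Called only for QSOs that would otherwise land in the Error log:
# a match means the QSO was manually validated and should score.
sub qso_disposition {
    my ($qid) = @_;
    return $adjusted{$qid} ? 'CREDIT' : 'ERROR';
}

print qso_disposition(387), "\n";   # CREDIT
print qso_disposition(999), "\n";   # ERROR
```

Because the hash lookup is O(1), this check costs almost nothing even if it runs once per QSO line on the re-run.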

Yeah, this would be the proverbial 'Cat's Meow' :^)

WOW... as I was just finishing typing above, another 'FLASH'... of how everything could be done 'Interactively' *during* the running of a single script, but let's keep things in 'K.I.S.S.' mode for now.

Thanks!

Eric


Zhris
Enthusiast

Apr 9, 2015, 8:38 PM

Post #44 of 102 (12839 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

Hi,

Haven't spoken to you for a couple of days, just letting you know that I am about to test the latest version and will get back to you tomorrow.

Regards,

Chris


stuckinarut
User

Apr 9, 2015, 8:45 PM

Post #45 of 102 (12836 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

No problem, Chris...still working on taxes ;-(

Looking forward to testing the new version !!!

Thanks very much,

-Stuckinarut


Zhris
Enthusiast

Apr 10, 2015, 3:58 AM

Post #46 of 102 (12805 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

Check your PMs when you have time, I am very nearly ready but have an issue regarding the adjustments log. I'm posting this here in case you have "Send private message notification via e-mail" off.

Regards,

Chris


stuckinarut
User

Apr 10, 2015, 4:15 AM

Post #47 of 102 (12802 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

Chris...

Received your PM OK and just replied with a PM :^)

Thanks!

- Stuckinarut


Zhris
Enthusiast

Apr 10, 2015, 6:08 AM

Post #48 of 102 (12796 views)
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

Thank you.

I'm still not 100% there, but I have enough to go on. I will finish implementing adjustments later; everything else is ready, including my notes to you.

Chris


stuckinarut
User

Apr 10, 2015, 9:42 AM

Post #49 of 102 (12732 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

Hey, Chris...

Got a couple more hours sleep. Just as my head hit the pillow, this CNQ ANALYSIS came to mind.

It will help you understand better how I am going to use it in conjunction with the ERROR LOG.

www.xgenesis.com/hashorama/2014_LQP_CNQ_ANALYSIS.pdf

So out of 115 CNQ Combinations, the GOOD and BAD are almost evenly split. Since I am familiar with most of the actual GOOD vs. BAD ones by memory, for any of the MAYBE or ??? entries that show up I can fire off an email to those guys to verify EXACTLY what NAME & QTH they use (or to determine if these might have been "One-Off" log paddings {GRIN}).

For the ??? entries the same thing, but you will note that these had DOUBLE bad combinations for the identical Callsign and only 1 single entry each.

In the event any BAND 'DUPE' might also be a CNQ (or Vice-Versa), either case will result in an ERROR that will NOT be validated.

So I have a 'system' formulated here :^)

DISCLAIMER: I whipped this analysis together VERY rapidly, so there could be one or 2 "ERRORS" {SIGH}, but 'Close enough for Government work' in terms of an illustration.

Hope this helps!!!

- Stuckinarut

P.S. Once again, many of the "BAD" problems are the result of guys using "PRE-FILLS" in the logging software as I previously explained I think in a PM.


(This post was edited by stuckinarut on Apr 10, 2015, 9:47 AM)


stuckinarut
User

Apr 10, 2015, 9:54 AM

Post #50 of 102 (12730 views)
Re: [Zhris] HASH-O-RAMA Data Processing Problem [In reply to] Can't Post

(MORE)...

Regarding all those split VE3|ON, VE4|MB and XE|DX problems, I can eliminate most of those in future years by CLARIFYING BY "EXAMPLES" WITHIN THE RULES of what these guys must do in their logging software in order to *NOT* end up with DQ'd ("Disqualified") QSOs !!!

The same for whenever a DIFFERENT NAME is used than normal (like another "Honor/Tribute" situation), etc.

This has already been EXTREMELY VALUABLE in seeing the "BIG PICTURE" of some needed actions to be taken !!!

Thanks again for helping bring these things to light in an 'automated' way !!!

- Stuckinarut


(This post was edited by stuckinarut on Apr 10, 2015, 9:55 AM)
