CGI/Perl Guide | Learning Center | Forums | Advertise | Login
Site Search: in

  Main Index MAIN
INDEX
Search Posts SEARCH
POSTS
Who's Online WHO'S
ONLINE
Log in LOG
IN

Home: Perl Programming Help: Intermediate: Re: [stuckinarut] HASH-O-RAMA Data Processing Problem: Edit Log



Zhris
Enthusiast

Apr 3, 2015, 5:21 PM


Views: 31426
Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

Apologies for the delay. As per our discussion via PM, I realised upon testing I had taken the wrong direction. I haven't had much time to test this most recent code, therefore please test it vigourously and report issues in detail.

Please download the attached compressed file as it contains the script and corresponding test data sets. It has been configured such that you can run main.pl with your 2014 data.

>>>>> How to use:
- All configuration is controlled via the variables and / or the phase configuration hashes near the top.
- Adjust the base and the filepaths accordingly. My advise is to create directories where the script lives that contain all the data files, then set the base to 'directory/'. This will make it easier to manage different sets of data.
- Parts of the configuration can be overrided by supplying arguments to the script. This is particularly useful if you want to quickly test different values without having to modify the script itself.
- To run phase 0:
$perl main.pl --phase_n=0
- To run phase 1:
$perl main.pl --phase_n=1
- The script is interactive and asks you to confirm your intentions throughout. If you want to run the script non interactively i.e. ignore confirmations, then use the --yes argument:
$perl main.pl --phase_n=0 --yes=1
- If you want to adjust the base and / or the case sensitivity and / or the wtf threshold, then use the --base, --case_sensitivity, --wtf_threshold arguments respectively:
$perl main.pl --phase_n=0 --base=path/to/directory/ --case_sensitivity=1 --wtf_threshold=0
- Once you have run phase 0, you should go through the weights log, duplicate entries per sign aren't a problem i.e. in the case of IGOR vs JACK. Delete invalid entries or change their wtf to below the wtf threshold AND ensure valid entries have a wtf above the wtf threshold before running phase 1.

>>>>> Issues and notes:
- Different configurations per phase mainly for filepaths. You may wish to run phase0, then use a different named weights.txt for phase1.
- Phases namespaced to 0 and 1 respectively in order to remain consistent with their index in the phases array.
- Bonus stations inevitably shouldn't log calls to themselves therefore can only receive a maximum of 10000 bonus points.
- An undefined category defaults to '-1', since categories are only available for those who submitted logs. Alternatively, we could consider pushing category to the weights log, therefore giving you the opportunity to adjust after phase 0.
- The no return 'NORET' error is potentially inaccurate, since it is unfairly effected by mistakes and non submitters having no calls. You'll notice most of the errors reflect this. The no return error wasn't part of your recent notes but I have kept it just in case.
- The weights log LOGCALL heading changed to SIGN since the weights log contains a mix of log and call entries.
- Even after our discussion, I decided to weight log cnqs and call cnqs differently >:). The best way to understand how is to read the _input_weights function. There were too many potential issues I invisaged to ignore this, but can easily be changed if need be. Fundamentally every log cnq is given a wtf of 1, while every call cnq is given a wtf of 1 per unique log call ( in case of duplicates ). I believe however, we should also incooporate the category log into this, since this contains a list of submitted logs, therefore these are "guranteed" to be valid. After all though, its up to you to go through the weight log after phase0 and make adjustments before phase1.
- For now, if contestants used multiple names or qths, they will all be listed seperated by a pipeline in the scores log ( wtf dependent ).

>>>>> Todo:
- Full, vigorous testing of every possible scenario.
- Code and namespacing isn't perfect, there is plenty of room for further development.
- Perhaps a new error should be introduced in case anyone logs themselves and cheats the system.
- Debug option, handy output when monitoring script progress, useful during development.
- Optional, configurable headings across all logs.
- Alot of your work appears to be converting each contestants log into a universal format by hand. It would be straight forward to handle this conversion via Perl.


Code
use strict; 
use warnings FATAL => qw/all/;
use Getopt::Long;
use List::Util qw/sum/; # sum0
use Data::Dumper;

#####

local $/ = "\n";
local $" = '|';
local $, = "\t";
local $\ = "\n";

our $yes = 0;

my $phase_n = undef;
my $base = 'live20140116/'; # 'test1/'
my $case_sensitive = 0; # case sensitive should not vary between phases.
my $wtf_threshold = 2; # n of >= 2 is recommended.

GetOptions ( 'yes=i' => \$yes,
'phase_n=i' => \$phase_n,
'base=s' => \$base,
'case_sensitive=i' => \$case_sensitive, # case_sensitive!
'wtf_threshold=i' => \$wtf_threshold, ) or die "cannot get options";

die 'phase_n required or invalid' unless defined $phase_n and $phase_n =~ /^[01]$/;

my $phases =
[
{
handler => \&_phase0,
configuration =>
{
filepath_input_entries => "${base}entries.txt",
filepath_output_weights => "${base}weights.txt",
case_sensitive => $case_sensitive,
},
},
{
handler => \&_phase1,
configuration =>
{
filepath_input_bonuses => "${base}bonuses.txt",
filepath_input_categories => "${base}categories.txt",
filepath_input_weights => "${base}weights.txt",
filepath_input_entries => "${base}entries.txt",
filepath_output_errors => "${base}errors.txt",
filepath_output_scores => "${base}scores.txt",
case_sensitive => $case_sensitive,
bands => { 3 => '80M', 7 => '40M' },
wtf_threshold => $wtf_threshold,
points => 1000,
points_bonus => 5000,
default_wtf => -1, # ensure numeric / below wtf threshold, otherwise expect the unexpected.
default_category => -1, # ensure numeric.
},
},
];

print "begin phase $phase_n";

my $phase = $phases->[$phase_n];

_continue( Dumper( $phase->{configuration} ) . 'does the configuration look ok' );

$phase->{handler}->( $phase->{configuration} );

print "end phase $phase_n";

#####

#
sub _continue
{
my ( $message ) = @_;

return if $yes;

$message .= ', y to continue';

print $message;

chomp( my $response = <stdin> );

exit unless $response eq 'y';

return 1;
}

#
sub _phase0
{
my ( $configuration ) = @_;

$configuration // die 'configuration required';

# assign configuration to variables.
my $filepath_input_entries = $configuration->{filepath_input_entries};
my $filepath_output_weights = $configuration->{filepath_output_weights};
my $case_sensitive = $configuration->{case_sensitive};

#
_continue( "'$filepath_output_weights' not empty, do you really want to (re)run phase0" ) if ( stat $filepath_output_weights )[7];

open my $handle_input_entries, '<', $filepath_input_entries or die "cannot open '$filepath_input_entries': $!";
my $weights = _input_weights( $handle_input_entries, $case_sensitive );
close $handle_input_entries;

open my $handle_output_weights, '>', $filepath_output_weights or die "cannot open '$filepath_output_weights': $!";
print $handle_output_weights 'SIGN', 'NAME', 'QTH', 'WEIGHT';
_output_weights( $handle_output_weights, $weights );
close $handle_output_weights;

return 1;
}

#
sub _phase1
{
my ( $configuration ) = @_;

$configuration // die 'configuration required';

# assign configuration to variables.
my $filepath_input_bonuses = $configuration->{filepath_input_bonuses};
my $filepath_input_categories = $configuration->{filepath_input_categories};
my $filepath_input_weights = $configuration->{filepath_input_weights};
my $filepath_input_entries = $configuration->{filepath_input_entries};
my $filepath_output_errors = $configuration->{filepath_output_errors};
my $filepath_output_scores = $configuration->{filepath_output_scores};
my $case_sensitive = $configuration->{case_sensitive};
my $bands = $configuration->{bands};
my $wtf_threshold = $configuration->{wtf_threshold};
my $points = $configuration->{points};
my $points_bonus = $configuration->{points_bonus};
my $default_wtf = $configuration->{default_wtf};
my $default_category = $configuration->{default_category};

#
_continue( "'$filepath_input_weights' empty, do you really want to run phase1 now" ) if ! ( stat $filepath_input_weights )[7];
_continue( "'$filepath_output_errors' not empty, do you really want to (re)run phase1" ) if ( stat $filepath_output_errors )[7];
_continue( "'$filepath_output_scores' not empty, do you really want to (re)run phase1" ) if ( stat $filepath_output_scores )[7];

open my $handle_input_bonuses, '<', $filepath_input_bonuses or die "cannot open '$filepath_input_bonuses': $!";
my $bonuses = _input_bonuses( $handle_input_bonuses, $case_sensitive );
close $handle_input_bonuses;

open my $handle_input_categories, '<', $filepath_input_categories or die "cannot open '$filepath_input_categories': $!";
my $categories = _input_categories( $handle_input_categories, $case_sensitive );
close $handle_input_categories;

open my $handle_input_weights, '<', $filepath_input_weights or die "cannot open '$filepath_input_weights': $!";
<$handle_input_weights>; # discard headings.
my $weightsb = _input_weightsb( $handle_input_weights, $case_sensitive );
close $handle_input_weights;

open my $handle_input_entries, '<', $filepath_input_entries or die "cannot open '$filepath_input_entries': $!";
my $entries = _input_entries( $handle_input_entries, $categories, $weightsb, $case_sensitive, $bands, $wtf_threshold, $default_wtf, $default_category );
close $handle_input_entries;

open my $handle_output_errors, '>', $filepath_output_errors or die "cannot open '$filepath_output_errors': $!";
print $handle_output_errors 'LOGCALL', 'CALLWKD', 'BAND', 'TIME', 'NAME', 'QTH', 'ERROR', 'WTF';
_calculate_scores_and_output_errors( $handle_output_errors, $entries, $bonuses, $wtf_threshold, $points, $points_bonus );
close $handle_output_errors;

open my $handle_output_scores, '>', $filepath_output_scores or die "cannot open '$filepath_output_scores': $!";
print $handle_output_scores 'CAT', 'LOGCALL', 'SCORE', 'NAME', 'QTH';
_output_scores( $handle_output_scores, $entries );
close $handle_output_scores;

return 1;
}

#
sub _input_weights
{
my ( $handle_input_entries, $case_sensitive ) = @_;

my $weights = { };
my $weights_ = { };

while ( my $line = <$handle_input_entries> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuative than chomp.
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $log_sign, $log_name, $log_qth, $call_sign, $call_name, $call_qth ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
( split( ' ', $line ) )[5..10];

# construct log / call snq.
my $log_snq = join ${,}, $log_sign, $log_name, $log_qth;
my $call_snq = join ${,}, $call_sign, $call_name, $call_qth;

#
$weights->{log }->{$log_snq }->{$log_sign}++;
$weights->{call}->{$call_snq}->{$log_sign}++;
}

for my $log_snq ( keys %{$weights->{log}} )
{
my $log_wtf = sum( values %{$weights->{log}->{$log_snq}} );

$weights_->{$log_snq} = $log_wtf;
}

for my $call_snq ( keys %{$weights->{call}} )
{
my $call_wtf = scalar keys %{$weights->{call}->{$call_snq}};

$weights_->{$call_snq} += $call_wtf;
}

#print Dumper $weights, $weights_;

return $weights_;
}

#
sub _input_bonuses
{
my ( $handle_input_bonuses, $case_sensitive ) = @_;

my $bonuses = { };

while ( my $line = <$handle_input_bonuses> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuative than chomp.
$line =~ s/\s+$//;

#
my ( $sign ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
split( ' ', $line );

warn 'duplicate' if defined $bonuses->{$sign};

#
$bonuses->{$sign} = 1;
}

#print Dumper $bonuses;

return $bonuses;
}

#
sub _input_categories
{
my ( $handle_input_categories, $case_sensitive ) = @_;

my $categories = { };

while ( my $line = <$handle_input_categories> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuative than chomp.
$line =~ s/\s+$//;

#
my ( $sign, $category ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
split( ' ', $line );

warn 'duplicate' if defined $categories->{$sign};

#
$categories->{$sign} = $category;
}

#print Dumper $categories;

return $categories;
}

#
sub _input_weightsb
{
my ( $handle_input_weights, $case_sensitive ) = @_;

my $weightsb = { };

while ( my $line = <$handle_input_weights> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuative than chomp.
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $sign, $name, $qth, $wtf ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
split( ' ', $line );

#
my $snq = join ${,}, $sign, $name, $qth;

#
$weightsb->{$snq} =
{
sign => $sign,
name => $name,
qth => $qth,
wtf => $wtf,
};
}

#print Dumper $weightsb;

return $weightsb;
}

#
sub _input_entries
{
my ( $handle_input_entries, $categories, $weightsb, $case_sensitive, $bands, $wtf_threshold, $default_wtf, $default_category ) = @_;

my $entries = { };

# process.
while ( my $line = <$handle_input_entries> )
{
# Ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuative than chomp.
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $frequency, $call_time, $log_sign, $log_name, $log_qth, $call_sign, $call_name, $call_qth ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
( split( ' ', $line ) )[1,4..10];

my $log_snq = join ${,}, $log_sign, $log_name, $log_qth;
my $log_wtf = $weightsb->{$log_snq}->{wtf} // $default_wtf;

next if $log_wtf < $wtf_threshold;

# lookup band via frequency.
my $band = $bands->{( $frequency =~ /([1-9])/ )[0]}; # todo: // 'other' / error.

my $log_category = $categories->{$log_sign} // $default_category;
my $log_calls = $entries->{$log_sign}->{bands}->{$band} //= [ ]; # use //= to allow autovivification / assign default value.

my $call_snq = join ${,}, $call_sign, $call_name, $call_qth;
my $call_wtf = $weightsb->{$call_snq}->{wtf} // $default_wtf;
my $call_duplicate = ( grep { $_->{sign} eq $call_sign } @$log_calls ) ? 1 : 0 ;
#my $call_return = undef; # cannot do yet, not until every call call has been pushed.

#
_construct_log_entry( $entries, $log_category, $log_sign, $log_name, $log_qth );

#
_construct_call_entry( $log_calls, $call_time, $call_sign, $call_name, $call_qth, $call_wtf, $call_duplicate );
}

# process remainder that have ok wtf. Technically namespace not log or call specific, but constructs log entry.
for my $log_snq ( keys %$weightsb )
{
my $log_wtf = $weightsb->{$log_snq}->{wtf} // $default_wtf;

next if $log_wtf < $wtf_threshold;

my $log_sign = $weightsb->{$log_snq}->{sign};
my $log_name = $weightsb->{$log_snq}->{name};
my $log_qth = $weightsb->{$log_snq}->{qth};
my $log_category = $categories->{$log_sign} // $default_category;

#
_construct_log_entry( $entries, $log_category, $log_sign, $log_name, $log_qth );
}

#print Dumper $entries;

return $entries;
}

#
sub _construct_log_entry
{
my ( $ref, $log_category, $log_sign, $log_name, $log_qth ) = @_;

#
$ref->{$log_sign}->{category} //= $log_category;
$ref->{$log_sign}->{names}->{$log_name} = 1;
$ref->{$log_sign}->{qths }->{$log_qth } = 1;
$ref->{$log_sign}->{bands} //= { };
$ref->{$log_sign}->{bonuses} //= { };
$ref->{$log_sign}->{score} //= 0;

return 1;
}

#
sub _construct_call_entry
{
my ( $ref, $call_time, $call_sign, $call_name, $call_qth, $call_wtf, $call_duplicate ) = @_;

#
push @$ref,
{
time => $call_time,
sign => $call_sign,
name => $call_name,
qth => $call_qth,
wtf => $call_wtf,
duplicate => $call_duplicate,
};

return 1;
}

#
sub _output_weights
{
my ( $handle_output_weights, $weights ) = @_;

for my $snq ( sort keys %$weights )
{
my $wtf = $weights->{$snq};

# print.
print $handle_output_weights $snq, $wtf; # important that snq is $, divided.
}

return 1;
}

#
sub _calculate_scores_and_output_errors
{
my ( $handle_output_errors, $entries, $bonuses, $wtf_threshold, $points, $points_bonus ) = @_;

for my $log_sign ( sort keys %$entries )
{
my $log = $entries->{$log_sign};

my $log_bands = $log->{bands};
my $log_bonuses = $log->{bonuses};

for my $band ( sort keys %$log_bands )
{
my $log_calls = $log_bands->{$band}; # // [ ];

for my $call ( sort { $a->{sign} cmp $b->{sign} || $a->{time} <=> $b->{time} } @$log_calls )
{
my $call_time = $call->{time};
my $call_sign = $call->{sign};
my $call_name = $call->{name};
my $call_qth = $call->{qth};
my $call_wtf = $call->{wtf};
my $call_duplicate = $call->{duplicate};
my $call_calls = ( exists $entries->{$call_sign} ) ? $entries->{$call_sign}->{bands}->{$band} : [ ]; # use condition to prevent autovivification.
my $call_return = ( grep { $_->{sign} eq $log_sign } @$call_calls ) ? 1 : 0 ;

# validate call.
my ( $call_error, $call_wtf_string ) = ( $call_duplicate ) ? ( 'DUPE' , $call_wtf ) :
( $call_wtf < $wtf_threshold ) ? ( 'CNQ' , "$call_wtf<$wtf_threshold" ) :
( not $call_return ) ? ( 'NORET', $call_wtf ) :
( undef , undef ) ;

# log errors or update score.
if ( defined $call_error )
{
# print.
print $handle_output_errors $log_sign, $call_sign, $band, $call_time, $call_name, $call_qth, $call_error, $call_wtf_string;
}
# todo: better if scoring handled in own function or by _output_scores.
elsif ( exists $bonuses->{$call_sign} and not exists $log_bonuses->{$call_sign} )
{
$log->{score} += $points + $points_bonus;

$log_bonuses->{$call_sign} = 1;
}
else
{
$log->{score} += $points;
}
}
}
}

return 1;
}

#
sub _output_scores
{
my ( $handle_output_scores, $entries ) = @_;

for my $log_sign ( sort { $entries->{$a}->{category} <=> $entries->{$b}->{category} || $entries->{$b}->{score} <=> $entries->{$a}->{score} } keys %$entries )
{
my $log = $entries->{$log_sign};

my $log_category = $log->{category};
my $log_names = [ keys %{$log->{names}} ];
my $log_qths = [ keys %{$log->{qths }} ];
my $log_score = $log->{score};

# print.
print $handle_output_scores $log_category, $log_sign, $log_score, "@$log_names", "@$log_qths";
}

return 1;
}


Regards,

Chris


(This post was edited by Zhris on Apr 3, 2015, 5:42 PM)
Attachments: contestcrosschecker.zip (18.9 KB)


Edit Log:
Post edited by Zhris (Enthusiast) on Apr 3, 2015, 5:42 PM


Search for (options) Powered by Gossamer Forum v.1.2.0

Web Applications & Managed Hosting Powered by Gossamer Threads
Visit our Mailing List Archives