CGI/Perl Guide | Learning Center | Forums | Advertise | Login
Site Search: in

  Main Index MAIN
INDEX
Search Posts SEARCH
POSTS
Who's Online WHO'S
ONLINE
Log in LOG
IN

Home: Perl Programming Help: Intermediate: Re: [Zhris] HASH-O-RAMA Data Processing Problem: Edit Log



Zhris
Enthusiast

Feb 27, 2015, 8:25 PM


Views: 29524
Re: [Zhris] HASH-O-RAMA Data Processing Problem

Hi,

I have worked on this a little this evening and although have done little testing, I have come up with the below program in order to automate generating the list of contestants. There is no automate on or off option because every unique contestant is inserted into the resultant log with a status, being one of 'valid', 'manual' or 'invalid'. The algorithm is a little complex to explain in the little time I have right now, but if you test out different scenarios, I'm sure you will discover. Briefly:

- valid:
----- the operator had logged entries therefore had submitted their log, OR, other operators had logged entries more than a threshold number of times, weighted 1 unit per unique operator.
----- we discovered a single name and mult more frequently by weight from potentially other possiblities.
- manual:
----- the entry is probably valid, BUT, we discovered multiple names and/or mults with equal frequencies by weight, therefore we couldn't decipher which one(s) were correct. This is most likely the result of contestants who didn't submit their logs, but other contestants who called them made errors.
- invalid:
----- the entry wasn't valid, AND, there may be multiple names and/or mults with equal frequencies by weight.

Once this log is generated, you can go through manually and make adjustements as you see fit. Contestants marked 'manual' are basically valid, but you should adjust the name and mult values to a single name and mult (all possiblities are listed seperated by a pipeline), then adjust the status to 'valid', or if you really want 'invalid'. I would assume that when you run through real world data, there will be none to little manuals. You can leave 'invalid' contestants alone, but you may wish to double check them following the same update process you did for 'manual'. Phase two would ignore contestants with any status other than 'valid'.


Code
use strict; 
use warnings;
use List::MoreUtils qw/before/;
use Data::Dumper;

#####

# handle open.

my $strings = [ ];

my $handle_input_entries = _handle( \*DATA );
my $handle_output_contestants = _handle( $strings, '>' );

#####

# init.

my $configuration =
{
handle_input_entries => $handle_input_entries,
handle_output_contestants => $handle_output_contestants,
case_sensitive => 0,
band_lookup => { 3 => '80M', 7 => '40M' },
threshold => 2,
};

my $phases =
[
\&_phase1,
\&_phase2,
];

#####

# phase.

my $phase = 0; # $ARGV[0];

$phases->[$phase]->( $configuration );

#####

# handle close.

close $handle_input_entries;
close $handle_output_contestants;

{
local $, = "\n";

print @$strings;
}

#####

# functions.

# phase one.
sub _phase1
{
my ( $configuration ) = @_;

$configuration // die 'configuration required';

local $" = '|';
local $, = "\t";
local $\ = "\n";

my $handle_input_entries = $configuration->{handle_input_entries};
my $handle_output_contestants = $configuration->{handle_output_contestants};
my $case_sensitive = $configuration->{case_sensitive};
my $threshold = $configuration->{threshold};

my $contestants = { };

while ( my $line = <$handle_input_entries> )
{
# ignore blank or comment lines.
next if $line =~ m/^\s*(#|$)/;

# remove any whitespace on the end of the line ( spaces, carriage return, newline ). This is more intuative than chomp.
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $freq, $time, $log_sign, $log_name, $log_mult, $call_sign, $call_name, $call_mult ) =
map { $_ = uc $_ unless $case_sensitive; $_ }
( split( ' ', $line ) )[1,4..10];

# todo: validate line i.e. ensure each var has val.

# populate contestants hash.
#$contestants->{$log_sign }->{log }->{seen }->{$log_sign}++;
$contestants->{$log_sign }->{log }->{names}->{$log_name }->{$log_sign}++;
$contestants->{$log_sign }->{log }->{mults}->{$log_mult }->{$log_sign}++;
$contestants->{$call_sign}->{call}->{seen }->{$log_sign }++;
$contestants->{$call_sign}->{call}->{names}->{$call_name}->{$log_sign}++;
$contestants->{$call_sign}->{call}->{mults}->{$call_mult}->{$log_sign}++;
}

# print headings.
print $handle_output_contestants 'SIGN', 'NAME', 'MULT', 'STATUS';

for my $sign ( sort keys %$contestants )
{
my $contestant = $contestants->{$sign};

my $details_operator = keys %{$contestant->{log}} ? 'log' : 'call' ;

my $names = _details( $contestant->{$details_operator}->{'names'} );
my $mults = _details( $contestant->{$details_operator}->{'mults'} );

my $status = 'invalid';
if ( ( keys %{$contestant->{log}} and keys %{$contestant->{call}} ) or ( keys %{$contestant->{call}->{seen}} >= $threshold ) )
{
$status = ( @$names > 1 or @$mults > 1 ) ? 'manual' : 'valid' ;
}

# print line.
print $handle_output_contestants $sign, "@$names", "@$mults", $status;
}

return 1;
}

# phase two.
sub _phase2
{

}

# deals with multiple input / output handles in standalone programs.
sub _handle
{
my ( $expression, $mode, $divider ) = @_;

$expression // die 'expression required';
$mode //= '<';

my $handle = undef;
my $handles = [ ];

if ( ref $expression eq 'GLOB' )
{
$handle = $expression;
}
else
{
if ( ref $expression eq ref [ ] )
{
push @$expression, '';
$expression = \$expression->[-1];
}

open $handle, $mode, $expression or die "cannot open '$expression': $!";
}

if ( $mode eq '<' and defined $divider )
{
local $/ = $divider;

while ( my $block = <$handle> )
{
$block =~ s/\Q$divider\E$//;

open my $handle_b, $mode, \$block or die "cannot open '$block': $!";

push @$handles, $handle_b;
}
}
else
{
push @$handles, $handle;
}

return wantarray ? @$handles : $handles->[0] ;
}

# decipher most likely details by weight.
sub _details
{
my ( $details ) = @_;

$details // die 'details required';

my $hash = { };
$hash->{$_} += scalar( keys %{$details->{$_}} ) for ( keys %$details );

my $weight = undef;
my $list = [ before { $weight //= $hash->{$_}; $weight != $hash->{$_} } sort { $hash->{$b} <=> $hash->{$a} } keys %$hash ];

return $list;
}

#####

__DATA__
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tommy OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IL
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3542 CW 2015-01-22 0231 W779 Tom OR N6ZF BILL CA
QSO: 3542 CW 2015-01-22 0231 W770 Tom OR N6ZF BIL CA
QSO: 3542 CW 2015-01-22 0231 W771 Tom OR N6ZF BIL CA
QSO: 3542 CW 2015-01-22 0231 W772 Tom OR N6ZF BI CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
#QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE ON
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKEY IF
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
#QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
#QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JOHN UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JOHN UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JOHNNY UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JOHNNY UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N777 JILL UK W6NV OLI CA
QSO: 3544 CW 2015-01-22 0235 N778 PETE UK W6NV OLI CA


Output:

Code
SIGN	NAME	MULT	STATUS 
N2NL DAVE FL invalid
N6ZF BILL|BIL CA manual
N6ZFO BILL CA valid
N777 JILL|JOHN|JOHNNY UK invalid
N778 PETE UK invalid
W6NV OLI CA valid
W770 TOM OR invalid
W771 TOM OR invalid
W772 TOM OR invalid
W779 TOM OR invalid
W7WHY TOMMY|TOM OR manual
W9RE MIKE IN valid
W9RR MIKE|MIKEY IF|ON|IN invalid


Regards,

Chris


(This post was edited by Zhris on Feb 27, 2015, 9:00 PM)


Edit Log:
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 8:25 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 8:26 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 8:27 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 8:29 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 8:30 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 8:31 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 8:31 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 8:32 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 8:34 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 8:37 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 8:39 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 8:40 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 8:46 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 8:47 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 9:00 PM
Post edited by Zhris (Enthusiast) on Feb 27, 2015, 9:00 PM


Search for (options) Powered by Gossamer Forum v.1.2.0

Web Applications & Managed Hosting Powered by Gossamer Threads
Visit our Mailing List Archives