



Zhris
Enthusiast

Feb 25, 2015, 10:03 PM


Re: [stuckinarut] HASH-O-RAMA Data Processing Problem

Hi,

Your feedback has been valuable in helping me understand what you need. It would be cool to develop this as far as we can, so as to eliminate as much manual checking as possible. Ultimately, if you had a list of contestants, the entire process could probably be automated.

As you have come to realize, in order to check an entry against any other possible entry, we had to load all the data into a hash during the "first phase". The memory this hash uses shouldn't be a concern unless it contains millions of entries. If it ever did become a problem, there are tweaks that could reduce memory consumption.
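
A stripped-down sketch of why everything is loaded before anything is checked: once every line is indexed by the station that submitted it, the "return QSO" check is just two hash lookups, no matter where in the input the other station's lines appear. The `@qsos` pairs and the `%worked` / `confirmed` names are hypothetical simplifications; the real script's `$contestants` hash also tracks bands, names and mults.

```perl
use strict;
use warnings;

# Hypothetical, simplified input: one [ logcall, callsign ] pair per QSO line.
my @qsos = (
    [ 'W7WHY', 'W9RE'  ],
    [ 'W9RE',  'W7WHY' ],
    [ 'W9RE',  'N2NL'  ],
);

# First phase: index every QSO under the station that submitted it.
my %worked;    # logcall => { callsign => 1, ... }
$worked{ $_->[0] }{ $_->[1] } = 1 for @qsos;

# Second phase: a return QSO is found with two hash lookups,
# regardless of the order in which the logs appeared in the input.
sub confirmed {
    my ( $logcall, $callsign ) = @_;
    return 0 unless exists $worked{$callsign};             # no log submitted
    return exists $worked{$callsign}{$logcall} ? 1 : 0;    # return QSO logged?
}

for my $qso (@qsos) {
    my ( $logcall, $callsign ) = @$qso;
    printf "%s -> %s: %s\n", $logcall, $callsign,
        confirmed( $logcall, $callsign ) ? 'confirmed' : 'not confirmed';
}
```

This prints "confirmed" for the two W7WHY/W9RE lines and "not confirmed" for W9RE -> N2NL, since N2NL never submitted a log.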

Make sure your sample input data covers every scenario you can think of, just in case we have missed something!

Also, if there's anything in the code you don't understand, feel free to ask; you may need to make adjustments to it yourself one day.

As per your post, I have made the following changes:
- Replaced the data block with the adjusted data block you provided.
- Implemented sorting in numerous places.
- Added band and time fields to the error log.
- Generated a potential non-submitters log, including a weight field, as this is important.

On my own initiative, I have also made the following changes:
- Implemented an ignore hash. Any callsigns in this hash will be ignored. Once you have worked out which callsigns belong to non-submitters, you could insert them into this hash, then rerun the script.
- Fixed a case-sensitivity bug in the map that upper-cases the fields.
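
Rather than editing the ignore hash inside the script before each rerun, the same hash could be built from a plain text file. A minimal sketch, assuming a hypothetical file (e.g. `ignore.txt`) with one callsign per line, where blank lines and lines starting with `#` are skipped; the `load_ignore` name and this file format are illustrative, not part of the script.

```perl
use strict;
use warnings;

# Build the ignore hash from a filehandle containing one callsign per line.
# Assumed format: blank lines and lines starting with '#' are skipped.
sub load_ignore {
    my ( $fh, $case_sensitive ) = @_;
    my %ignore;
    while ( my $line = <$fh> ) {
        $line =~ s/\s+$//;    # strip trailing whitespace, including CR/LF
        $line =~ s/^\s+//;    # strip leading whitespace
        next if $line eq '' or $line =~ /^#/;
        $line = uc $line unless $case_sensitive;    # match the script's upper-casing
        $ignore{$line} = 1;
    }
    return \%ignore;
}

# Demo with an in-memory file; in practice this would be something like
# open my $fh, '<', 'ignore.txt' or die "cannot open 'ignore.txt': $!";
my $text = "# non-submitters confirmed by hand\nw6nv\n\nN2NL\r\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my $ignore = load_ignore( $fh, 0 );
close $fh;

print join( ',', sort keys %$ignore ), "\n";    # N2NL,W6NV
```

Upper-casing on load keeps the hash consistent with the script's `$case_sensitive = 0` behaviour, so `w6nv` in the file still matches `W6NV` in the log.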

Perhaps still to do:
- If the ignore hash proves useful, it could be built from another input file.
- At present, only one error is logged against each entry, chosen by precedence in the if / elsif conditions. You may wish to allow multiple errors per entry, although doing so introduces some discrepancies.
- Instead of an error log, you may wish to incorporate errors / mulligans into the final score (QSOS) for each logcall.
- Minor code improvements, e.g. replacing greps with List::*Util functions.
- Replace the development filehandles with ones pointing to your input and output logs.
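
For the "multiple errors" and "List::*Util" items, a minimal sketch of a check that collects every problem with an entry instead of stopping at the first `elsif`, using `any` from List::Util (available in List::Util 1.33 and later). The flattened `$contestants` layout (a single `calls` list instead of per-band entries) and the `entry_errors` name are hypothetical, and duplicate detection is left out for brevity. It also shows the discrepancy mentioned above: one entry can be both an invalid mult and unconfirmed.

```perl
use strict;
use warnings;
use List::Util qw(any);

# Hypothetical stand-in for the script's $contestants hash (one band only).
my $contestants = {
    W7WHY => { logname => 'TOM',  logmult => 'OR', calls => ['W9RE'] },
    W9RE  => { logname => 'MIKE', logmult => 'IN', calls => ['N6ZFO'] },
};

# Return every problem found with an entry, not just the first one.
sub entry_errors {
    my ( $logcall, $entry ) = @_;
    my $other = $contestants->{ $entry->{callsign} };
    return ('NIL(BANDQSO)') unless $other;    # no log: nothing else can be checked

    my @errors;
    push @errors, 'INVALID(NAME)' if $entry->{callname} ne $other->{logname};
    push @errors, 'INVALID(MULT)' if $entry->{callmult} ne $other->{logmult};
    push @errors, 'NIL(CALLSIGN)'
        unless any { $_ eq $logcall } @{ $other->{calls} };
    return @errors;
}

my @errors = entry_errors( 'W7WHY', { callsign => 'W9RE', callname => 'MIKE', callmult => 'IL' } );
print join( ' ', @errors ), "\n";    # INVALID(MULT) NIL(CALLSIGN)
```

Whether both errors should count against the score, or only the first, is exactly the kind of rule you would need to decide before switching to this style.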


Code
use strict; 
use warnings;
use Data::Dumper;

#local $/ = "\r\n";
local $, = "\t";
local $\ = "\n";

#####

# init.

my $case_sensitive = 0;

my $band_lookup =
{
3 => '80M',
7 => '40M',
};

my $ignore =
{
#W6NV => 1,
};

my $contestants = { };

my $potential_non_submitters = { };

#####

# fh open.

my $output_scores_str = '';
my $output_errors_str = '';
my $output_nonsub_str = '';

my $input_fh = \*DATA; # open my $input_fh, '<', 'listQ.txt' or die "cannot open 'listQ.txt': $!";
open my $output_scores_fh, '>', \$output_scores_str; # open my $output_scores_fh, '>', 'logscores.csv' or die "cannot open 'logscores.csv': $!";
open my $output_errors_fh, '>', \$output_errors_str; # open my $output_errors_fh, '>', 'logerrors.csv' or die "cannot open 'logerrors.csv': $!";
open my $output_nonsub_fh, '>', \$output_nonsub_str; # open my $output_nonsub_fh, '>', 'lognonsub.csv' or die "cannot open 'lognonsub.csv': $!";

# headings.
print $output_scores_fh 'LOGCALL', 'QSOS', 'MULT', 'NAME';
print $output_errors_fh 'LOGCALL', 'CALLSIGN', 'BAND', 'TIME', 'ERRORTYPES';
print $output_nonsub_fh 'LOGCALL', 'WEIGHT';

#####

# first phase ( load input data into hash ).

while ( my $line = <$input_fh> )
{
# remove any whitespace on the end of the line ( spaces, carriage return, newline ).
$line =~ s/\s+$//;

# split line, extracting only values we need into list and upper casing them if not case sensitive.
my ( $freq, $time, $logcall, $logname, $logmult, $callsign, $callname, $callmult ) =
map { $case_sensitive ? $_ : uc $_ }
( split( ' ', $line ) )[1,4..10];

# lookup band via frequency.
my $band = $band_lookup->{substr( $freq, 0, 1 )}; # todo: handle frequencies not covered by $band_lookup (currently yields undef).

# add a new contestant to contestants if we haven't seen them before.
unless ( defined $contestants->{$logcall} )
{
$contestants->{$logcall} =
{
logname => $logname,
logmult => $logmult,
bands => { },
};
}

# add this entry to the contestants entries.
push @{$contestants->{$logcall}->{bands}->{$band}},
{
time => $time,
callsign => $callsign,
callname => $callname,
callmult => $callmult,
seen => 0,
errors => [ ],
};
}

#####

# second phase ( process hash, generate logs ).

for my $logcall ( sort keys %$contestants )
{
my $contestant = $contestants->{$logcall};

# instead of verified counter, could count number of error free entries before logging.
my $verified = 0;

for my $band ( sort keys %{$contestant->{bands}} )
{
my $entries = $contestant->{bands}->{$band};

for my $entry ( sort { $a->{callsign} cmp $b->{callsign} } @$entries )
{
# skip if in ignore list.
next if exists $ignore->{$entry->{callsign}};

# mark as seen. Used when checking for duplicate entries.
$entry->{seen} = 1;

# verify entry.
if ( not defined $contestants->{$entry->{callsign}} ) # invalid callsign.
{
push @{$entry->{errors}}, 'NIL(BANDQSO)';

$potential_non_submitters->{$entry->{callsign}}->{$logcall}++;
}
elsif ( ( grep { $_->{seen} and $_->{callsign} eq $entry->{callsign} } @$entries ) > 1 ) # duplicate entry.
{
push @{$entry->{errors}}, 'DUPE(BANDQSO)';
}
elsif ( $entry->{callname} ne $contestants->{$entry->{callsign}}->{logname} ) # invalid callname.
{
push @{$entry->{errors}}, 'INVALID(NAME)';
}
elsif ( $entry->{callmult} ne $contestants->{$entry->{callsign}}->{logmult} ) # invalid callmult.
{
push @{$entry->{errors}}, 'INVALID(MULT)';
}
elsif ( not grep { $_->{callsign} eq $logcall } @{ $contestants->{$entry->{callsign}}->{bands}->{$band} || [ ] } ) # no return entry ( || [ ] avoids autovivifying an empty band ).
{
push @{$entry->{errors}}, 'NIL(CALLSIGN)';

# todo: set "yes return entry" in relevant entry in $contestants->{$entry->{callsign}}->{bands}->{$band}, then don't have to pointlessly reverse check later.
}

# log errors if any, or increment verified count.
if ( @{$entry->{errors}} )
{
print $output_errors_fh $logcall, $entry->{callsign}, $band, $entry->{time}, @{$entry->{errors}};
}
else
{
$verified++;
}
}
}

# log score.
print $output_scores_fh $logcall, $verified, $contestant->{logmult}, $contestant->{logname};
}

#####

# third phase ( process potential non submitters hash, generate potential non submitters log ).

# reformat potential non submitters hash into callsign => weight, where weight is the number of distinct logcalls that reported the callsign.
# counting distinct logcalls rather than raw reports means a contestant who consistently logs the same invalid callsign ( across bands or via duplicate entries ) can't skew the result: each logcall counts once per callsign.
$_ = keys %$_ for ( values %$potential_non_submitters );

for my $callsign ( sort { $potential_non_submitters->{$b} <=> $potential_non_submitters->{$a} } keys %$potential_non_submitters )
{
my $weight = $potential_non_submitters->{$callsign};

# log potential non submitter.
print $output_nonsub_fh $callsign, $weight;
}

#####

# fh close.

close $input_fh;
close $output_scores_fh;
close $output_errors_fh;
close $output_nonsub_fh;

# print
#print Dumper $contestants;
#print Dumper $potential_non_submitters;
print $output_scores_str;
print $output_errors_str;
print $output_nonsub_str;

#####

__DATA__
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0200 W7WHY Tom OR N6ZFO BILL CA
QSO: 7040 CW 2015-01-22 0201 W7WHY Tom OR W9RE MIKE IL
QSO: 3542 CW 2015-01-22 0231 W7WHY Tom OR N6ZF BILL CA
QSO: 3540 CW 2015-01-22 0232 W7WHY Tom OR W6NV OLI CA
QSO: 3542 CW 2015-01-22 0246 W7WHY Tom OR W9RE MIKE IN
QSO: 7000 CW 2015-01-22 0201 W9RE MIKE IN W7WHY TOM Or
QSO: 7000 CW 2015-01-22 0221 W9RE MIKE IN N6ZFO BILL Ca
QSO: 3500 CW 2015-01-22 0231 W9RE MIKE IN N6ZFO BIL Ca
QSO: 3500 CW 2015-01-22 0246 W9RE MIKE IN W7WHY TOM Or
QSO: 3500 CW 2015-01-22 0249 W9RE MIKE IN W6NV OLI Ca
QSO: 7040 CW 2015-01-22 0201 N6ZFO BILL CA W7WHY TOM OR
QSO: 7040 CW 2015-01-22 0221 N6ZFO BILL CA W9RR MIKE IN
QSO: 7042 CW 2015-01-22 0222 N6ZFO BILL CA N2NL DAVE FL
QSO: 3543 CW 2015-01-22 0231 N6ZFO BILL CA W9RE MIKE IN
QSO: 3542 CW 2015-01-22 0231 N6ZFO BILL CA W7WHY TOM OR
QSO: 3544 CW 2015-01-22 0235 N6ZFO BILL CA W6NV OLI CA


Output:

Code
LOGCALL QSOS MULT NAME
N6ZFO 2 CA BILL
W7WHY 2 OR TOM
W9RE 2 IN MIKE

LOGCALL CALLSIGN BAND TIME ERRORTYPES
N6ZFO N2NL 40M 0222 NIL(BANDQSO)
N6ZFO W9RR 40M 0221 NIL(BANDQSO)
N6ZFO W6NV 80M 0235 NIL(BANDQSO)
N6ZFO W7WHY 80M 0231 NIL(CALLSIGN)
W7WHY N6ZFO 40M 0200 DUPE(BANDQSO)
W7WHY N6ZFO 40M 0200 DUPE(BANDQSO)
W7WHY W9RE 40M 0201 INVALID(MULT)
W7WHY N6ZF 80M 0231 NIL(BANDQSO)
W7WHY W6NV 80M 0232 NIL(BANDQSO)
W9RE N6ZFO 40M 0221 NIL(CALLSIGN)
W9RE N6ZFO 80M 0231 INVALID(NAME)
W9RE W6NV 80M 0249 NIL(BANDQSO)

LOGCALL WEIGHT
W6NV 3
N6ZF 1
W9RR 1
N2NL 1
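
In the non-submitters log above, the three weight-1 callsigns come out in whatever order the hash happens to yield, because the third phase sorts on weight alone. A minimal sketch of the same weighting (distinct logcalls per callsign) with an alphabetical tie-break, so repeated runs produce identical output; the sample hash mirrors the second phase's structure, and the second N2NL report is hypothetical, added to show that repeats from one logcall count once.

```perl
use strict;
use warnings;

# callsign => { logcall => times reported }, as built in the second phase.
my $potential_non_submitters = {
    W6NV => { W7WHY => 1, W9RE => 1, N6ZFO => 1 },
    N6ZF => { W7WHY => 1 },
    W9RR => { N6ZFO => 1 },
    N2NL => { N6ZFO => 2 },    # hypothetical: reported twice by the same logcall
};

# Weight = number of distinct logcalls that reported the callsign.
my %weight = map { ( $_ => scalar keys %{ $potential_non_submitters->{$_} } ) }
             keys %$potential_non_submitters;

# Highest weight first; ties broken alphabetically so the output is repeatable.
my @ranked = sort { $weight{$b} <=> $weight{$a} or $a cmp $b } keys %weight;

print "$_\t$weight{$_}\n" for @ranked;
```

With the tie-break, the order is always W6NV, N2NL, N6ZF, W9RR. The same `or $a cmp $b` could be appended to the sort in the third phase.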

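The first phase maps a frequency to a band using only its first digit, which works for the 3xxx and 7xxx kHz entries in the sample data but would map both 1.8 MHz (160M) and 14 MHz (20M) to `1`, and yields undef for anything not listed. A minimal sketch of a range-based lookup, assuming frequencies are logged in kHz as in the sample data; the band edges below are US amateur allocations and the `band_for` name is hypothetical.

```perl
use strict;
use warnings;

# Assumed band edges in kHz (US amateur allocations); extend as needed.
my @bands = (
    [ 1800,  2000,  '160M' ],
    [ 3500,  4000,  '80M'  ],
    [ 7000,  7300,  '40M'  ],
    [ 14000, 14350, '20M'  ],
);

# Map a frequency in kHz to a band name; returns undef if no band matches.
sub band_for {
    my ($freq) = @_;
    for my $band (@bands) {
        return $band->[2] if $freq >= $band->[0] && $freq <= $band->[1];
    }
    return;
}

for my $freq ( 3542, 7040, 14025, 1850, 5000 ) {
    my $band = band_for($freq) // 'UNKNOWN';
    print "$freq => $band\n";
}
```

This prints 80M, 40M, 20M, 160M and UNKNOWN respectively. In the script, the undef case is where the existing todo could log or `die` instead of silently filing the entry under an undefined band.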

Regards,

Chris


(This post was edited by Zhris on Feb 25, 2015, 10:47 PM)


Edit Log:
Post edited by Zhris (Enthusiast) on Feb 25, 2015, 10:13 PM
Post edited by Zhris (Enthusiast) on Feb 25, 2015, 10:15 PM
Post edited by Zhris (Enthusiast) on Feb 25, 2015, 10:27 PM
Post edited by Zhris (Enthusiast) on Feb 25, 2015, 10:43 PM
Post edited by Zhris (Enthusiast) on Feb 25, 2015, 10:46 PM
Post edited by Zhris (Enthusiast) on Feb 25, 2015, 10:47 PM

