CGI/Perl Guide | Learning Center | Forums | Advertise | Login
Site Search: in

  Main Index MAIN
INDEX
Search Posts SEARCH
POSTS
Who's Online WHO'S
ONLINE
Log in LOG
IN

Home: Perl Programming Help: DBI:
WHERE has changed to Numeric Comparison - SOLVED (or, at least a good work-around)

 



jhumkey
Novice

Jan 21, 2015, 7:02 AM

Post #1 of 20 (22196 views)
WHERE has changed to Numeric Comparison - SOLVED (or, at least a good work-around) Can't Post

Old DBD:DBI:CSV program on a Linux host has been working for (15+?) years. Now need to port it to a newer AIX box.

The old (working) WHERE clause, . . . if it saw "5151" for a key, ONLY matched/updated "5151" in the master. But the latest CSV modules on the AIX host . . . "5151" matches "5151", but it ALSO (wrongly) matches "0005151".

Its as if the functionality of the WHERE comparison has changed from String/Textual to Numerical comparison.

I can't find (and I've looked hard) a way to "declare" (as string to force string comparison) or "quote" to force string comparison.

Has anyone found a way around this issue?

For an original/Master table like :

MasterTable
0005151,20150121085200,1000,0
5151,20150121085200,2000,0

An update entry like :

Update Entry
5151,20150121085200,22154,0

Should update ONLY the 5151 line in the master. (And on the old Linux app it does.) Like :

MasterTable
0005151,20150121085200,1000,0
5151,20150121085200,22154,0

But on the new AIX (newer DBD/DBI/CSV modules) . . . it updates both lines, like :

MasterTable
0005151,20150121085200,22154,0
5151,20150121085200,22154,0

Which is wrong (or at least very undesirable). Its as if the comparison has flipped over the years from literal string, to numerical and it sees 0005151 as equal to 5151.

$XMASTER_db->do("UPDATE XMASTER \
SET DateTime='$DateTime', LifeHours='$LifeHours' \
WHERE ( SerialNumber='$SerialNumber' AND \
LifeHours<'$LifeHours' AND \
DateTime<'$DateTime' ) ");

On the old-original code, it was escaped double quotes (like \" ) around the \"$SerialNumber\"

Had to switch to single quotes '$SerialNumber' on the new host/modules or I get stupid errors (like "Can't find column 5151").

Is there any way to FORCE String/Textual and non Numerical comparisons in a WHERE?

Thanks,

jkh


(This post was edited by jhumkey on Jan 23, 2015, 7:21 AM)


FishMonger
Veteran / Moderator

Jan 21, 2015, 7:45 AM

Post #2 of 20 (22193 views)
Re: [jhumkey] WHERE has changed to Numeric Comparison [In reply to] Can't Post

I've never heard of the DBD:DBI:CSV module and it's not on cpan.

Which DBD module are you using? If it's not on cpan, then post a link to where we can find it.

Please post a short but complete script that demonstrates your problem so that we can test it and suggest how to fix it. And, post it within code tags so that the formatting (indentation) will be retained.


jhumkey
Novice

Jan 21, 2015, 8:15 AM

Post #3 of 20 (22191 views)
Re: [FishMonger] WHERE has changed to Numeric Comparison [In reply to] Can't Post

Sorry if the concatenation confused you. I meant we're using DBI and DBD for CSV files. (All from cpan. Or from IBM's approved AIX version. So they may be behind actual CPAN.)

These modules loaded :
DBD-CSV-0_1027_tar.gz
DBI-1.622_901
SQL-Statement-0_1020_tar.gz
Text-CSV_XS-0_22_tar.gz
DBI-1_19_tar.gz
DBI-1.622_901.tar.gz
Text-CSV-0_01_tar.gz
Time-modules-2002_1001_tar.gz

The gist of the program is . . . sweep HoursIn file entry lines. If a matching key EXISTS in MasterFile then UPDATE it, if not, then INSERT it.

I've gutted the much longer original program but, this one runs/fails for an example.

Separate/create the three files, and change the $StartDir to wherever you are running it from.



Code
HoursIn 
6313,20150121095900,23147

MasterFile
0006313,20120524072600,1000
6313,20120120072500,2000

doupdate.pl
#!/usr/bin/perl

use DBI;
use strict;
use Cwd;

my ($StartDir);
my ($XMASTER_query, $XMASTER_st, $XMASTER_db);
my ($StartDir,$ProcessMask);
my ($IndiFile);
my ($HOURS_query, $HOURS_st, $HOURS_db);
my ($SerialNumber, $DateTime, $LifeHours);
my ($cYear, $cMon, $cDay, $cHour, $cMin, $cSec);

$XMASTER_db = DBI->connect("DBI:CSV:f_dir=$StartDir")
or die "Cannot connect: " . $DBI::errstr;

$XMASTER_db->{'csv_tables'}->{'XMASTER'} = {
'eol' => "\n", 'sep_char' => ",", 'quote_char' => undef,
'escape_char' => undef, 'file' => 'MasterFile', # <-the CSV file
'col_names' =>
[ "SerialNumber", "DateTime", "LifeHours" ]
};

$StartDir="/home/mes_cron";
$IndiFile="HoursIn";

$HOURS_db = DBI->connect("DBI:CSV:f_dir=$StartDir")
or die "Cannot connect: " . $DBI::errstr;

$HOURS_db->{'csv_tables'}->{'HOURS'} = {
'eol' => "\n", 'sep_char' => ",", 'quote_char' => undef,
'escape_char' => undef, 'file' => $IndiFile, # <-the CSV file
'col_names' =>
[ "SerialNumber", "DateTime", "LifeHours" ]
};

$HOURS_query = "SELECT SerialNumber,DateTime,LifeHours FROM HOURS";

$HOURS_st = $HOURS_db->prepare($HOURS_query);
$HOURS_st->execute();
$HOURS_st->bind_columns(undef, \$SerialNumber, \$DateTime, \$LifeHours);

while ($HOURS_st->fetch)
{
&insert_or_update( $SerialNumber, $DateTime, $LifeHours );
}
$HOURS_st->finish();

$HOURS_db->disconnect();

$XMASTER_db->disconnect();

exit(0);

#------------------------------------------------------------------

sub insert_or_update {
my ($SerialNumber, $DateTime, $LifeHours ) = @_;

if ( &exists_in_master( $SerialNumber ) )
{
update_master($SerialNumber,$DateTime,$LifeHours);
}
else
{
insert_master($SerialNumber,$DateTime,$LifeHours);
}

}

#------------------------------------------------------------------

sub insert_master {
my ($SerialNumber, $DateTime, $LifeHours ) = @_;
my ($col_str, $val_str, $sql_str);

$col_str = "SerialNumber, DateTime, LifeHours";

$val_str = "\'$SerialNumber\', \'$DateTime\', \'$LifeHours\'";

$sql_str = "INSERT INTO XMASTER ( $col_str ) VALUES ( $val_str )";

$XMASTER_db->do($sql_str);
}

#------------------------------------------------------------------

sub update_master {
my ($SerialNumber, $DateTime, $LifeHours ) = @_;

$XMASTER_db->do("UPDATE XMASTER \
SET DateTime='$DateTime', LifeHours='$LifeHours' \
WHERE ( SerialNumber='$SerialNumber' AND \
LifeHours<'$LifeHours' AND \
DateTime<'$DateTime' ) ");
}

#------------------------------------------------------------------

sub exists_in_master {
my ($SerialNumber, $ActionType ) = @_;
my ($retval);

$XMASTER_query = "SELECT SerialNumber FROM XMASTER \
WHERE SerialNumber='$SerialNumber' ";

$XMASTER_st = $XMASTER_db->prepare($XMASTER_query);
$XMASTER_st->execute();
$XMASTER_st->bind_columns(undef, \$SerialNumber );

if ( $XMASTER_st->fetch )
{
$retval = 1;
}
else
{
$retval = 0;
}

$XMASTER_st->finish();

return($retval);
}

#------------------------------------------------------------------



(This post was edited by jhumkey on Jan 21, 2015, 8:50 AM)


FishMonger
Veteran / Moderator

Jan 21, 2015, 8:53 AM

Post #4 of 20 (22180 views)
Re: [jhumkey] WHERE has changed to Numeric Comparison [In reply to] Can't Post

I don't have time right now to do full testing, but I do spot several problems from the start.

It's very poor perl coding practice to declare all of those vars at the beginning of the script. They should be declared in the smallest scopt that they require AND as close as possible to where they are first used.

Add this line to your list of use statements and fix each issue it points out.

Code
use warnings FATAL => qw(all);


The first issue will be:

Quote
"my" variable $StartDir masks earlier declaration in same scope at C:\test\jhumkey.pl line 9.


Then this one:

Quote
Use of uninitialized value $StartDir in concatenation (.) or string at C:\test\jhumkey.pl line 15.



FishMonger
Veteran / Moderator

Jan 21, 2015, 9:09 AM

Post #5 of 20 (22178 views)
Re: [jhumkey] WHERE has changed to Numeric Comparison [In reply to] Can't Post

I'm about 98% sure that using sprintf will be the solution you need to fix the insert/update issue.


(This post was edited by FishMonger on Jan 21, 2015, 9:10 AM)


jhumkey
Novice

Jan 21, 2015, 9:25 AM

Post #6 of 20 (22175 views)
Re: [FishMonger] WHERE has changed to Numeric Comparison [In reply to] Can't Post

You're right . . . but it doesn't apply here. In "the real" "much larger" program . . . its all "encapsulated" variables down in many more levels. There's a whole "recursion" layer to process multiple input files and apply them all to the Master, gutted from this simplified example. And there are MANY more fields on the real lines.

But. . . this smaller example runs, and fails, to show the issue.

And the big point to remember . . . it has WORKED for 15+ years (on the old Linux box). The question at hand, is why (the much larger version that I didn't upload) won't port over and do WHERE comparison's the same way on a newer AIX host.

---
To the other reply about "sprintf" . . . I saw some quote(q("static")) type things done in sprintf examples but . . . I'm not quoting fixed-static text, I'll be SELECT reading them into variables, then comparing the MasterFile column (SerialNumber) to the variable text in $SerialNumber. So . . . I wasn't sure what to change to make the WHERE work right with an sprintf.

Thanks for looking though.


FishMonger
Veteran / Moderator

Jan 21, 2015, 9:39 AM

Post #7 of 20 (22171 views)
Re: [jhumkey] WHERE has changed to Numeric Comparison [In reply to] Can't Post

Are you running the exact same script on the new box, or was it modified to make use of newer syntax and modules?

This may not be an option for you but could you import the files into a mysql database so that you won't have to deal with added syntax hoops needed for the csv files?

I'm tied up on a project of my own so I won't be able to do any testing until I get home later today.


jhumkey
Novice

Jan 21, 2015, 11:25 AM

Post #8 of 20 (22166 views)
Re: [FishMonger] WHERE has changed to Numeric Comparison [In reply to] Can't Post

I started with the same script, but had to change all the \"$SerialNumber\" type references to be '$SerialNumber'.

Something about the new host/modules . . . didn't like the double quotes any more. It would give errors like "Column 6313 not found". (Well, that's because 6313 is a VALUE, and you shouldn't be looking for it as a column . . .)

Anyway. I thought more about "sprintf".

Instead of having Select place values into variables, and performing the UPDATE directly from the contents of the variables (which worked fine on the old host/modules), I added an intermediate step. Select into variables, print a STATIC STRING with the variables expressed, then call the Update passing the static string.

But . . . that still doesn't work.

The update_master changed to . . .


Code
sub update_master { 
my ($SerialNumber, $DateTime, $LifeHours ) = @_;
my ($newSQLstr);

$newSQLstr = sprintf ("UPDATE XMASTER \
SET DateTime='%s', LifeHours='%s' \
WHERE ( SerialNumber='%s' AND \
LifeHours<'%s' AND \
DateTime<'%s' ) ",
$DateTime, $LifeHours,
$SerialNumber,$LifeHours,$DateTime);
print "$newSQLstr\n";
$XMASTER_db->do($newSQLstr);
}

which shows the SQL being sent to update (via the print) as . . .

Code
UPDATE XMASTER 
SET DateTime='20150121095900', LifeHours='23147'
WHERE ( SerialNumber='6313' AND
LifeHours<'23147' AND
DateTime<'20150121095900' )


But . . . that still updates both the 6313 line AND (incorrectly) the 0006313 line.

I tried changing those single quotes to double quotes so the select would look like . . .


Code
UPDATE XMASTER 
SET DateTime="20150121095900", LifeHours="23147"
WHERE ( SerialNumber="6313" AND
LifeHours<"23147" AND
DateTime<"20150121095900" )


But that throws me back to the other error . . .

Code
DBD::CSV::db do failed: Can't find table containing column named '"6313"' at /us 
[for Statement "UPDATE XMASTER
SET DateTime="20150121095900", LifeHours="23147"
WHERE ( SerialNumber="6313" AND
LifeHours<"23147" AND
DateTime<"20150121095900" ) "] at .



jhumkey
Novice

Jan 21, 2015, 11:53 AM

Post #9 of 20 (22164 views)
Re: [jhumkey] WHERE has changed to Numeric Comparison [In reply to] Can't Post

Using "Placeholders and Bind" does not work either . . . Both lines update when only one should.

Code
sub update_master { 
my ($SerialNumber, $DateTime, $LifeHours ) = @_;
my ($newSQLstr);

$XMASTER_st = $XMASTER_db->prepare(qq{UPDATE XMASTER
SET DateTime= ? , LifeHours= ?
WHERE ( SerialNumber= ? AND
LifeHours< ? AND
DateTime< ? )
});
$XMASTER_st->execute( "$DateTime", "$LifeHours",
"$SerialNumber","$LifeHours","$DateTime");
}



FishMonger
Veteran / Moderator

Jan 21, 2015, 12:04 PM

Post #10 of 20 (22162 views)
Re: [jhumkey] WHERE has changed to Numeric Comparison [In reply to] Can't Post

Use the Data::Dumper module to inspect the value of $SerialNumber to see it it retained the leading 0 zeros. I suspect it didn't.


jhumkey
Novice

Jan 21, 2015, 12:47 PM

Post #11 of 20 (22159 views)
Re: [FishMonger] WHERE has changed to Numeric Comparison [In reply to] Can't Post

OK, but in the example above that fails . . .

I am looking for the one without the leading zero's. (only 6316.)

The problem is that (internal to the WHERE, when its pulling rows from the file that I can't intercept and Data::Dump) . . . that's when it pulls the 0006313 from the file and falsely matches it to 6313 in $SerialNumber.

I could Data::Dumper $SerialNumber, but it is (and should be) 6313. I can't get to the row-by-row internal comparison value fetched from the file as part of the WHERE.

I'm pretty much to the point of saying . . . we're going to have to "regress" to some older CPAN modules . . . and hope we can find an older one that works. (Or . . . one that works right under AIX.)


FishMonger
Veteran / Moderator

Jan 21, 2015, 2:09 PM

Post #12 of 20 (22151 views)
Re: [jhumkey] WHERE has changed to Numeric Comparison [In reply to] Can't Post

This is untested but I think it should work. I'ts been awhile since I used DBD::CSV

Code
sub update_master { 
my ($SerialNumber, $DateTime, $LifeHours ) = @_;
my $length = length($SerialNumber);

$XMASTER_db->do("UPDATE XMASTER
SET DateTime='$DateTime', LifeHours='$LifeHours'
WHERE ( SerialNumber = '$SerialNumber' AND
CHAR_LENGTH(SerialNumber) = $length AND
LifeHours < '$LifeHours' AND
DateTime < '$DateTime')"
);}


I doubt that you need to use those single quotes, but that would need to be tested.


FishMonger
Veteran / Moderator

Jan 21, 2015, 2:23 PM

Post #13 of 20 (22149 views)
Re: [jhumkey] WHERE has changed to Numeric Comparison [In reply to] Can't Post

If you have a large number of inserts/updates, then I would rearrange your sql statements and drop the "do" statements. It is far more efficient to prepare the statement once using placeholders (prior to the loop) and only have the execute statement inside the loop.


jhumkey
Novice

Jan 21, 2015, 2:44 PM

Post #14 of 20 (22147 views)
Re: [FishMonger] WHERE has changed to Numeric Comparison [In reply to] Can't Post

Your previous post (comparing lengths) WORKS for the small data/small program version.
(Both tries, 6313 updates ONLY 6313 and 0006313 updates ONLY 0006313.)

Longer version (with real table names and extra columns) will take 15+ mins to run. And . . . old/new system comparison will have to run overnight.

But . . . I have high confidence the Length comparison will fix it.

Did not know I could "stage" the SQL in advance and just pass different data on each execute. I'll check there, and some other places and see if I can speed up the program otherwise.

I will post (hopefully final) update tomorrow, and if I can edit the top, mark as solved.


jhumkey
Novice

Jan 22, 2015, 8:11 AM

Post #15 of 20 (22138 views)
Re: [jhumkey] WHERE has changed to Numeric Comparison [In reply to] Can't Post

Amazing. Poor little guy's been running twelve years now. I'm finally ready to run the overnight head-to-head test old system vs new.

Someone powered down and yanked the wrong box in the Datacenter just after I left yesterday . . . guess which one . . . Shocked

So, I'll try the test again tonight.


FishMonger
Veteran / Moderator

Jan 22, 2015, 8:34 AM

Post #16 of 20 (22134 views)
Re: [jhumkey] WHERE has changed to Numeric Comparison [In reply to] Can't Post

Before doing the head-to-head test, I would highly recommend profiling the script on the new system. The scaled down code you posted has a number of inefficiencies which will only get worse when scaled up.

Devel::NYTProf - Powerful fast feature-rich Perl source code profiler
http://search.cpan.org/~timb/Devel-NYTProf-5.06/lib/Devel/NYTProf.pm


jhumkey
Novice

Jan 23, 2015, 7:37 AM

Post #17 of 20 (22120 views)
Re: [FishMonger] WHERE has changed to Numeric Comparison [In reply to] Can't Post

Marked as SOLVED.

OK, I still claim there is an underlying flaw, either the modules changed over time (or the AIX version was tweaked by IBM and is flawed) and where CSV "WHERE" clauses previously did String Comparisons, now they do Numeric Comparisons, with undesirable results.

However, the string length of "0006313" does not match the string length of "6313" even if ascii-to-integer conversion results in 6313 in both cases. So . . . your suggested addition of the LENGTH comparison of the key to the WHERE value being tested. works.

8512 rows in Master file, 1039 in Update file, head-to-head comparison old (12 year old Linux system runs in just over an hour) vs new (2 year old AIX system runs in 15 minutes) produces the same results.

So . . . even if the underlying problem exists (the comparison changed) the fix is solid.

As for efficiencies . . . As I'm sweeping through the input (new-changed values) file, I may encounter a new record that needs to be INSERTed into the Master, or (more often) encounter an existing record that needs to be UPDATEd into the Master.

I (now) understand I can "prepare" in advance, and only "execute" against value changes, and so not incur the "prepare" interpretation cost each time.

However, I'll be interweaving INSERTs with UPDATEs. So I'll need two "active" prepares. There doesn't seem to be a way to do that. Unless I open (connect) the file twice. If this were a real DB (Oracle/MySQL) performing the SQL execution in the background, I might've gone that route. But when its a Perl program updating a flat CSV file . . . I had fear two interior connects/opens wouldn't share-cooperate and update the flat file properly. I can test that.

Either way. Though its a valuable process, it need only run once a day. As long as it can finish in 23h:59s . . . 15minute run time should be fine.

This would probably never be an issue for a real DB in the background (Oracle/Postgress/MySQL) but for anyone else using CSV, maybe someone else will see this and have their fix too.

Thanks again for all your help.


FishMonger
Veteran / Moderator

Jan 23, 2015, 8:00 AM

Post #18 of 20 (22116 views)
Re: [jhumkey] WHERE has changed to Numeric Comparison [In reply to] Can't Post

With that few of rows the script should be able to complete in less than a minute, unless there is additional heavy processing that needs to occur. But even then it should not take 15+ minutes.

There's an easy "in-between" solution which should improve the efficiency. Load the csv file into a mysql db via LOAD DATA INFILE Syntax. http://dev.mysql.com/doc/refman/5.1/en/load-data.html

Then do your sql processing in mysql and output the the table to csv via SELECT ... INTO OUTFILE Syntax.
http://dev.mysql.com/doc/refman/5.1/en/select-into.html

The mysql table would need to be truncated prior to each import.

This approach would definitely be much faster than what you're currently doing. Possibly down to as low as 10 seconds.


(This post was edited by FishMonger on Jan 23, 2015, 8:04 AM)


FishMonger
Veteran / Moderator

Jan 23, 2015, 8:11 AM

Post #19 of 20 (22113 views)
Re: [jhumkey] WHERE has changed to Numeric Comparison [In reply to] Can't Post

Another approach which would be much faster than what you're doing would be to not do this via sql statements. Instead, load the master file into a HoH (Hash of Hashes) and then loop over the update file and do a simple hash assignment. You would not need to do the test which decides if it's an update or insert because the hash assignment would do auto vivification which creates the new hash entry if it didn't already exist.


jhumkey
Novice

Jan 23, 2015, 8:27 AM

Post #20 of 20 (22109 views)
Re: [FishMonger] WHERE has changed to Numeric Comparison [In reply to] Can't Post

All true and I will consider those.

I had even considered eliminating the MasterFile altogether, and just going direct from update files-rows directly to the output desired, but, I need to output ONLY the lines that updated "today" (and I can't stop their sending ALL entries from the plants, even the ones that didn't change today, and they mark all the ones they send today, with the same datetime stamp, whether they changed or not, and they're sending bursts of updates throughout the day) so I need a "memory" of the old value . . . hence a MasterFile of some sort must remain, for me to pass on to the next process, only those that "really" updated today.

This is probably the "final resting place" of a 15 year legacy project. We had much more data (more individual plants sending updates) and the overall project did many more things originally. At home, sure, I'd install MySQL or Progress or . . . whatever I needed. Here at work . . . I'm a "user" on those particular boxes without root or install ability so . . . CSV files were something we could utilize without 1000 layers of permission.

Thanks again.

 
 


Search for (options) Powered by Gossamer Forum v.1.2.0

Web Applications & Managed Hosting Powered by Gossamer Threads
Visit our Mailing List Archives