CGI/Perl Guide | Learning Center | Forums | Advertise | Login
Site Search: in

  Main Index MAIN
INDEX
Search Posts SEARCH
POSTS
Who's Online WHO'S
ONLINE
Log in LOG
IN

Home: Perl Programming Help: DBI: character encoding woes -- it goes in, but it won't come out: Edit Log



1arryb
User

Oct 24, 2011, 9:59 AM


Views: 8464
character encoding woes -- it goes in, but it won't come out

Hi Perl Gurus,

I have a MySQL db that uses utf-8 character encoding. My schema has a "components" table that contains data scraped from web pages. Here's the DDL:

Code
CREATE TABLE `components` ( 
`component_id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(127) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`title` varchar(127) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`url` varchar(1023) DEFAULT NULL,
`description` text,
`flag_mask` tinyint(4) DEFAULT '0',
PRIMARY KEY (`component_id`),
UNIQUE KEY `name` (`name`,`title`)
) ENGINE=InnoDB AUTO_INCREMENT=230052 DEFAULT CHARSET=utf8;


I have allowed some non-utf-8 data to get in the table that I now cannot match:

Code
#!/usr/bin/perl  

use strict;
use warnings;

use DBI;

my $dbName = 'reflib_prod_candidate';
my $dbHost = 'collector-home';
my $dbUser = 'collector';
my $dbPass = '*******';

my $dsn = "DBI:mysql:database=$dbName;host=$dbHost";
my $dbh = DBI->connect($dsn, $dbUser, $dbPass, { AutoCommit => 1, RaiseError => 1});
die "Can't connect to the db" unless $dbh;

# Select a component that I know has non-utf8 data in the title field.
my $sql = "select name, title from components where name = 'xaf'";
my $result = $dbh->selectrow_arrayref($sql);
die "no result" unless $result and scalar(@$result);
print("name=$result->[0], title=$result->[1]\n");

# I cut and paste the title into a Perl string, but cannot use it
# to match the database result. That's probably ok.
print "can't match non-utf8 perl string by cut-and-paste!\n"
unless $result->[1] eq "xaf\u9879\u76EE\u96C6\u5408(CRM,ERP,OA,SCM)--qq:651922596";

# I assign the result from the db to a Perl var. Whew! It does match itself in Perl, at least.
my $title = $result->[1];
print "can't match non-utf8 perl string assigned to a local var!\n"
unless $result->[1] eq $title;

# However, the Perl string does NOT work as a SQL query.
$sql = "select name, title from components where title='$title'";
$result = $dbh->selectrow_arrayref($sql);
print "Can't match db non-utf8 data after retreival to a perl string\n"
unless $result and scalar(@$result);


Running this program against my database, I get:

Code
$ perl tmp.pl 
name=xaf, title=xaf\u9879\u76EE\u96C6\u5408(CRM,ERP,OA,SCM)--qq:651922596
can't match non-utf8 perl string by cut-and-paste!
Can't match db non-utf8 data after retreival to a perl string


I have no use for this data since my application doesn't support non-utf8 character sets anyway.
However, I do need to be able to manipulate it so I can clean it up.

How do I pack/encode/escape a Perl string containing non-utf8 characters into a SQL query so that will match the same data in a utf8 MySQL database?

Thanks!

Larry


(This post was edited by 1arryb on Oct 24, 2011, 12:21 PM)


Edit Log:
Post edited by 1arryb (User) on Oct 24, 2011, 10:02 AM
Post edited by 1arryb (User) on Oct 24, 2011, 10:14 AM
Post edited by 1arryb (User) on Oct 24, 2011, 10:14 AM
Post edited by 1arryb (User) on Oct 24, 2011, 10:15 AM
Post edited by 1arryb (User) on Oct 24, 2011, 10:17 AM
Post edited by 1arryb (User) on Oct 24, 2011, 10:19 AM
Post edited by 1arryb (User) on Oct 24, 2011, 12:15 PM
Post edited by 1arryb (User) on Oct 24, 2011, 12:21 PM


Search for (options) Powered by Gossamer Forum v.1.2.0

Web Applications & Managed Hosting Powered by Gossamer Threads
Visit our Mailing List Archives