
1arryb
User
Oct 24, 2011, 9:59 AM
Post #1 of 1
(24539 views)
|
character encoding woes -- it goes in, but it won't come out
|
Can't Post
|
|
Hi Perl Gurus, I have a MySQL db that uses utf-8 character encoding. My schema has a "components" table that contains data scraped from web pages. Here's the DDL:
CREATE TABLE `components` ( `component_id` bigint(20) NOT NULL AUTO_INCREMENT, `name` varchar(127) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, `title` varchar(127) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, `url` varchar(1023) DEFAULT NULL, `description` text, `flag_mask` tinyint(4) DEFAULT '0', PRIMARY KEY (`component_id`), UNIQUE KEY `name` (`name`,`title`) ) ENGINE=InnoDB AUTO_INCREMENT=230052 DEFAULT CHARSET=utf8; I have allowed some non-utf-8 data to get in the table that I now cannot match:
#!/usr/bin/perl use strict; use warnings; use DBI; my $dbName = 'reflib_prod_candidate'; my $dbHost = 'collector-home'; my $dbUser = 'collector'; my $dbPass = '*******'; my $dsn = "DBI:mysql:database=$dbName;host=$dbHost"; my $dbh = DBI->connect($dsn, $dbUser, $dbPass, { AutoCommit => 1, RaiseError => 1}); die "Can't connect to the db" unless $dbh; # Select a component that I know has non-utf8 data in the title field. my $sql = "select name, title from components where name = 'xaf'"; my $result = $dbh->selectrow_arrayref($sql); die "no result" unless $result and scalar(@$result); print("name=$result->[0], title=$result->[1]\n"); # I cut and paste the title into a Perl string, but cannot use it # to match the database result. That's probably ok. print "can't match non-utf8 perl string by cut-and-paste!\n" unless $result->[1] eq "xaf\u9879\u76EE\u96C6\u5408(CRM,ERP,OA,SCM)--qq:651922596"; # I assign the result from the db to a Perl var. Whew! It does match itself in Perl, at least. my $title = $result->[1]; print "can't match non-utf8 perl string assigned to a local var!\n" unless $result->[1] eq $title; # However, the Perl string does NOT work as a SQL query. $sql = "select name, title from components where title='$title'"; $result = $dbh->selectrow_arrayref($sql); print "Can't match db non-utf8 data after retreival to a perl string\n" unless $result and scalar(@$result); Running this program against my database, I get:
$ perl tmp.pl name=xaf, title=xaf\u9879\u76EE\u96C6\u5408(CRM,ERP,OA,SCM)--qq:651922596 can't match non-utf8 perl string by cut-and-paste! Can't match db non-utf8 data after retreival to a perl string I have no use for this data since my application doesn't support non-utf8 character sets anyway. However, I do need to be able to manipulate it so I can clean it up. How do I pack/encode/escape a Perl string containing non-utf8 characters into a SQL query so that will match the same data in a utf8 MySQL database? Thanks! Larry
(This post was edited by 1arryb on Oct 24, 2011, 12:21 PM)
|