Replacing accented characters



newera
Novice

May 21, 2016, 5:41 AM


Replacing accented characters

I have a script that needs to replace accented French characters with their unaccented English equivalents for use in an ID number.

The ID is generated by taking the first and last names, then adding a system generated number.

This is the code I have, but it doesn't work:


Code
  $ID = lc(substr($first_name, 0, 1)) . lc(substr($last_name, 0, 1)) . $NEW{'intid'};


Then I do the substitutions:

Code
$ID =~ s/é/e/gi;
$ID =~ s/ù/u/gi;
$ID =~ s/è/e/gi;
$ID =~ s/à/a/gi;
$ID =~ s/â/a/gi;
$ID =~ s/û/u/gi;
$ID =~ s/ê/e/gi;
$ID =~ s/î/i/gi;
$ID =~ s/ô/o/gi;
$ID =~ s/ç/c/gi;
$ID =~ s/É/e/gi;
$ID =~ s/Ù/u/gi;
$ID =~ s/È/e/gi;
$ID =~ s/À/a/gi;
$ID =~ s/Â/a/gi;
$ID =~ s/Û/u/gi;
$ID =~ s/Ê/e/gi;
$ID =~ s/Î/i/gi;
$ID =~ s/Ô/o/gi;



BillKSmith
Veteran

May 21, 2016, 8:52 AM


Re: [newera] Replacing accented characters

You do not say what does not work. Your code for ID does not match your description. The code uses only the first character of the first and last names. It converts them to lower case and appends the number. If this is what you intend, there is no need to include the upper-case letters in your regexes. (You probably should use the translation operator (tr///) instead of all those regexes.)
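As an alternative to listing every accented letter in `tr///` (this approach is not one proposed in the thread), the core Unicode::Normalize module can strip diacritics generically: decompose each character with NFD, then delete the combining marks. A minimal sketch, assuming the string has already been decoded into Perl characters; `strip_accents` is a hypothetical helper name.

```perl
use strict;
use warnings;
use utf8;                          # the literals below are UTF-8 in the source
use Unicode::Normalize qw(NFD);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: decompose each character (NFD), e.g. "È" -> "E" + U+0300,
# then delete the combining marks (\p{Mn}), leaving the base letter.
# Letters with no canonical decomposition (such as "œ") pass through unchanged.
sub strip_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}//g;
    return $decomposed;
}

print strip_accents('Ève Courneyer'), "\n";   # Eve Courneyer
```

Unlike a hand-written `tr///` list, this also covers letters such as `ë` or `ï`; letters like `œ` would still need an explicit mapping.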

Please post a small, but complete script that demonstrates your problem.
Good Luck,
Bill


newera
Novice

May 21, 2016, 1:59 PM


Re: [BillKSmith] Replacing accented characters

Yes, I need the first character of the first and last names, then append the number. The complete script is a large one.

When French affiliates join us, their names often begin with accented characters. As written, my code does not substitute them.

If there are no accented first characters, the script works fine.


Zhris
Enthusiast

May 21, 2016, 4:11 PM


Re: [newera] Replacing accented characters

Could you expand on "doesn't work"? How are you determining this, and what is the result?

I suspect all you need to do is use the utf8 pragma to tell the compiler that your source code contains UTF-8 encoded characters.


Code
use utf8;
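What `use utf8` does and does not change can be seen by comparing character and byte lengths. A minimal sketch using the core Encode module: with the pragma, the literal `'Ève'` is three characters, while the same text as UTF-8 bytes is four. The pragma only affects literals in the source file; data read at run time (form input, files, database rows) still arrives as bytes unless it is decoded explicitly.

```perl
use strict;
use warnings;
use utf8;                          # this source file is UTF-8 encoded
use Encode qw(encode_utf8);

my $name  = 'Ève';                 # with "use utf8": 3 characters
my $bytes = encode_utf8($name);    # the same text as raw UTF-8 bytes

printf "characters: %d\n", length $name;    # 3
printf "bytes:      %d\n", length $bytes;   # 4 ("È" is two bytes, C3 88)
```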


Chris


newera
Novice

May 21, 2016, 4:51 PM


Re: [Zhris] Replacing accented characters

I do use

use utf8;

By "doesn't work" I mean that no ID is generated when the name has accents. When there are no accents in the names, it generates the ID.

Example:

Bruce Therrien

ID generated is bt2 (2 is the next incremental number)

Example 2:

Ève Courneyer

ID should be ec3, but no ID is generated.


Zhris
Enthusiast

May 21, 2016, 5:05 PM


Re: [newera] Replacing accented characters

Taking what code you have provided, I cannot replicate your issue:

http://codepad.org/GWdDjYdq

Code
use utf8; 

my $first_name = 'Ève';
my $last_name = 'Courneyer';
my %NEW = ( 'intid' => 2 );

$ID = lc(substr($first_name, 0, 1)) . lc(substr($last_name, 0, 1)) . $NEW{'intid'};

$ID =~ s/é/e/gi;
$ID =~ s/ù/u/gi;
$ID =~ s/è/e/gi;
$ID =~ s/à/a/gi;
$ID =~ s/â/a/gi;
$ID =~ s/û/u/gi;
$ID =~ s/ê/e/gi;
$ID =~ s/î/i/gi;
$ID =~ s/ô/o/gi;
$ID =~ s/ç/c/gi;
$ID =~ s/É/e/gi;
$ID =~ s/Ù/u/gi;
$ID =~ s/È/e/gi;
$ID =~ s/À/a/gi;
$ID =~ s/Â/a/gi;
$ID =~ s/Û/u/gi;
$ID =~ s/Ê/e/gi;
$ID =~ s/Î/i/gi;
$ID =~ s/Ô/o/gi;

print $ID;


As Bill suggested, could you post a short complete example that demonstrates your problem.

It is highly recommended that you use both the strict and warnings pragmas. They will often help you identify problems that would otherwise be difficult to discover manually. For example, without warnings enabled, undefined values are silently printed as empty strings.
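A small illustration of the warnings pragma at work, assuming (hypothetically) that the `intid` key was never set: instead of silently producing a truncated ID, perl reports the undefined value. The `$SIG{__WARN__}` handler is only there to collect the warning so it can be printed and inspected.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go straight to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

my %NEW;                           # hypothetical: 'intid' was never set
my $ID = 'ec' . $NEW{'intid'};     # concatenating undef triggers a warning

print "ID: $ID\n";                 # prints "ID: ec"
print "warning: $_" for @warnings; # "Use of uninitialized value ..."
```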

Chris


(This post was edited by Zhris on May 21, 2016, 5:11 PM)


BillKSmith
Veteran

May 21, 2016, 8:37 PM


Re: [newera] Replacing accented characters

We did not ask for your production script, but rather, the shortest script that you can make that demonstrates the problem. I cannot reproduce it.

Code
use strict; 
use warnings;
use utf8;
$\="\n";
my $first_name;
my $last_name;
my $ID;
my %NEW = ( 'intid' => 1 );

$NEW{intid}++;
$first_name = 'Bruce';
$last_name = 'Therrien';
$ID = lc(substr($first_name, 0, 1))
. lc(substr($last_name, 0, 1))
. $NEW{'intid'}
;
$ID =~ tr/éùèàâûêîôç/eueaaueioc/;
print $ID;

$NEW{intid}++;
$first_name = 'Ève';
$last_name = 'Courneyer';
$ID = lc(substr($first_name, 0, 1))
. lc(substr($last_name, 0, 1))
. $NEW{'intid'}
;
$ID =~ tr/éùèàâûêîôç/eueaaueioc/;
print $ID;


OUTPUT:

Code
bt2 
ec3

Good Luck,
Bill


newera
Novice

May 22, 2016, 3:09 PM


Re: [BillKSmith] Replacing accented characters

Bill's code works OK: it prints the correct ID to the screen before the rest of the script continues. The problem occurs when the data is entered into the MySQL database.
Nothing is entered at all.
So I'll need to look at my code for database entry of the ID.


BillKSmith
Veteran

May 23, 2016, 12:24 PM


Re: [newera] Replacing accented characters

My best guess is that you do not have a perl problem. Are you sure that your DB is configured to accept UTF-8 data? Check its documentation. If everything seems OK, write a small program to INSERT one record of ASCII data and one of UTF-8. (Do not expect Chris or me to do it for you again.) Did one work and not the other? Enable all error messages for perl and for the database. Read them! If you still need help, post your example, your expected output, the actual output, and all error messages in the database section of this forum.
Good Luck,
Bill


newera
Novice

May 23, 2016, 4:40 PM


Re: [BillKSmith] Replacing accented characters

I fixed it by adding the following code that I found with a little research:


Code
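use Encode;   # needed for Encode::decode_utf8
# Decode the raw UTF-8 input into Perl characters before building $ID with substr()
$first_name = Encode::decode_utf8( $first_name );
$last_name  = Encode::decode_utf8( $last_name );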


Now I get both the ID and the names entered in the database correctly. Thanks for all the help.
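The thread does not spell out why decoding fixed the problem. A plausible explanation, offered here as an assumption: the names reached the script as raw UTF-8 bytes (for example from a web form), so `substr($first_name, 0, 1)` returned only the first byte of the two-byte letter `È`. That lone byte matches none of the substitutions and is not valid UTF-8 by itself, which the database may have refused. Decoding first makes `substr` work on whole characters. The sketch below simulates this; `$raw_first`, `$raw_last`, and `$intid` are hypothetical stand-ins for the script's real variables.

```perl
use strict;
use warnings;
use utf8;                          # the tr/// literals below are UTF-8 in the source
use feature 'unicode_strings';     # lc() uses Unicode rules for all strings
use Encode qw(encode_utf8 decode_utf8);

# Simulate a name arriving as undecoded UTF-8 bytes: "Ève" -> C3 88 76 65.
my $raw_first = encode_utf8("\x{C8}ve");
my $raw_last  = 'Courneyer';
my $intid     = 3;

# Without decoding, substr() takes one BYTE: half of the "È" sequence.
my $bad_initial = substr($raw_first, 0, 1);
printf "undecoded first byte: 0x%02X\n", ord $bad_initial;   # 0xC3

# After decoding, substr() takes one CHARACTER.
my $first = decode_utf8($raw_first);
my $last  = decode_utf8($raw_last);
my $ID = lc(substr($first, 0, 1)) . lc(substr($last, 0, 1)) . $intid;
$ID =~ tr/àâçèéêîôùû/aaceeeiouu/;  # map accented initials to plain letters
print "$ID\n";                     # ec3
```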