convert utf8 -> iso-8859-2



orange
User

Jan 15, 2013, 6:25 AM



How do I convert a string from UTF-8 to ISO-8859-2?
Thanks.


7stud
Enthusiast

Jan 15, 2013, 12:50 PM


Re: [orange] convert utf8 -> iso-8859-2


Code
 
use strict;
use warnings;
use 5.012;

my $in_name = 'data.txt';
my $out_name = 'results.txt';


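open my $INFILE, "<:encoding(UTF-8)", $in_name
    or die "Couldn't open $in_name: $!";

open my $OUTFILE, ">:encoding(iso-8859-2)", $out_name
    or die "Couldn't open $out_name: $!";

# The I/O layers decode each line on input and re-encode it on output.
while (my $line = <$INFILE>) {
    print $OUTFILE $line;
}

close $OUTFILE or die "Couldn't close $out_name: $!";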



orange
User

Jan 15, 2013, 4:29 PM


Re: [7stud] convert utf8 -> iso-8859-2

Thanks, but I need to convert a variable, $string, not a file.
I have already tried:
$octets = encode("iso-8859-2", $string);

It gives a question mark for 'unconvertible' characters. I need those (and only those) to be unaccented, i.e. stripped of their diacritics.
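
One way to get that behaviour is Encode's fallback hook: since Encode 2.12, the third argument to encode() may be a code reference that receives the code point of each unmappable character and returns the replacement octets. The sketch below is only an illustration, not something posted in this thread. The helper name to_latin2_unaccent is made up, and "strip the combining marks after NFD decomposition" (via Unicode::Normalize) is an assumption about what unaccenting should mean here.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: encode to ISO-8859-2, unaccenting only the
# characters that the target encoding cannot represent.
sub to_latin2_unaccent {
    my ($string) = @_;    # expects a decoded (character) string
    return encode('iso-8859-2', $string, sub {
        my ($ord) = @_;                   # code point of the unmappable character
        my $base = NFD(chr $ord);         # e.g. U+00EF becomes "i" + U+0308
        $base =~ s/\p{Mn}+//g;            # drop the combining marks
        return '?' unless $base =~ /\A[\x00-\x7F]+\z/;   # nothing usable left
        utf8::downgrade($base);           # hand back plain octets
        return $base;
    });
}

my $octets = to_latin2_unaccent("hello\x{EF}");   # "\x{EF}" is i with diaeresis
print "octets >$octets<\n";
```

Characters that ISO-8859-2 does contain never reach the callback, so they keep their accents. Characters with no ASCII base letter (smart quotes, for instance) still come out as '?' in this sketch and would need their own mapping.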


7stud
Enthusiast

Jan 15, 2013, 11:45 PM


Re: [orange] convert utf8 -> iso-8859-2


Code
use strict;  
use warnings;
use 5.012;

use Encode;

my $string = 'hello';
my $octets = encode("iso-8859-2", $string);

Works fine for me.


(This post was edited by 7stud on Jan 15, 2013, 11:48 PM)


orange
User

Jan 15, 2013, 11:57 PM


Re: [7stud] convert utf8 -> iso-8859-2

What about:
my $string = 'helloï';


(This post was edited by orange on Jan 15, 2013, 11:58 PM)


7stud
Enthusiast

Jan 16, 2013, 2:02 AM


Re: [orange] convert utf8 -> iso-8859-2

Works fine.


(This post was edited by 7stud on Jan 16, 2013, 2:05 AM)


orange
User

Jan 16, 2013, 2:29 AM


Re: [7stud] convert utf8 -> iso-8859-2


Code
use strict;   
use warnings;
use 5.012;

use Encode;

my $string = 'helloï';
my $octets = encode("iso-8859-2", $string);

print "string >$string< octets >$octets<";


It outputs:
string >helloï< octets >hello??<

It should output:
string >helloï< octets >helloi<


7stud
Enthusiast

Jan 16, 2013, 4:09 PM


Re: [orange] convert utf8 -> iso-8859-2

ISO-8859-2 is a single-byte encoding, i.e. every character is represented by a number in the range 0-255. What number does ISO-8859-2 use to represent "small letter i with diaeresis (umlaut)" (that is what I am seeing in your string)?

Also, what is the output when you run this:


Code
my $string = 'helloï';
printf "%*vX", " ", $string;
print "\n";



(This post was edited by 7stud on Jan 16, 2013, 4:16 PM)


orange
User

Jan 17, 2013, 12:41 AM


Re: [7stud] convert utf8 -> iso-8859-2

It outputs:
68 65 6C 6C 6F C3 AF

But never mind, I'll do a manual search and replace.
Thanks.
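
The dump hints at why encode() produced two question marks: C3 AF is the two-byte UTF-8 form of U+00EF, so $string seems to hold undecoded bytes rather than characters. Presumably the script is saved as UTF-8 and has no `use utf8;` line, but that is an assumption. A minimal sketch of the difference, with variable names chosen purely for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $bytes = "hello\xC3\xAF";            # the seven bytes shown in the dump
my $chars = decode('UTF-8', $bytes);    # six characters, the last one U+00EF

printf "%d bytes, %d characters\n", length $bytes, length $chars;

# Undecoded: each byte is taken as a Latin-1 character, so two fail to map.
my $from_bytes = encode('iso-8859-2', $bytes);
# Decoded: only the one character missing from ISO-8859-2 fails to map.
my $from_chars = encode('iso-8859-2', $chars);

print "from bytes >$from_bytes< from chars >$from_chars<\n";
```

Decoding first (or adding `use utf8;` for string literals) reduces the damage to a single unmappable character. That character still needs a fallback, because ISO-8859-2 has no i with diaeresis.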


7stud
Enthusiast

Jan 17, 2013, 11:52 AM


Re: [orange] convert utf8 -> iso-8859-2

How many letters are in the string, and how many bytes are in the output? How do you plan to jam two bytes into one byte and come up with a character that doesn't exist in iso-8859-2?


(This post was edited by 7stud on Jan 17, 2013, 11:53 AM)


orange
User

Jan 17, 2013, 1:27 PM


Re: [7stud] convert utf8 -> iso-8859-2

I planned to use Text::Unaccent, but unfortunately it unaccents all characters, and checking them one by one wouldn't be elegant.
Anyway, the problem is solved with a regexp, as a friend suggested.
There aren't many problematic letters here. Besides, I needed to convert smart quotes and such.
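
The regexp itself was not posted, so the following is only a guess at its general shape: a small lookup table applied with one substitution before encoding. The table contents and the name replace_then_encode are hypothetical, and the input is assumed to be a decoded character string.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical replacement table; extend it with whatever the data contains.
my %replacement = (
    "\x{00EF}" => 'i',    # i with diaeresis
    "\x{2018}" => "'",    # left single quotation mark
    "\x{2019}" => "'",    # right single quotation mark
    "\x{201C}" => '"',    # left double quotation mark
    "\x{201D}" => '"',    # right double quotation mark
);

sub replace_then_encode {
    my ($string) = @_;    # expects a decoded (character) string
    # Swap each listed non-ASCII character; leave every other one alone.
    $string =~ s/([^\x00-\x7F])/exists $replacement{$1} ? $replacement{$1} : $1/ge;
    return encode('iso-8859-2', $string);
}

my $octets = replace_then_encode("\x{201C}hello\x{00EF}\x{201D}");
print "octets >$octets<\n";
```

Anything missing from both the table and ISO-8859-2 would still turn into '?', so the table has to cover every problematic character in the data.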


(This post was edited by orange on Jan 17, 2013, 1:28 PM)