convert utf8 -> iso-8859-2

 



orange
Novice

Jan 15, 2013, 6:25 AM

Post #1 of 11 (1156 views)
convert utf8 -> iso-8859-2

How do I convert a string from UTF-8 to ISO-8859-2?
Thanks.


7stud
Enthusiast

Jan 15, 2013, 12:50 PM

Post #2 of 11 (1149 views)
Re: [orange] convert utf8 -> iso-8859-2


Code
 
use strict;
use warnings;
use 5.012;

my $in_name = 'data.txt';
my $out_name = 'results.txt';


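# Decode UTF-8 on input; the output layer re-encodes as ISO-8859-2.
open my $INFILE, "<:encoding(UTF-8)", $in_name
    or die "Couldn't open $in_name: $!";

open my $OUTFILE, ">:encoding(iso-8859-2)", $out_name
    or die "Couldn't open $out_name: $!";

while (my $line = <$INFILE>) {
    print $OUTFILE $line;
}

close $OUTFILE
    or die "Couldn't close $out_name: $!";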



orange
Novice

Jan 15, 2013, 4:29 PM

Post #3 of 11 (1143 views)
Re: [7stud] convert utf8 -> iso-8859-2

Thanks, but I need to convert a variable, $string, not a file.
I have already tried:
$octets = encode("iso-8859-2", $string);

It gives a question mark for each 'unconvertible' character. I need those (and only those) to be unaccented instead.


7stud
Enthusiast

Jan 15, 2013, 11:45 PM

Post #4 of 11 (1133 views)
Re: [orange] convert utf8 -> iso-8859-2


Code
use strict;  
use warnings;
use 5.012;

use Encode;

my $string = 'hello';
my $octets = encode("iso-8859-2", $string);

Works fine for me.


(This post was edited by 7stud on Jan 15, 2013, 11:48 PM)


orange
Novice

Jan 15, 2013, 11:57 PM

Post #5 of 11 (1127 views)
Re: [7stud] convert utf8 -> iso-8859-2

What about:
my $string='helloï';


(This post was edited by orange on Jan 15, 2013, 11:58 PM)


7stud
Enthusiast

Jan 16, 2013, 2:02 AM

Post #6 of 11 (1122 views)
Re: [orange] convert utf8 -> iso-8859-2

Works fine.


(This post was edited by 7stud on Jan 16, 2013, 2:05 AM)


orange
Novice

Jan 16, 2013, 2:29 AM

Post #7 of 11 (1117 views)
Re: [7stud] convert utf8 -> iso-8859-2


Code
use strict;   
use warnings;
use 5.012;

use Encode;

my $string = 'helloï';
my $octets = encode("iso-8859-2", $string);

print "string >$string< octets >$octets<";


It outputs:
string >helloï< octets >hello??<

It should output:
string >helloï< octets >helloi<


7stud
Enthusiast

Jan 16, 2013, 4:09 PM

Post #8 of 11 (1107 views)
Re: [orange] convert utf8 -> iso-8859-2

ISO-8859-2 is a single-byte encoding, i.e. every character it supports is represented by one number in the range 0-255. What number does ISO-8859-2 use to represent "small letter i with diaeresis (umlaut)" (that is what I am seeing in your string)?

Also, what is the output when you run this:


Code
my $string = 'helloï';
printf "%*vX", " ", $string;
print "\n";



(This post was edited by 7stud on Jan 16, 2013, 4:16 PM)


orange
Novice

Jan 17, 2013, 12:41 AM

Post #9 of 11 (1099 views)
Re: [7stud] convert utf8 -> iso-8859-2

it outputs:
68 65 6C 6C 6F C3 AF

But never mind, I'll do a manual search and replace.
Thanks.
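
Editor's sketch (not part of the original posts): the dump shows seven bytes, and C3 AF is the UTF-8 byte pair for U+00EF. That suggests the string literal reached encode() as raw bytes rather than decoded characters, which would explain two question marks instead of one. The example below assumes the input is a UTF-8 byte string; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The seven bytes from the dump: "hello" followed by C3 AF,
# the UTF-8 encoding of U+00EF (i with diaeresis).
my $bytes = "hello\xC3\xAF";

# Undecoded, Perl treats each byte as its own character.
my $byte_count = length $bytes;

# Decoding turns the byte pair C3 AF into one character.
my $chars      = decode('UTF-8', $bytes);
my $char_count = length $chars;
my $dump       = sprintf '%*vX', ' ', $chars;

# Encoding the decoded string substitutes a single "?" for the
# one character that ISO-8859-2 lacks.
my $octets = encode('iso-8859-2', $chars);

print "bytes=$byte_count chars=$char_count dump=$dump octets=$octets\n";
```

If the literal lives in a UTF-8 source file, adding `use utf8;` should have the same effect as the explicit decode() here.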


7stud
Enthusiast

Jan 17, 2013, 11:52 AM

Post #10 of 11 (1094 views)
Re: [orange] convert utf8 -> iso-8859-2

How many letters are in the string, and how many bytes are in the output? How do you plan to jam two bytes into one byte and come up with a character that doesn't exist in iso-8859-2?
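
Editor's sketch (not part of the original posts): one way to check whether a character exists in ISO-8859-2 is to ask encode() to die on unmappable input instead of substituting "?". The helper name in_latin2 is hypothetical; FB_CROAK and LEAVE_SRC are Encode's own constants.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if $char has a code point in ISO-8859-2.
sub in_latin2 {
    my ($char) = @_;
    my $copy = $char;   # work on a copy; encode() may alter its argument when CHECK is set
    return defined eval {
        Encode::encode('iso-8859-2', $copy,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
}

# U+00E4 (a with diaeresis), U+00EF (i with diaeresis), U+0159 (r with caron)
for my $char ("\x{E4}", "\x{EF}", "\x{159}") {
    printf "U+%04X %s\n", ord($char),
        in_latin2($char) ? 'representable' : 'not representable';
}
```

Under these assumptions only U+00EF should be reported as not representable, which matches the point made above: there is no ISO-8859-2 byte for that character.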


(This post was edited by 7stud on Jan 17, 2013, 11:53 AM)


orange
Novice

Jan 17, 2013, 1:27 PM

Post #11 of 11 (1091 views)
Re: [7stud] convert utf8 -> iso-8859-2

I planned to use Text::Unaccent, but unfortunately it unaccents all characters, and checking them one by one wouldn't be elegant.
Anyway, the problem is solved with a regexp, as a friend suggested.
There aren't many problematic letters here. Besides, I needed to convert smart quotes and such.
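
Editor's sketch (not part of the original posts, and not the regexp solution mentioned above, which was never shown): Encode accepts a code reference as the CHECK argument and calls it only for characters the target encoding cannot represent. That gives the "unaccent those and only those" behaviour without touching characters ISO-8859-2 already has. The %punct table and the fallback name are hypothetical, and Unicode::Normalize stands in for Text::Unaccent.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFD);

# Hypothetical replacement table for punctuation with no decomposition.
my %punct = (
    "\x{2018}" => "'", "\x{2019}" => "'",   # single smart quotes
    "\x{201C}" => '"', "\x{201D}" => '"',   # double smart quotes
    "\x{2013}" => '-', "\x{2014}" => '-',   # en and em dashes
);

# encode() calls this only for characters ISO-8859-2 cannot represent,
# passing the character's ordinal value.
sub fallback {
    my ($ord) = @_;
    my $char  = chr $ord;
    return $punct{$char} if exists $punct{$char};

    # Decompose, then drop the combining marks to leave the base letter.
    (my $base = NFD($char)) =~ s/\p{Mn}+//g;
    return $base =~ /\A[\x00-\x7F]+\z/ ? $base : '?';
}

# U+00EF is missing from ISO-8859-2; U+0159 (r with caron) is present.
my $string = "hello\x{EF} \x{159} \x{201C}ok\x{201D}";
my $octets = encode('iso-8859-2', $string, \&fallback);

printf "%*vX\n", ' ', $octets;
```

In this sketch the i with diaeresis becomes a plain "i", the r with caron keeps its ISO-8859-2 byte (F8), and the smart quotes become straight quotes. Characters that neither decompose to ASCII nor appear in the table still fall back to "?".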


(This post was edited by orange on Jan 17, 2013, 1:28 PM)
