truncating variable with charset



orange
User

Mar 24, 2014, 12:47 AM



I'm using this:

Code
my $lenn_uloge = length(Encode::encode_utf8($encitem));
my $tsize = 148;
while ($lenn_uloge > 148) {
    $encitem2 = truncstr( $encitem, $tsize, "-" );
    $encitem = myEncode( $charset, $encitem2 );
    $tsize -= 1;
    $lenn_uloge = length(Encode::encode_utf8($encitem));
}


But there must be a much smarter way of doing it?
Thanks.


BillKSmith
Veteran

Mar 24, 2014, 3:06 PM


Re: [orange] truncating variable with charset

I am not familiar with the functions that you are using. Please post a complete working program so that we can run your example. This will tell us what modules you are using and allow us to read their documentation.
Good Luck,
Bill


Laurent_R
Veteran / Moderator

Mar 25, 2014, 11:07 AM


Re: [orange] truncating variable with charset

Please also explain in plain words what exactly you are trying to do; this is not entirely obvious to me.


orange
User

Mar 25, 2014, 11:38 AM


Re: [Laurent_R] truncating variable with charset

Sorry, the complete program is big and spread across many files.
I am trying to reduce the size of a string variable because there is a hard limit. The problem is that the string is UTF-8 encoded (or something similar), so it might end with a broken character if it is simply cut.
So I stripped it character by character until it was within the limit.
Thanks.


BillKSmith
Veteran

Mar 25, 2014, 9:17 PM


Re: [orange] truncating variable with charset

This may be what you want. If not, please use it as an example of what I meant by a "complete program."

Code
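use strict;
use warnings;
use utf8;

# 150 ASCII characters, so 150 bytes once encoded
my $encitem = '0123456789' x 15;
utf8::encode($encitem);    # convert the character string to UTF-8 bytes in place

# Keep at most the first 148 bytes
$encitem = substr($encitem, 0, 148) if length($encitem) > 148;

print $encitem;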

Good Luck,
Bill


Laurent_R
Veteran / Moderator

Mar 26, 2014, 12:31 AM


Re: [orange] truncating variable with charset

You would have to check that with your specific encoding, but if you use the utf8 module, the substr function should not cut your string in the middle of a multibyte character (at least, I think so, I did not try).
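
A small sketch to check this (the variable names are made up for illustration). The `utf8` pragma only tells Perl that the source file itself is written in UTF-8. What matters for substr is whether the string is still a decoded character string, in which case it counts characters, or has already been encoded to bytes as in Bill's example, in which case it counts bytes and can split a character:

```perl
use strict;
use warnings;
use Encode qw(encode);

# "\x{17E}" is z-with-caron, which takes two bytes (0xC5 0xBE) in UTF-8.
my $chars = "\x{17E}" x 4;               # character string: 4 characters
my $bytes = encode('UTF-8', $chars);     # byte string: 8 bytes

my $char_cut = substr($chars, 0, 3);     # 3 whole characters
my $byte_cut = substr($bytes, 0, 3);     # 1 whole character plus 1 stray lead byte

printf "char cut: %d characters, %d bytes when encoded\n",
    length($char_cut), length(encode('UTF-8', $char_cut));
printf "byte cut: %d bytes, last byte 0x%02X\n",
    length($byte_cut), ord(substr($byte_cut, -1));
```

So cutting by characters is always safe for the encoding, but it does not guarantee a byte limit, because a character can take more than one byte.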


orange
User

Mar 31, 2014, 12:51 AM


Re: [Laurent_R] truncating variable with charset

Thanks, Bill and others.
Bill, your program works, but not entirely as expected.
substr doesn't cut UTF-8 properly, so I found this workaround:

Code
my $valid = Encode::decode('utf8', $sstr, Encode::FB_QUIET); 
$sstr = Encode::encode('utf8', $valid);


(http://stackoverflow.com/questions/10953069/perl-trim-utf8-bytes-to-length-and-sanitize-the-data)
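
Put together as a complete program, the idea might look roughly like this. It is only a sketch: `truncate_utf8` is a hypothetical helper name, it assumes the input is a decoded character string, and it uses the strict 'UTF-8' encoding name rather than 'utf8'. The bytes are cut at the limit first, then decoding with FB_QUIET stops at the partial character left at the end, and re-encoding gives clean UTF-8 that still fits:

```perl
use strict;
use warnings;
use Encode qw(encode decode FB_QUIET);

# Hypothetical helper: return the UTF-8 bytes of $chars, trimmed so that
# they fit in $max_bytes and do not end in a partial character.
sub truncate_utf8 {
    my ($chars, $max_bytes) = @_;

    my $bytes = encode('UTF-8', $chars);
    return $bytes if length($bytes) <= $max_bytes;

    # Plain byte cut; this may split the last character.
    my $cut = substr($bytes, 0, $max_bytes);

    # With FB_QUIET, decode() returns what it managed to decode before the
    # first malformed sequence, so the partial character is dropped.
    # Note that decode() also modifies $cut in place, which is fine here.
    my $valid = decode('UTF-8', $cut, FB_QUIET);

    return encode('UTF-8', $valid);
}

# 1 ASCII byte followed by 100 two-byte characters: 201 bytes in total,
# so a cut at 148 bytes lands in the middle of a character.
my $sstr = truncate_utf8("a" . ("\x{17E}" x 100), 148);
printf "%d bytes\n", length($sstr);
```

The result can be shorter than the limit by up to three bytes, since a single UTF-8 character takes at most four bytes.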


(This post was edited by orange on Apr 7, 2014, 10:03 AM)