Catch Error Message

 



agentbb007
New User

Jun 20, 2018, 2:18 PM

Post #1 of 6 (2520 views)
Catch Error Message

Hi,
My script reads through a text file line by line and may come across bytes that are not valid UTF-8, in which case I get this message:

UTF-8 "\x97" does not map to Unicode at Level1QC.pl line 225, <$fh> line 25 (#1)
(S utf8) When reading in different encodings, Perl tries to
map everything into Unicode characters. The bytes you read
in are not legal in this encoding.


My question: is it possible to do something when the script runs into this issue? I would like to move the file to a specific location if it contains bad UTF-8 characters. Here's the loop I'm using to read the file. The error is raised on the line while (my $line = <$fh>). Right now the script just ignores that line and moves on to the next one. I could even read the file beforehand to check for invalid characters if that is easier.

Code
open(my $fh, '<:encoding(UTF-8)', $filename) or die "Could not open file '$filename' $!";
while ( my $line = <$fh> ) {
    print "doing all my stuff here";
}


Thanks!


Laurent_R
Veteran / Moderator

Jun 20, 2018, 3:06 PM

Post #2 of 6 (2512 views)
Re: [agentbb007] Catch Error Message

Hmm, there may be (or there ought to be) a better way, but you could make the warning fatal (with the FATAL option of the warnings pragma) and then catch the exception (e.g. with Try::Tiny), and take it from there to move your file around.
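
A minimal sketch of that idea, using a plain eval block rather than Try::Tiny so that it needs only core modules. The helper names reads_cleanly and quarantine_if_bad, and the notion of a quarantine directory, are assumptions made for illustration; they do not come from the thread.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: returns 1 if every line of $path decodes as UTF-8, else 0.
sub reads_cleanly {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path or die "cannot open '$path': $!";
    my $ok = eval {
        use warnings FATAL => 'utf8';    # lexically scoped: utf8 warnings die here
        while ( my $line = <$fh> ) { }   # decoding happens as the lines are read
        1;
    };
    close $fh;
    return $ok ? 1 : 0;
}

# Hypothetical helper: moves $path into $quarantine_dir if it fails the check.
# Returns 1 if the file was moved, 0 if it was left in place.
sub quarantine_if_bad {
    my ( $path, $quarantine_dir ) = @_;
    return 0 if reads_cleanly($path);
    my $target = File::Spec->catfile( $quarantine_dir, basename($path) );
    move( $path, $target ) or die "cannot move '$path' to '$target': $!";
    return 1;
}
```

The file is read twice this way (once to check, once to process), which is the price of keeping the check separate from the main loop.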


Zhris
Enthusiast

Jun 20, 2018, 4:21 PM

Post #3 of 6 (2504 views)
Re: [agentbb007] Catch Error Message

I like Laurent's idea, but I would prefer to locally modify the %SIG hash to install a custom warnings handler. This way I can build a hash of bad files, perhaps listing the bad line numbers, without affecting other warnings. These can be handled later, or a short-circuiting system could be implemented via the likes of Try::Tiny. Something along the lines of:


Code
use strict;
use warnings;
use Data::Dumper;

$| = 1;
binmode STDOUT, ':utf8';

my $filepaths =
[ qw/
    one.txt
    two.txt
    three.txt
/ ];

my $warnings = { };

for my $filepath ( @$filepaths )
{
    local $SIG{__WARN__} = sub { utf8_warn( $filepath, $warnings, @_ ) };

    open my $filehandle, '<:encoding(UTF-8)', $filepath or die "cannot open '$filepath': $!";

    while ( my $line = <$filehandle> )
    {
        # ...
    }

    close $filehandle;
}

print Dumper $warnings;

sub utf8_warn
{
    my ( $filepath, $warnings, $message ) = @_;

    if ( $message =~ /does not map to unicode/i )
    {
        my ( $line ) = $message =~ m/>\s*line\s*(\d+)\.$/; # your match may differ.
        $line++;

        $warnings->{$filepath}->{$line}++;
    }
    else
    {
        warn $message;
    }

    return 1;
}


Chris


(This post was edited by Zhris on Jun 20, 2018, 4:49 PM)


agentbb007
New User

Jun 20, 2018, 5:17 PM

Post #4 of 6 (2487 views)
Re: [Zhris] Catch Error Message

This is awesome, thanks so much for the replies, Laurent and Chris. I will try to implement this tomorrow and let you know how it goes!


Zhris
Enthusiast

Jun 20, 2018, 6:24 PM

Post #5 of 6 (2485 views)
Re: [agentbb007] Catch Error Message

No problem,

In case it's useful to you, here is an example more along the lines of Laurent's suggestion of trapping fatal warnings. It simply slurps the entire file (easily modified to read in chunks if the file is too large), then pushes good and bad filepaths into separate arrays:


Code
use strict;
use warnings;
use Try::Tiny;
use Data::Dumper;

$| = 1;
#binmode STDOUT, ':utf8';

my $filepaths_check =
[ qw/
    one.txt
    two.txt
    three.txt
/ ];

my $filepaths_ok = [ ];
my $filepaths_nok = [ ];

{
    use warnings FATAL => qw/utf8/; # utf8 warnings die within this block only.
    local $/ = undef; # slurp mode.

    for my $filepath ( @$filepaths_check )
    {
        open my $filehandle, '<:encoding(UTF-8)', $filepath or die "cannot open '$filepath': $!";

        try
        {
            <$filehandle>;
        }
        catch
        {
            die $_ unless /does not map to unicode/i; # rethrow anything unexpected.
        }
        finally
        {
            # @_ holds the error if the try block died.
            my $ref = @_ ? $filepaths_nok : $filepaths_ok ;
            push @$ref, $filepath;

            close $filehandle;
        };
    }
}

print Dumper $filepaths_ok, $filepaths_nok;


Another option is to use Encode to try to decode the UTF-8 sequences; errors can again be trapped and managed.
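
A rough sketch of what the Encode route might look like, assuming the file is small enough to slurp as raw bytes. The helper name is_valid_utf8_file is made up for illustration; decode and the FB_CROAK check value are part of Encode.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if the raw bytes of $path are valid UTF-8, else 0.
sub is_valid_utf8_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "cannot open '$path': $!";
    local $/ = undef;                    # slurp mode
    my $bytes = <$fh>;
    close $fh;
    $bytes = '' unless defined $bytes;

    # With FB_CROAK, decode() dies on malformed input; eval traps that.
    # decode() may modify its input when a check value is given, which is
    # harmless here because $bytes is a private copy.
    my $ok = eval { decode( 'UTF-8', $bytes, Encode::FB_CROAK ); 1 };
    return $ok ? 1 : 0;
}
```

Because the file is opened with the :raw layer, no decoding warning is raised while reading; validity is decided entirely by the decode call.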

Chris


(This post was edited by Zhris on Jun 21, 2018, 5:03 AM)


agentbb007
New User

Jun 21, 2018, 1:09 PM

Post #6 of 6 (2468 views)
Re: [Zhris] Catch Error Message

I used the modified SIG hash method and it's working great!! Thank you again so much for helping me out!

 
 

