Feb 16, 2011, 5:08 AM
Post #2 of 2
If I understand you right, you want to extract from a file only those characters with certain properties. You seem to classify the characters into "visible characters" and "unknown characters". Right?
Re: [britantyo] unknown character parsing problem
Maybe you want to do something similar to the 'strings' utility?
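To make the 'strings' comparison concrete, here is a rough sketch of that idea in Perl. It assumes, purely for illustration, that "visible" means the printable ASCII range 0x20 to 0x7E and that only runs of at least four such characters are interesting; the helper name ascii_strings is made up, and the real utility has more options than this.

```perl
use strict;
use warnings;

# Hypothetical helper: return every run of at least $min printable
# ASCII characters (0x20..0x7E) found in $data. Both the range and the
# default minimum length of 4 are assumptions, not a fixed definition.
sub ascii_strings {
    my ($data, $min) = @_;
    $min = 4 unless defined $min;
    return $data =~ /([\x20-\x7E]{$min,})/g;
}

my @found = ascii_strings("\x00\x01hello world\x88\xffab\x00perl\x02");
print "$_\n" for @found;    # "ab" is shorter than 4 and is skipped
```

For a real file you would probably open it with the ':raw' layer and read its contents into a scalar before passing it to the helper.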
So, basically, the question boils down to how to distinguish a "visible character" (to stay with your terminology) from an "unknown character", right?
Of course, this means that you need to make clear what exactly a "visible character" is, and what an "unknown" one is. For instance (ignoring Unicode issues for a moment), how would you classify the character with the hexadecimal code 0x88?
Actually, Perl comes with support for POSIX character classes, so if your idea of character visibility matches that of POSIX, the regular expression /[[:print:]]/ would match a "printable" character. See
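If the POSIX notion of "printable" is what you are after, a minimal sketch could look like the following. The helper name printable_only and the sample data are invented for the example, and it assumes a plain byte string with no locale or Unicode settings in effect.

```perl
use strict;
use warnings;

# Hypothetical helper: delete everything that is not matched by the
# POSIX class [[:print:]] and return what is left.
sub printable_only {
    my ($data) = @_;
    (my $clean = $data) =~ s/[^[:print:]]//g;
    return $clean;
}

# Sample data (made up): NUL, 0x01, 0x88 and the newline are dropped,
# while letters and the space character are kept.
my $raw = "abc\x00\x01def\x88 ghi\n";
print printable_only($raw), "\n";
```

Note that [[:print:]] includes the space character but not tab or newline, so whether it fits depends on what you want to count as "visible".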