recursively find non-ascii characters in file

 



romild0
Novice

Jan 7, 2009, 1:13 AM

Post #1 of 15 (1591 views)
recursively find non-ascii characters in file

Hi!

I have some nasty non-ASCII characters in some files that contain PHP code. I want to recursively find all the files that contain a specific non-ASCII byte sequence. Most importantly, I need to know the name of each such file.

So far, I found a script that looks into a file for non-ascii characters:


Code
while (<>) { 
s/([\x80-\xff])/sprintf "\\x{%02x}",ord($1)/eg;
print;
}


OK, this is good. The non-ASCII sequence that I'm looking for is:



Code
 
x{ef}\\x{bb}\\x{bf}


The problem here is that I can't get this script to run recursively, and I don't get the name of the file that contains these characters.

I've tried with bash, but since everything goes to standard output, I can't get any result from this. Here is what I've tried:



Code
 
find |xargs /usr/local/bin/check_for_non-ascii_characters.sh |grep -l 'x{ef}\\x{bb}\\x{bf}'


So, I need a way to recursively find non-ascii characters (a specific pattern, mentioned before) in all files and I need the name of the files containing it.

Thanks


(This post was edited by romild0 on Jan 7, 2009, 1:16 AM)


FishMonger
Veteran / Moderator

Jan 7, 2009, 4:12 AM

Post #2 of 15 (1581 views)
Re: [romild0] recursively find non-ascii characters in file

Why must it be recursive? Is it a requirement for a homework assignment?


romild0
Novice

Jan 7, 2009, 4:18 AM

Post #3 of 15 (1579 views)
Re: [FishMonger] recursively find non-ascii characters in file

It's not for homework; it's a real-life example. I need to search the whole svn branch for files with these non-ASCII characters. That's why it needs to be recursive.


FishMonger
Veteran / Moderator

Jan 7, 2009, 5:16 AM

Post #4 of 15 (1577 views)
Re: [romild0] recursively find non-ascii characters in file

use File::Find;
http://search.cpan.org/~nwclark/perl-5.8.9/lib/File/Find.pm

or

use IO::Dir::Recursive;
http://search.cpan.org/~flora/IO-Dir-Recursive-0.03/lib/IO/Dir/Recursive.pm

Personally, I'd use File::Find unless you prefer to write your own recursion sub that traverses the tree.


romild0
Novice

Jan 7, 2009, 5:21 AM

Post #5 of 15 (1575 views)
Re: [FishMonger] recursively find non-ascii characters in file

Thanks for the tip. But more than a recursive method, I need a way to see which files contain non-ASCII characters.


FishMonger
Veteran / Moderator

Jan 7, 2009, 5:52 AM

Post #6 of 15 (1572 views)
Re: [romild0] recursively find non-ascii characters in file

The find() subroutine of the File::Find module handles the recursion. You need to write the "wanted" sub and pass it to find() by reference; find() then calls it once for each file and directory it visits.

Within your "wanted" sub, $_ holds the current filename and $File::Find::name includes the full path. You need to open and process that file.

Does that make sense? If not, then start by reading the doc for the module and post back if you need more clarification.
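
As a rough illustration of the mechanics described above, here is a minimal sketch. The sub name php_files and the default start directory '.' are assumptions made for the example; they do not come from the module or from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Collect the full path of every regular .php file below $top.
# (php_files is a hypothetical helper name used only for this sketch.)
sub php_files {
    my ($top) = @_;
    my @found;
    find(
        sub {
            # Inside the callback, $_ is the bare file name (find() has
            # chdir'ed into its directory) and $File::Find::name is the
            # path including $top.
            return unless -f $_ && /\.php$/;
            push @found, $File::Find::name;
        },
        $top
    );
    return @found;
}

# Assumed default: search the current directory unless one is given.
print "$_\n" for php_files( @ARGV ? $ARGV[0] : '.' );
```

Because find() changes into each directory by default, a file can be opened inside the callback using $_ alone, while $File::Find::name is the better choice for messages.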


FishMonger
Veteran / Moderator

Jan 7, 2009, 6:11 AM

Post #7 of 15 (1568 views)
Re: [romild0] recursively find non-ascii characters in file

Here's another module you may want to look over.

File::Find::Rule
http://search.cpan.org/~rclamp/File-Find-Rule-0.30/lib/File/Find/Rule.pm


romild0
Novice

Jan 7, 2009, 6:20 AM

Post #8 of 15 (1566 views)
Re: [FishMonger] recursively find non-ascii characters in file

Thanks for the help, I found a quick solution:



Code
#!/usr/bin/perl
#
#

$filename = $ARGV[0];

while (<>) {
    if ( /^\xef\xbb\xbf\x3c/ ) {
        print "$filename Invalid code!\n";
        # list-form system() keeps the file name away from the shell
        system('cp', $filename, '/tmp/');
        next;
    }
}



FishMonger
Veteran / Moderator

Jan 7, 2009, 6:26 AM

Post #9 of 15 (1564 views)
Re: [romild0] recursively find non-ascii characters in file


In Reply To
thanks for the help, found a quick solution:






That only processes a single file passed as an argument, which is not what you said you wanted, i.e., there is no recursion in that code.
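
For what it's worth, the <> operator reads every file named in @ARGV, and the special variable $ARGV holds the name of the file currently being read, so the name does not have to be taken from $ARGV[0]. The following is only a sketch under a few assumptions: the sub name files_with_bom is made up for the example, and it checks only the first line of each file, on the assumption that the EF BB BF sequence (the UTF-8 byte order mark) sits at the very start of the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the names of the files whose first line starts with the bytes
# EF BB BF followed by '<'. (files_with_bom is a hypothetical name.)
sub files_with_bom {
    my @files = @_;
    my @hits;
    return @hits unless @files;    # otherwise <> would fall back to STDIN
    local @ARGV = @files;          # <> reads every file listed in @ARGV
    while ( my $line = <> ) {
        # $ARGV is the name of the file <> is currently reading;
        # $. is the line number within that file (see the continue block)
        push @hits, $ARGV if $. == 1 && $line =~ /^\xef\xbb\xbf\x3c/;
    }
    continue {
        close ARGV if eof;         # reset $. at the end of each file
    }
    return @hits;
}

print "$_ Invalid code!\n" for files_with_bom(@ARGV);
```

Called with several file names on the command line, this reports each matching file by its own name. It still does no directory traversal of its own.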


romild0
Novice

Jan 7, 2009, 6:30 AM

Post #10 of 15 (1562 views)
Re: [FishMonger] recursively find non-ascii characters in file

You're right, it's not.

But this is:



Code
find /home/romild0/work/1.9/ -name '*.php' -exec parse.pl {} \;


Cheers ;)


FishMonger
Veteran / Moderator

Jan 7, 2009, 6:49 AM

Post #11 of 15 (1559 views)
Re: [romild0] recursively find non-ascii characters in file

Here's a more "Perlish" approach that not only handles the recursion, but has proper error handling and is portable.


Code
#!/usr/bin/perl

use strict;
use warnings;
use File::Find;
use File::Copy;

find(\&non_ascii, '/home/romild0/work/1.9');

sub non_ascii {
    return unless $_ =~ /\.php$/;
    open my $php, '<', $_ or warn "failed to open '$_' $!\n" and return;
    while ( my $line = <$php> ) {
        if ( $line =~ /^\xef\xbb\xbf\x3c/ ) {
            print "$File::Find::name Invalid code!\n";
            # copy() is what File::Copy exports by default; double quotes
            # are needed so that $_ is interpolated into the target path
            copy($_, "/tmp/$_") or warn "failed to copy '$_' $!\n";
        }
    }
}



(This post was edited by FishMonger on Jan 7, 2009, 6:51 AM)


romild0
Novice

Jan 7, 2009, 6:50 AM

Post #12 of 15 (1556 views)
Re: [FishMonger] recursively find non-ascii characters in file

Damn! Why didn't you post that earlier? :)

Tnx anyways!


FishMonger
Veteran / Moderator

Jan 7, 2009, 6:56 AM

Post #13 of 15 (1554 views)
Re: [romild0] recursively find non-ascii characters in file


In Reply To
Damn! Why didn't you post that earlier? :)

Tnx anyways!

I know I'm going to get this quote wrong, but:

"Give a man a fish and he eats for a day.
Teach a man to fish and he can always eat"

I was hoping that you'd read the doc and learn how. ;)


FishMonger
Veteran / Moderator

Jan 7, 2009, 6:58 AM

Post #14 of 15 (1552 views)
Re: [FishMonger] recursively find non-ascii characters in file

FYI, I forgot one line in the if block.


Code
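while ( my $line = <$php> ) {
    if ( $line =~ /^\xef\xbb\xbf\x3c/ ) {
        print "$File::Find::name Invalid code!\n";
        # copy() is exported by File::Copy by default; double quotes so $_ interpolates
        copy($_, "/tmp/$_") or warn "failed to copy '$_' $!\n";
        return;    # one report per file is enough
    }
}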
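
If the bytes in question are a UTF-8 byte order mark, they would normally sit at the very start of the file, so it may be enough to look at the first three bytes instead of scanning every line. A possible sketch, with the caveats that the sub name starts_with_bom is made up for the example and that, unlike the regex above, it does not require a '<' after the three bytes:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return true if the file begins with the bytes EF BB BF.
# (starts_with_bom is a hypothetical helper name.)
sub starts_with_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or do {
        warn "failed to open '$path': $!\n";
        return 0;
    };
    my $got = read $fh, my $head, 3;   # only the first three bytes are needed
    close $fh;
    return defined $got && $got == 3 && $head eq "\xef\xbb\xbf";
}

for my $path (@ARGV) {
    print "$path Invalid code!\n" if starts_with_bom($path);
}
```

Inside a File::Find callback, a check like this could be called with $_ in place of the while loop, since find() has already changed into the file's directory.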



(This post was edited by FishMonger on Jan 7, 2009, 6:59 AM)


romild0
Novice

Jan 7, 2009, 7:00 AM

Post #15 of 15 (1550 views)
Re: [FishMonger] recursively find non-ascii characters in file

To tell you the truth, it's my first day with Perl, so this was a bit too much for me :)

I chose Perl because grep lacks some functionality that I needed to complete this task. I'm sure I'll be playing with Perl more often.

Cheers

 
 

