Wrapper around the UNIX find | xargs grep

 



panicz
Novice

Jan 25, 2013, 3:29 PM

Post #1 of 11 (1678 views)
Wrapper around the UNIX find | xargs grep

Hey.
I got tired of typing find ./ -name '*.ext' | xargs grep "a string that interests me", so I decided to create a wrapper. My initial thought was to do it in Perl, but since I've left my camel book many kilometers away at my parents' house, and have grown accustomed to GUILE Scheme, I eventually used GUILE.
The idea was to be able to write in bash

Code
search for "a string that interests me" \
or "some other string" in *.ext or *.txt

(instead of explicitly having to call xargs and grep through pipes).
I eventually managed it in 42 lines of Scheme code, but if any additional features come up, I may extend the script.
If anyone's interested, the source is available here:
https://bitbucket.org/panicz/slayer/src/834d8b2c1a2efbe2cf58197d05b29d69b8126022/tools/search?at=goose-3d

I have been wondering, though, how best to write such a script in Perl. Intuitively, Perl seems the perfect language for that application, but to tell the truth, I wouldn't even know how to start... (the problem is the possible set of "or" expressions, which were handled moderately elegantly using pattern matching and recursion in Scheme)

Best regards,
PanicZ


Laurent_R
Veteran / Moderator

Jan 26, 2013, 2:10 AM

Post #2 of 11 (1655 views)
Re: [panicz] Wrapper around the UNIX find | xargs grep

You can certainly use Perl to emulate the find command, and it should be fairly easy. Probably far less code lines than your Scheme implementation, but I don't understand Scheme well enough to understand in detail what you are doing in your script.

If I have some time, I might come back later with a basic proposal.


panicz
Novice

Jan 26, 2013, 3:00 AM

Post #3 of 11 (1653 views)
Re: [Laurent_R] Wrapper around the UNIX find | xargs grep

The idea of the code is very simple:
it is invoked as

Code
search CLAUSE*

where

Code
CLAUSE ::= 
"from" $from
| "in" $in[0] ("or" @in)*
| "for" $for[0] ("or" @for)*

so for instance it can be called like that:

Code
# search in '*.php' for 'join' 
# search from ~/ for $USER
...

internally it runs

Code
system("find ".($from or "./")." " 
.(@in?" -name ".join(" -or -name ", @in):"")
.(@for?"|xargs grep -e ".join(" -e ", @for):""));

where @for and @in are arrays obtained from processing the appropriate clauses. The only problem that I have is with processing those clauses.
It would be perfect if the source code were similar to what I wrote here, because I think it's the best way to explain what I mean.
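
One way the clause grammar above could be handled in Perl is a single pass over the argument list, with no parser module at all. The sketch below is only an illustration, not the Scheme script's actual logic: the helper names parse_clauses and build_commands are hypothetical, and it assumes that from, in, for and or never need to appear as literal values.

```perl
use strict;
use warnings;

# Hypothetical helper: split an argument list into the three clause types.
# Keywords are recognised only as whole arguments, so a quoted search term
# such as "in or out" still counts as a single value.
sub parse_clauses {
    my @args = @_;
    my %clause = (from => [], in => [], for => []);
    my $current;                        # clause the next value belongs to
    for my $arg (@args) {
        if (exists $clause{$arg}) {     # "from", "in" or "for" opens a clause
            $current = $arg;
        }
        elsif ($arg eq 'or') {          # "or" only separates values
            die "'or' before any clause\n" unless defined $current;
        }
        else {
            die "expected from/in/for, got '$arg'\n" unless defined $current;
            push @{ $clause{$current} }, $arg;
        }
    }
    return \%clause;
}

# Hypothetical helper: build the find and grep argument lists.
# Nothing is joined into a shell string, so no quoting is needed.
sub build_commands {
    my ($c) = @_;
    my @find = ('find', $c->{from}[0] // './');
    my @names;
    for my $pattern (@{ $c->{in} }) {
        push @names, '-o' if @names;    # "or" between -name tests
        push @names, '-name', $pattern;
    }
    push @find, '(', @names, ')' if @names;
    my @grep;
    for my $term (@{ $c->{for} }) {
        push @grep, '-e', $term;
    }
    return (\@find, \@grep);
}

# Illustrative call with a fixed argument list instead of @ARGV.
my $c = parse_clauses(qw(from src in *.php or *.txt for), 'join me', 'or', 'split');
my ($find, $grep) = build_commands($c);
print "find: @$find\n";
print "grep: @$grep\n";
```

Returning argument lists rather than one command string means the pieces can later be handed to a list-form system or open call, which sidesteps shell quoting of the patterns and search terms.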


(This post was edited by panicz on Jan 26, 2013, 1:32 PM)


BillKSmith
Veteran

Jan 27, 2013, 1:10 PM

Post #4 of 11 (1627 views)
Re: [panicz] Wrapper around the UNIX find | xargs grep

Why even consider a "wrapper" in Perl? The built-in functions glob and grep appear to do the original job. You will probably want to extend the error handling, but this should give you the idea.

Code
use strict;
use warnings;
foreach my $file ( glob('*.ext'), glob('*.txt') ) {
    open my $IN, '<', $file or next;
    print( grep {/a string that interests me/} <$IN> );
}

Good Luck,
Bill


7stud
Enthusiast

Jan 27, 2013, 4:02 PM

Post #5 of 11 (1624 views)
Re: [BillKSmith] Wrapper around the UNIX find | xargs grep

Well, the whole idea is to be able to type the following (or any variation of the allowed grammar) on the command line:


Code
$ search for "a string that interests me" \
or "some other string" in *.ext or *.txt


...and have the program do what the English says. The op's second post complicates things even further by expanding on the allowed grammar. Writing a new program for every search is not what the op is after.


(This post was edited by 7stud on Jan 27, 2013, 5:10 PM)


panicz
Novice

Jan 28, 2013, 6:27 AM

Post #6 of 11 (1603 views)
Re: [7stud] Wrapper around the UNIX find | xargs grep


In Reply To
Writing a new program for every search is not what the op is after.


Yes, I'd find that rather odd, indeed :]
I decided to write a wrapper on UNIX commands find, xargs and grep, because I frequently write in the command line:

Code
$ find ./ -name '*.txt' | xargs grep regexp

and I recently concluded that it's a lot of typing, and that I could write a simple bash or perl script instead, that would invoke all those commands for me.
I wanted to have a tool that would be natural to use, especially for frequent cases. I decided to give the possibility to provide an easy 'or' option, because this logical operation is done differently in find, and differently in grep, and I don't want to check the manual all over again.
I also wanted the source to be easy -- to contain a description of the grammar rather than explicit iteration. I achieved that easily in Scheme, because the language provides a natural structure pattern matcher. (The solution wasn't perfect, but I think it's tolerably elegant.)
Although there is a structure pattern matcher for Perl available on CPAN (right here: http://search.cpan.org/~kstephens/Data-Match-0.06/Match.pm), I find it very idiosyncratic and expect that its capabilities might be too limited.

Anyway, since the task was so simple, I thought that there should be a straightforward way to accomplish it in Perl. So far, I'm a little surprised.


BillKSmith
Veteran

Jan 28, 2013, 8:45 AM

Post #7 of 11 (1598 views)
Re: [panicz] Wrapper around the UNIX find | xargs grep

Sorry, I misunderstood the scope of your project. However, my point is still valid. If you use perl to parse your "clause", there is little reason to use shell commands to do the real work.

I recommend that you consider changing the syntax of your "clause" to conform to the conventions of Getopt::Long. Using the module to do the parsing is then nearly trivial.

Edit: Add sample command

Code
perl search.pl --from ~/ --for USER --in '*.php' '*.txt'
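
A minimal sketch of how such a command line might be parsed with Getopt::Long follows. It is an illustration under assumptions rather than a finished search.pl: the helper name parse_options and the usage message are made up here, and the {1,} repeat specifier is used so that one option can take several values, as --in does in the sample command.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse --from, --for and --in from an argument list.
# "=s{1,}" means "one or more string values", so that
#   --in '*.php' '*.txt'
# stores both patterns under the same option.
sub parse_options {
    my @args = @_;
    my %opt = (from => './', for => [], in => []);
    GetOptionsFromArray(
        \@args,
        'from=s'    => \$opt{from},
        'for=s{1,}' => $opt{for},
        'in=s{1,}'  => $opt{in},
    ) or die "usage: search.pl [--from DIR] --for TERM... [--in GLOB...]\n";
    return \%opt;
}

# Illustrative call with the arguments from the sample command.
my $opt = parse_options('--from', '~/', '--for', 'USER', '--in', '*.php', '*.txt');
print "from: $opt->{from}\n";
print "for:  @{ $opt->{for} }\n";
print "in:   @{ $opt->{in} }\n";
```

In a real script the list would come from @ARGV, where the shell has already expanded ~/ and removed the quotes around the patterns.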

Good Luck,
Bill

(This post was edited by BillKSmith on Jan 28, 2013, 8:53 AM)


panicz
Novice

Jan 28, 2013, 10:20 AM

Post #8 of 11 (1594 views)
Re: [BillKSmith] Wrapper around the UNIX find | xargs grep

Well, actually parsing the command line is the key theme here -- whether I invoke GNU find, or implement my own find in perl, is not that important, and it makes no difference except that the former is shorter and probably less portable.

Besides, of course I could use Getopt::Long as you suggest, but again -- that's not what I want. I don't want to have to precede my prepositions with '--', because I find it redundant and uncomfortable. I'm looking for the best solution to the problem that I described, not for a description of a similar problem that would be easier to solve (although I do appreciate your suggestion). Or, to put it more generally, I don't want to bend myself to the limitations of particular computer systems; I prefer to bend the computer systems to my will.


(This post was edited by panicz on Jan 28, 2013, 10:26 AM)


7stud
Enthusiast

Jan 28, 2013, 5:24 PM

Post #9 of 11 (1582 views)
Re: [panicz] Wrapper around the UNIX find | xargs grep

1) Using a grammar parser is a common way to solve problems like yours. I've been working on a solution using perl's Parse::RecDescent parser, but I find it very difficult to use, and the solution feels brittle. I've used python's PyParsing recursive descent parser, and it is much easier to use and the solutions feel more robust.

2) I think you have problems with your grammar because the bash shell drops the quotes around the strings that you want to specify as search terms, so any program will be handed a jumble of words for the search strings with no way to separate them. You can escape the quotes on the command line but that will make it very unwieldy to type the command.

In addition, I don't know if your Scheme program handles it or not, but the shell expands globs (file patterns), so if you use '*.pl or *.txt' unquoted in your command, the shell is going to feed your program something similar to this:

prog1.pl prog2.pl or data1.txt data2.txt

Unfortunately, I'm finding that to be problematic because perl's Parse::RecDescent doesn't back up like a regex engine. So if you try to match 'or' followed by several words, the perl parser will gladly gobble up all the remaining words in the command string and terminate.
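
A quick way to see exactly what the shell hands to a program is to print the argument list one element per line. The sketch below does that; the helper name describe_args is hypothetical. Note that although the shell removes the quote characters themselves, each quoted string still arrives as a single element of @ARGV, so the boundaries between search terms are only lost if the arguments are joined back into one string before parsing.

```perl
use strict;
use warnings;

# Hypothetical helper: render an argument list one element per line,
# with brackets so that embedded spaces are visible.
sub describe_args {
    my @args = @_;
    my @lines;
    for my $i (0 .. $#args) {
        push @lines, sprintf('%d: [%s]', $i, $args[$i]);
    }
    return @lines;
}

# Show whatever the shell passed in; prints nothing when run without arguments.
print "$_\n" for describe_args(@ARGV);
```

Saved under some name such as args.pl (an arbitrary choice) and run with a quoted phrase and an unquoted glob, this should show the phrase as one bracketed element and every matching filename as a separate element.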

In any case, here is (an improved) start:


Code
use strict;  
use warnings;
use 5.012;

use Parse::RecDescent;


$::RD_ERRORS = 1; #Parser dies when it encounters an error
$::RD_WARN = 1; #Enable warnings-warn on unused rules, etc.
$::RD_HINT = 1; #Give out hints to help fix problems.
#$::RD_TRACE = 1; #Trace parsers' behaviour


our %RESULTS;

my $grammar = <<'END_OF_GRAMMAR';

#Start up action(executed in parser namespace):
{
use 5.012; #So I can use say()
use Data::Dumper;
}

#The array @item contains the rule name followed by
#the matches for that rule, e.g.:
# @item = ('clause', 'from', ['./some/dir', 'a/b'])


startrule : clause(s)



clause: 'from' word(s /or/) #word(s) with the specified separator

{
#say Dumper(\@item);
$main::RESULTS{start_dir} = $item[-1];
}

| 'in' word(s /or/)
{
#say Dumper(\@item)
$main::RESULTS{filenames} = $item[-1];
}

| 'for' word(s /or/)
{
#say Dumper(\@item);
$main::RESULTS{search_terms} = $item[-1];
}


word : m{ \S* }xms

END_OF_GRAMMAR


my $text = q{from ./a in prog1.pl or data.txt for hello};

my $parser = Parse::RecDescent->new($grammar)
or die "Bad grammar!\n";

defined $parser->startrule($text)
or die "Can't match text";

use Data::Dumper;
say Dumper(\%RESULTS);


--output:--
$VAR1 = {
'start_dir' => [
'./a'
],
'search_terms' => [
'hello'
],
'filenames' => [
'prog1.pl',
'data.txt'
]
};

Then you can use File::Find to recursively search the start directory (if that is what you want to do) for the given filenames and search terms.
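
As a rough sketch of that last step, the following walks a directory tree with File::Find and collects matching lines. The helper name search_tree is hypothetical, and two simplifying assumptions are made: filenames are compared literally (no glob patterns) and search terms are treated as fixed strings rather than regular expressions.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return "path:line" strings for every line that
# contains one of @$terms, in files under $dir whose basename is in @$names.
sub search_tree {
    my ($dir, $names, $terms) = @_;
    return () unless @$terms;                 # nothing to search for
    my %wanted_name = map { $_ => 1 } @$names;
    my $alternation = join '|', map { quotemeta } @$terms;
    my @hits;
    find(sub {
        # Inside the callback, $_ is the basename and the current
        # directory is the one containing the file.
        return unless -f $_ && $wanted_name{$_};
        open my $in, '<', $_ or return;       # skip unreadable files
        while (my $line = <$in>) {
            push @hits, join(':', $File::Find::name, $line)
                if $line =~ /$alternation/;
        }
        close $in;
    }, $dir);
    return @hits;
}

# Illustrative call using a structure shaped like the %RESULTS dump above.
my %RESULTS = (
    start_dir    => ['./a'],
    filenames    => ['prog1.pl', 'data.txt'],
    search_terms => ['hello'],
);
print search_tree($RESULTS{start_dir}[0], $RESULTS{filenames}, $RESULTS{search_terms})
    if -d $RESULTS{start_dir}[0];
```

Supporting patterns such as *.txt would need an extra step that turns each pattern into a regular expression or a glob match before the comparison.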

As you can see, your Scheme solution is much more elegant.


(This post was edited by 7stud on Jan 30, 2013, 4:58 PM)


BillKSmith
Veteran

Jan 28, 2013, 7:58 PM

Post #10 of 11 (1572 views)
Re: [panicz] Wrapper around the UNIX find | xargs grep

You have already 'considered' my suggestion. I cannot ask for more. In lieu of that, 7Stud is probably on the right track. It would be ideal if you can find a parsing module which accepts the syntax specification in the meta-language you used in your second post. Time spent searching CPAN is seldom wasted even if you fail to find exactly what you want.
Good Luck,
Bill


7stud
Enthusiast

Jan 29, 2013, 9:22 AM

Post #11 of 11 (1553 views)
Re: [panicz] Wrapper around the UNIX find | xargs grep

How does your Scheme program handle the two problems I mentioned in 2) above, i.e.:

a) Multiple search terms that each contain multiple words, e.g.:

for hello world goodbye mars

b) A list of filenames like:

in 1.txt 2.txt or 1.pl 2.pl

Note the 'or' in there. If you allow multiple words after an or, then what happens when you parse this:

in 1.txt or 2.txt for hello

Won't 2.txt and 'for' (as well as 'hello') all be parsed as filenames?


(This post was edited by 7stud on Jan 29, 2013, 9:29 AM)

 
 

