Efficient regex search of multiple files.

 



anglaissam
Novice

Jul 2, 2013, 9:34 AM

Post #1 of 8 (1018 views)
Efficient regex search of multiple files.

Hi, I have a large list of 2000+ files that my program needs to search through. My current basic method is to use a for loop to open each file, load the contents into an array, and then run the user-defined regex over it to search/extract.

What I am looking for is a more efficient solution. I know grep might be a better/cleaner way to go about this, but I don't know how much more efficient grep would be than a for loop.

Which modules or code, in your opinion/experience, will allow the fastest search of these files?

Thanks for any help.


shawnhcorey
Enthusiast


Jul 2, 2013, 9:54 AM

Post #2 of 8 (1014 views)
Re: [anglaissam] Efficient regex search of multiple files.

What do you want to do with the files? grep(1) would be faster than a Perl script since it is compiled and optimized.

__END__

I love Perl; it's the only language where you can bless your thingy.

Perl documentation is available at perldoc.perl.org. The list of standard modules and pragmatics is available in perlmodlib.

Get Markup Help. Please note the markup tag of "code".


anglaissam
Novice

Jul 2, 2013, 10:00 AM

Post #3 of 8 (1012 views)
Re: [shawnhcorey] Efficient regex search of multiple files.

My purpose is to locate the search term via regex, extract the entire line that contains it along with the line number (although I can do without the line number if that makes the script significantly faster), and then print the information to a text file.


Laurent_R
Veteran / Moderator

Jul 2, 2013, 10:54 AM

Post #4 of 8 (1004 views)
Re: [anglaissam] Efficient regex search of multiple files.

Loading the content into an array is not the most efficient solution. It is more efficient to read the file line by line, check each line against the regex, and print the line if a match occurred.

But you are not telling us enough for us to give you help.


anglaissam
Novice

Jul 2, 2013, 11:00 AM

Post #5 of 8 (1003 views)
Re: [Laurent_R] Efficient regex search of multiple files.


Code
foreach $f (@thefiles) {
    if ( ($f =~ m/\.htm$/i) || ($f =~ m/\.html$/i) ) {
        chomp($f);
        open (IN, "<$dir/$f") or die "Can't open $f: $!\n";

        @lines = <IN>;

        close IN;

        for ( @lines ) {
            $lcount++;
            if ($_ =~ m/($search)/i) {
                $found++;
                print OUT "[$1] - $f: $lcount: $_\n";
            }
        }
    }
    $lcount=0;
}


Above is the code snippet for the loop I am using. For all I know, there is no way to make this code more efficient. I understand I could use the Unix "grep" or Windows "findstr" commands, but I am looking for a Perl solution.

This code will eventually be part of a much larger program, which is why I am trying to improve its speed and efficiency.


BillKSmith
Veteran

Jul 2, 2013, 11:02 AM

Post #6 of 8 (1002 views)
Re: [anglaissam] Efficient regex search of multiple files.

You do so much I/O that it probably does not make any difference how you process the files.

You lose the line-numbers when you read an entire file into an array. Rather, read each line in a while loop. Print the line-number ($.) and the line ($_) if it matches your regexp.

If you know that there is at most one match per file, exit the while loop with last when you find it.
Good Luck,
Bill
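
For illustration, the line-by-line advice above could be sketched roughly as follows. This is only a minimal sketch, not code from the thread: `search_files` and its `$first_only` flag are hypothetical names, and the output layout is simply copied from the original loop. It reads each file in a while loop, uses `$.` for the line number, and leaves the loop with `last` when only the first match per file is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns one
# "[match] - file: line_no: text" string per matching line.
# $first_only is an assumption of this sketch: when true, stop
# after the first hit in each file.
sub search_files {
    my ($pattern, $first_only, @files) = @_;
    my $re = qr/($pattern)/i;    # compile the user-supplied pattern once
    my @hits;
    for my $file (@files) {
        next unless $file =~ /\.html?$/i;    # same .htm/.html filter as the original loop
        open my $in, '<', $file or die "Can't open $file: $!\n";
        while (my $line = <$in>) {
            chomp $line;
            if ($line =~ $re) {
                # $. holds the current line number of the handle just read
                push @hits, sprintf '[%s] - %s: %d: %s', $1, $file, $., $line;
                last if $first_only;
            }
        }
        close $in;    # an explicit close resets $. for the next file
    }
    return @hits;
}

# Usage: perl search.pl file1.html file2.htm ...
print "$_\n" for search_files('perl', 0, @ARGV);
```

Only one line per file is held in memory at a time, and no separate line counter has to be maintained or reset.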


shawnhcorey
Enthusiast


Jul 2, 2013, 12:06 PM

Post #7 of 8 (994 views)
Re: [BillKSmith] Efficient regex search of multiple files.

Try:

Code
grep -n 'pattern' files


See `man grep` for details.



Laurent_R
Veteran / Moderator

Jul 2, 2013, 3:21 PM

Post #8 of 8 (987 views)
Re: [anglaissam] Efficient regex search of multiple files.


In Reply To
For all I know, there is no way to make this code more efficient.


Sure it can be made more efficient. More efficient in terms of coding complexity, and more efficient in terms of performance.

Loading a file into an array and then reading the array is less efficient than reading a file line by line.

But I already told you. If you don't want to believe me, do what you want and make your own tests (and don't worry, I do not feel offended).
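
For anyone who wants to run that kind of test, here is one possible sketch using the core Benchmark module. Everything in it is an assumption made for illustration: the generated sample file, its size, the pattern, and the helper names `count_slurp` and `count_streaming` do not come from the thread. Results will vary with file size, disk caching, and Perl version, so it is best pointed at real data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Build a throwaway sample file; size and contents are arbitrary assumptions.
my ($fh, $sample) = tempfile(UNLINK => 1);
print {$fh} "line $_ of filler text, sometimes with perl in it\n" for 1 .. 20_000;
close $fh;

my $re = qr/(perl)/i;

# Approach from the original post: slurp all lines into an array, then scan it.
sub count_slurp {
    open my $in, '<', $sample or die "Can't open $sample: $!\n";
    my @lines = <$in>;
    close $in;
    my $found = 0;
    for (@lines) { $found++ if /$re/ }
    return $found;
}

# Suggested approach: test each line as it is read.
sub count_streaming {
    open my $in, '<', $sample or die "Can't open $sample: $!\n";
    my $found = 0;
    while (my $line = <$in>) { $found++ if $line =~ $re }
    close $in;
    return $found;
}

# A negative count tells Benchmark to run each sub for at least that
# many CPU seconds; cmpthese then prints a comparison table.
cmpthese(-1, { slurp => \&count_slurp, streaming => \&count_streaming });
```

Both subs do the same matching work, so any difference in the table comes from how the file is read rather than from the regex.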

 
 

