global matching in big files



Aug 19, 2000, 9:11 AM

Post #1 of 3 (8517 views)
global matching in big files

I have some rather large text files, each consisting of a single line of text. The files are 40-120 MB.

In order to improve performance, and stop my machine from hanging, I have been reading the files in 100,000-character chunks.

I want to find all occurrences of a pattern within the file.

Does m//g find all occurrences despite processing the file one chunk at a time?

Or do I need to split the file carefully so that I don't lose any matches because of bad luck with the arbitrary chunk size?
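For illustration, here is a tiny self-contained version of the concern, with a made-up pattern and a short string standing in for the big file. A match that happens to straddle a chunk boundary is invisible when each chunk is matched independently:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical example: "FOOBAR" straddles a chunk boundary.
my $data    = "xxxxxFOOBARxxxxx";   # stand-in for the huge one-line file
my $pattern = qr/FOOBAR/;

# Matching the whole string finds it:
my @whole = $data =~ /$pattern/g;

# A chunk size of 8 splits the data into "xxxxxFOO" and "BARxxxxx",
# so neither chunk matches on its own:
my @chunked;
for (my $pos = 0; $pos < length $data; $pos += 8) {
    my $chunk = substr $data, $pos, 8;
    push @chunked, $chunk =~ /$pattern/g;
}

printf "whole: %d match(es), chunked: %d match(es)\n",
    scalar @whole, scalar @chunked;   # whole: 1 match(es), chunked: 0 match(es)
```

So no, per-chunk m//g does not find all occurrences on its own; any match crossing a chunk boundary is lost.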



Aug 21, 2000, 3:31 AM

Post #2 of 3 (8517 views)
Re: global matching in big files [In reply to]

Depending on what pattern you're trying to find (a few hundred chars, or possibly 100,000+ chars), you might use overlapping 'chunks':
- read in chars 0-99,999 and match
- keep the last 1,000 (?) chars, append chars 100,000-199,999 to it, and match again
- and so on

The size of the overlap would depend on your typical match, of course, so it's definitely not a good solution for ALL cases.
And you'll have to be careful not to count matches that fall entirely inside the overlapping part twice.
Anyway, it's just an idea - I'm not sure how this compares to gobbling up the whole file into memory, but it's definitely going to take more processing. sleep()ing between loops or lowering the priority of that task might solve your problem too :)
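A rough sketch of the overlapping-chunk loop described above, assuming no match is ever longer than the overlap; the file name, chunk size, overlap, and pattern are all placeholders. Tracking each match's absolute file offset (via the @- array) takes care of the duplicate problem, since a match found again inside the overlap lands on the same offset:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Overlapping-chunk scan (sketch). Placeholder values throughout;
# assumes the pattern can never match more than $overlap characters.
my $chunk_size = 100_000;
my $overlap    = 1_000;
my $pattern    = qr/FOOBAR/;      # stand-in for the real pattern

open my $fh, '<', 'bigfile.txt' or die "open: $!";

my $tail   = '';    # last $overlap chars of the previous buffer
my $offset = 0;     # absolute file position where $buf starts
my %seen;           # absolute match positions already reported

while (read $fh, my $chunk, $chunk_size) {
    my $buf = $tail . $chunk;
    while ($buf =~ /$pattern/g) {
        my $abs = $offset + $-[0];          # absolute start of this match
        print "match at $abs\n" unless $seen{$abs}++;
    }
    my $keep = length($buf) < $overlap ? length($buf) : $overlap;
    $tail    = substr $buf, -$keep;
    $offset += length($buf) - $keep;
}
close $fh;
```

Note that %seen grows with the number of matches; for very match-heavy files you'd only need to remember positions that fall inside the current overlap.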


Aug 22, 2000, 8:41 AM

Post #3 of 3 (8517 views)
Re: global matching in big files [In reply to]

Thank you, TheGame.

That sufficiently answers my question.

I appreciate your help very much,

