global matching in big files



Cameron
Deleted

Aug 19, 2000, 9:11 AM


Views: 6992

I have some rather large text files, each consisting of a single line of text. The files are 40-120 MB.

To improve performance, and to stop my machine from hanging, I have been reading the files in 100,000-character chunks.

I want to find all occurrences of a pattern within the file.

Does m//g find all occurrences despite the file being processed one chunk at a time?

Or do I need to split the file in such a way that I don't lose any matches through bad luck with the arbitrary chunk size?
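
To make the concern concrete, here is a minimal sketch in Perl. The sample string, the chunk size of 8, and the helper names count_whole and count_chunked are made up for illustration. It suggests that m//g applied to each chunk separately only sees matches that lie entirely inside one chunk, so a match that straddles a boundary is missed.

```perl
use strict;
use warnings;

# Made-up sample data: "needle" straddles the boundary between two
# 8-character chunks ("aaaaanee" and "dlebbbbb").
my $text       = 'aaaaaneedlebbbbb';
my $chunk_size = 8;

# Count matches with m//g over the whole string at once.
sub count_whole {
    my ($s, $re) = @_;
    my $n = () = $s =~ /$re/g;
    return $n;
}

# Count matches with m//g over each fixed-size chunk independently.
sub count_chunked {
    my ($s, $re, $size) = @_;
    my $n = 0;
    for (my $off = 0; $off < length $s; $off += $size) {
        my $chunk = substr($s, $off, $size);
        $n++ while $chunk =~ /$re/g;
    }
    return $n;
}

printf "whole: %d, chunked: %d\n",
    count_whole($text, qr/needle/),
    count_chunked($text, qr/needle/, $chunk_size);
# prints "whole: 1, chunked: 0"
```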


Cameron


TheGame+
Deleted

Aug 21, 2000, 3:31 AM


Re: global matching in big files

Depending on the pattern you're trying to find (a few hundred chars, or possibly 100,000+ chars), you might use overlapping 'chunks':
- read in chars 0-99,999 and match
- keep the last 1000 (?) chars, append chars 100,000-199,999 to them and match again
- and so on

The size of the overlap would depend on your typical match, of course, so it's definitely not a good solution for ALL cases.
And you'll have to be careful not to count matches that fall in the overlapping part twice.
Anyway, it's just an idea. I'm not sure how this compares to gobbling up the whole file into memory, but it's definitely going to take more processing. Calling sleep() between loops or lowering the priority of the task might solve your problem too :)
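
A rough sketch of the overlapping-chunk idea in Perl follows. It is a variation rather than a literal transcription: instead of re-matching the overlap and filtering out duplicates afterwards, it holds back any match that starts too close to the end of the buffer and picks it up again once the next chunk has been appended. The name find_all and the parameters $chunk_size and $max_len are hypothetical. The sketch assumes that no match is longer than $max_len, that the pattern never matches the empty string, and that it uses no anchors or lookaround that depend on text outside the match.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread. Returns an array ref of
# [absolute offset, matched text] pairs for every match of $re in $path.
# Assumes: no match is longer than $max_len, $re never matches the empty
# string, and $re has no anchors or lookaround that depend on text outside
# the match itself.
sub find_all {
    my ($path, $re, $chunk_size, $max_len) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";

    my @hits;
    my $buf  = '';   # unscanned tail of the previous round plus the new chunk
    my $base = 0;    # file offset of the first character of $buf

    while (1) {
        my $got = read($fh, my $chunk, $chunk_size);
        die "Read error on $path: $!" unless defined $got;
        my $eof = ($got == 0);
        $buf .= $chunk unless $eof;

        # A match starting at or before $limit fits entirely inside $buf,
        # so more data cannot change it. Later starts wait for the next round.
        my $limit  = $eof ? length($buf) : length($buf) - $max_len;
        my $resume = 0;   # buffer offset just past the last accepted match

        pos($buf) = 0;
        while ($buf =~ /$re/g) {
            my ($start, $end) = ($-[0], $+[0]);
            last if $start > $limit;
            push @hits, [ $base + $start, substr($buf, $start, $end - $start) ];
            $resume = $end;
        }
        last if $eof;

        # Drop everything already scanned; keep only the tail where a match
        # could still begin.
        my $keep = $limit + 1 > $resume ? $limit + 1 : $resume;
        substr($buf, 0, $keep) = '';
        $base += $keep;
    }
    close $fh;
    return \@hits;
}
```

A call such as find_all('big.txt', qr/abc\d/, 100_000, 4) would then scan a (hypothetical) file big.txt in 100,000-character reads while holding only a little more than one chunk in memory. If $max_len is set too low, long matches near a chunk boundary can still be cut short or missed, which is the same caveat as with the overlap size above.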


Cameron
Deleted

Aug 22, 2000, 8:41 AM


Re: global matching in big files

Thank you, TheGame+

That sufficiently answers my question.

I appreciate your help very much,
cameron