Oct 4, 2012, 4:51 PM
Post #5 of 7
Re: [Laurent_R] regex across \n
If I can't open the text file in Notepad and parse it by hand, to me it's very large. ;) (I don't do this very often.)
I have some very large (50 MB+) text files from which I need to clean up the data.
By the criteria of the files I am working with, these are very SMALL files. The files I work on are typically between 10 and 20 GB, and sometimes up to 700 GB or even more.
You don't give enough information on your input file, but I would think that slurping the file after having defined the input separator as "=" or as "=\n" would probably help you very much.
My complete input file can be found here: http://vortex.plymouth.edu/~stjones00/Apr10.txt
The problem is that incomplete entries are mixed in with complete entries (plus other extraneous entries I don't want), so I need to parse out the wanted data. (Each wanted entry begins with UA or UUA and ends with =, but I also need the leading line, which is why my pattern begins with UB.) Then I need to take these individual, complete entries and perform some operations on them. (Test for a specific value, remove \n, etc.)
My thought was to capture each match in $1 and pass that to a subroutine to perform the operations.
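Roughly, this is the kind of thing I have in mind, combining the "=\n" input separator idea with the $1 capture. It is only a sketch: the sample data is made up for illustration and is not taken from my real file, process_entry is a placeholder name, and it assumes that the UB line sits directly above the UA/UUA line and that no line inside an entry starts with UB.

```perl
use strict;
use warnings;

# Made-up sample data for illustration only; the real file's layout may differ.
my $sample = <<'END';
UBUS01 KWBC 101200
UA /OV ABC /TM 1200
/FL080 /TP B737=
UBUS01 KWBC 101203
UA /OV XYZ incomplete entry
UBUS01 KWBC 101205
UUA /OV DEF /TM 1205=
some extraneous line
END

# Placeholder subroutine: flatten one complete entry onto a single line.
sub process_entry {
    my ($entry) = @_;
    $entry =~ s/\n/ /g;
    return $entry;
}

# Read from an in-memory filehandle so the sketch runs on its own;
# with a real file this would be: open my $fh, '<', $filename
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my @clean;
{
    local $/ = "=\n";    # one "="-terminated record per read
    while (my $record = <$fh>) {
        # Capture: a UB line, then a line starting with UA or UUA, then
        # everything up to the closing "=". The (?!\nUB) lookahead stops
        # the match from running through an incomplete entry into the
        # next UB line (assumption: body lines never start with UB).
        if ($record =~ /^(UB[^\n=]*\n(?:UA|UUA)\b(?:(?!\nUB)[^=])*=)/m) {
            push @clean, process_entry($1);
        }
    }
}
close $fh;

print "$_\n" for @clean;
```

On the sample above this keeps the first entry and the UUA entry and skips the incomplete entry and the extraneous line. Reading one record at a time should also avoid holding the whole file in memory, whatever its size.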