PERL TO SPLIT BIGFILE; SAVE TEXT EXTRACTS TO SEPARATE FILES

 



MT_MANC
Novice

Jan 12, 2012, 7:46 AM

Post #1 of 5 (6124 views)

Can anyone assist?
I have a large .txt file (115,000 lines) that contains many variable-length subsections stacked on top of each other. I need to extract the text of each subsection between common separators and save each extract to a separate text file (with a .LOG extension).
The extract file name is embedded in line 3 of each subsection, e.g. "gOILp_u".
Given the file size, I need to automate (loop) this regex process with a Perl script or function.

For each subsection: ------
Start String: "WinSolve log file created "
End String: "Exit"
File name to save the extracted subsection text to: occurs on line 3 of each subsection
Example 1: 1st subsection, line 3 reads: @ gRSH1p_u @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Example 2: 2nd subsection, line 3 reads: @ gOILp_u @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
So 1st subsection => saved to c:\data\ws3\sim\LOGS\gRSH1p_u.LOG
So 2nd subsection => saved to c:\data\ws3\sim\LOGS\gOILp_u.LOG


SECTION (TOP) OF LARGE FILE TO BE SPLIT:------------------

WinSolve log file created
@
@ gRSH1p_u @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@
[TRUNCATED SUBSECTION TEXT]
@
Exit
@
WinSolve log file created
@
@ gOILp_u @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@
[TRUNCATED SUBSECTION TEXT]
@
Exit
@
WinSolve log file created
@
@ gTRX1p_u @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@
[TRUNCATED SUBSECTION TEXT]
@
Exit

etc
etc


SUMMARY OF REQUIREMENT --------------
1st subsection, extract all text between "WinSolve log file created " & "Exit" ; save to c:\data\ws3\sim\LOGS\gRSH1p_u.LOG
2nd subsection, extract all text between "WinSolve log file created " & "Exit" ; save to c:\data\ws3\sim\LOGS\gOILp_u.LOG
3rd subsection, extract all text between "WinSolve log file created " & "Exit" ; save to c:\data\ws3\sim\LOGS\gTRX1p_u.LOG
etc
etc


FishMonger
Veteran / Moderator

Jan 12, 2012, 8:07 AM

Post #2 of 5 (6122 views)
Re: [MT_MANC] PERL TO SPLIT BIGFILE; SAVE TEXT EXTRACTS TO SEPARATE FILES

What have you tried?

What portion of the task is giving you trouble?

Should the lines that mark the beginning and end of each section be included in the new files, or just the lines between them?


MT_MANC
Novice

Jan 12, 2012, 8:21 AM

Post #3 of 5 (6120 views)
Re: [FishMonger] PERL TO SPLIT BIGFILE; SAVE TEXT EXTRACTS TO SEPARATE FILES

Hi,
Yes, the subsection delimiters SHOULD be included in the extracted text.
I tried recycling another function to strip out the text, but my Perl knowledge is very limited:


Code
  $tmp= &get_multi_line_field($jobxtrabody," WinSolve log file created ","Exit"); 
&write_file (1,$zipchart,"@$tmp");


sub get_multi_line_field($$){
my ($jobxtrabody,$start,$end,$opt) = @_;
my @field;
open(MEFJOB, "<$jobxtrabody") || error("[get_single_line_field()] Cannot open MEF job extra body file : $jobxtrabody");
while( <MEFJOB> ) {
if ($_ =~ /^(\s)*$start/) {
$_=~ s/$start//;
push(@field,trim($_)."\n");
while( <MEFJOB> ) {
if($opt!=1){next if /^(\s)*$/}
last if /^(\s)*$end/;
push(@field,trim($_)."\n");
}
close(MEFJOB);
return \@field;
}
}
error("[get_multi_line_field()] Failed to read field $start");
close(MEFJOB);

}


But I couldn't seem to get it to work.
I didn't get to the next stage of writing a function to extract the file name and then save each extracted text subsection under that name with a .LOG extension (an ASCII text file).


FishMonger
Veteran / Moderator

Jan 12, 2012, 8:55 AM

Post #4 of 5 (6117 views)
Re: [MT_MANC] PERL TO SPLIT BIGFILE; SAVE TEXT EXTRACTS TO SEPARATE FILES

Don't use prototypes on your subs. They are almost never needed or wanted, and they can cause problems.

In case you're not sure what I mean by prototype, it's the '($$)' portion of the subroutine definition.
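
To illustrate one such problem, here is a minimal sketch. The sub names with_proto and without_proto are made up for this example and are not from your code. A '($$)' prototype forces each argument into scalar context, so an array passed as the first argument is silently replaced by its element count.

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only.
# With a ($$) prototype each argument is evaluated in scalar context.
sub with_proto($$) { return "@_" }

# Without a prototype the arguments are flattened into @_ as usual.
sub without_proto { return "@_" }

my @pair = ( 'start', 'end' );

my $with    = with_proto( @pair, 'x' );       # @pair becomes its element count
my $without = without_proto( @pair, 'x' );    # @pair is passed element by element

print "$with\n";       # prints "2 x"
print "$without\n";    # prints "start end x"
```

Calling a sub with a leading '&', as in '&get_multi_line_field(...)', bypasses the prototype altogether, which is another reason the prototype adds confusion rather than safety.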

The easiest way to extract the sections is to use the flip-flop (range) operator.

Here's a working example (with minimal error handling) using your sample data.

Code
#!/usr/bin/perl

use strict;
use warnings;

my @section;

while ( my $line = <DATA> ) {

    # The flip-flop is true from a start line through the next "Exit" line.
    if ( $line =~ /^WinSolve log file created/ .. $line =~ /^Exit/ ) {
        push @section, $line;

        if ( $line =~ /^Exit/ ) {
            # Line 3 of each section holds the name, e.g. "@ gRSH1p_u @@@@".
            my ($name) = $section[2] =~ /^\@ (\S+)/
                or die "no file name found on line 3 of section";
            my $filename = "$name.LOG";

            open my $fh, '>', $filename or die "failed to create $filename <$!>";
            print $fh @section;
            close $fh or die "failed to save $filename <$!>";
            @section = ();
        }
    }
}


__DATA__
WinSolve log file created
@
@ gRSH1p_u @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@
[TRUNCATED SUBSECTION TEXT]
@
Exit
@
WinSolve log file created
@
@ gOILp_u @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@
[TRUNCATED SUBSECTION TEXT]
@
Exit
@
WinSolve log file created
@
@ gTRX1p_u @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@
[TRUNCATED SUBSECTION TEXT]
@
Exit
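
The example above reads from __DATA__ and writes into the current directory. To run it against the real file, something along these lines might work. This is only a sketch under assumptions: split_log() is a hypothetical helper name, the input file name bigfile.txt is a placeholder, the output directory is the one from your first post, and the defined-or operator (//) needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# split_log() is a hypothetical helper name, not part of the script above.
# It writes each section to "<name>.LOG" in $out_dir and returns the paths.
sub split_log {
    my ( $in_file, $out_dir ) = @_;
    my ( @section, @written );

    open my $in, '<', $in_file or die "failed to open $in_file <$!>";
    while ( my $line = <$in> ) {

        # True from a start line through the next line beginning with "Exit".
        next unless $line =~ /^WinSolve log file created/ .. $line =~ /^Exit/;
        push @section, $line;
        next unless $line =~ /^Exit/;

        # Line 3 of the section carries the file name, e.g. "@ gOILp_u @@@".
        # No trailing newline on die, so Perl appends the input line number.
        my ($name) = ( $section[2] // '' ) =~ /^\@ (\S+)/
            or die "no file name on line 3 of section";
        my $path = File::Spec->catfile( $out_dir, "$name.LOG" );

        open my $out, '>', $path or die "failed to create $path <$!>";
        print {$out} @section;
        close $out or die "failed to save $path <$!>";

        push @written, $path;
        @section = ();
    }
    close $in;
    return @written;
}

# Usage (placeholder input name): perl split_log.pl bigfile.txt c:\data\ws3\sim\LOGS
if ( @ARGV == 2 ) {
    print "$_\n" for split_log(@ARGV);
}
```

Lines outside a start/Exit pair (such as the lone "@" between sections) are skipped, and a section whose third line does not match the expected pattern stops the script with the input line number in the message.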



MT_MANC
Novice

Jan 18, 2012, 4:01 AM

Post #5 of 5 (5903 views)
Re: [FishMonger] PERL TO SPLIT BIGFILE; SAVE TEXT EXTRACTS TO SEPARATE FILES

FishMonger,

Many thanks - this worked. The delay in responding to your last post was caused by the large input file (the one to be split up) not having a standardised layout in parts, which made the script stop.

But the error handling in your code was just enough to identify the line number where each non-standardised section was located, so I could correct it and then re-run the script.

Your effort and skill have saved me considerable time and stress, so thank you again!

 
 

