how to extract text between 2 strings on separate lines

 



ozosan
New User

Nov 13, 2013, 7:44 AM

Post #1 of 10 (989 views)

Hello,
I'm trying to write a script that extracts the text between two strings that sit on separate lines. I have a command that does this from the command line, but I'm not sure how to turn it into a script that writes the results to a file. Any idea how to do it?
Here is the one-liner that works fine:
[perl -ne "BEGIN { @ARGV = map glob, @ARGV }; print if /^start\b$/ .. /^end\b$/ " input/*]
So far I have this routine:
[use strict;
use warnings;
my $record = "";
opendir (DIR, "C:/Users/input/") or die "$!";
my @files = readdir DIR;
close DIR;
splice (@files,0,2);

open(MYOUTFILE, ">>output/output.txt");
foreach my $file (@files) {
open (CHECKBOOK, "binput/$file")|| die "$!";
while ($record = <CHECKBOOK>) {

if ($record=~ /^start\b$/ .. /^end\b$/) {
print MYOUTFILE "$file;$record\n";
}
}
}
close(CHECKBOOK);

}
close(MYOUTFILE);
]


FishMonger
Veteran / Moderator

Nov 13, 2013, 7:59 AM

Post #2 of 10 (985 views)
Re: [ozosan] how to extract text between 2 strings on separate lines

You didn't say in what way the script is failing to do what you want, but I suspect that it's failing to open the file and the die statement is being executed.

Check your path. You use opendir on "C:/Users/input/" but when you attempt to open the file, you use "binput/$file" which is using a different path.
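A minimal sketch of the path point above, assuming a hypothetical helper named list_input_files and a temporary directory standing in for the real input directory. readdir returns bare file names, so each name has to be joined to the same directory that was passed to opendir before it can be opened.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: return the full paths of the plain files in $dir.
sub list_input_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "failed to open directory '$dir' <$!>";
    # readdir returns bare names (including '.' and '..'), not paths.
    my @names = sort grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    # Join each name to the directory that was actually read.
    return grep { -f $_ } map { File::Spec->catfile($dir, $_) } @names;
}

# Demo with a throwaway directory instead of C:/Users/input.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt)) {
    open my $fh, '>', File::Spec->catfile($dir, $name)
        or die "failed to create '$name' <$!>";
    close $fh;
}

my @paths = list_input_files($dir);
print "$_\n" for @paths;
```

Filtering out '.' and '..' by name avoids relying on their position in the list, and a directory handle is closed with closedir rather than close.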


ozosan
New User

Nov 13, 2013, 8:03 AM

Post #3 of 10 (983 views)
Re: [FishMonger] how to extract text between 2 strings on separate lines

The script is printing entire files, even those which don't contain the start and end point :(


FishMonger
Veteran / Moderator

Nov 13, 2013, 8:10 AM

Post #4 of 10 (976 views)
Re: [ozosan] how to extract text between 2 strings on separate lines

If that's true, then the script you're executing must be different from the code you posted, which makes sense since the code you posted won't compile.

Please copy and paste your code rather than manually retyping it. And use the code tags so that the indentation will be retained.


ozosan
New User

Nov 13, 2013, 8:20 AM

Post #5 of 10 (974 views)
Re: [FishMonger] how to extract text between 2 strings on separate lines

Sorry for the confusion, here it is :

Code
use strict; 
use warnings;
my $record = "";
opendir (DIR, "C:/Users/input/") or die "$!";
my @files = readdir DIR;
close DIR;
splice (@files,0,2);

open(MYOUTFILE, ">>output/output.txt");
foreach my $file (@files) {
open (CHECKBOOK, "input/$file")|| die "$!";
while ($record = <CHECKBOOK>) {

if ($record=~ /^start\b$/ .. /^end\b$/) {
print MYOUTFILE "$file;$record\n";

}
}
close(CHECKBOOK);

}
close(MYOUTFILE);



(This post was edited by ozosan on Nov 13, 2013, 8:21 AM)


FishMonger
Veteran / Moderator

Nov 13, 2013, 9:09 AM

Post #6 of 10 (944 views)
Re: [ozosan] how to extract text between 2 strings on separate lines

Where are you executing the script from? C:/Users/?

If you're in some other directory, then you still have a path problem in the script.

Using glob either directly or via the <> diamond operator would be cleaner than using opendir/readdir and splice.

Vars should be declared in the smallest scope that they require and as close as possible to where they are first used. I'm referring to your $record var. It should be declared in the loop initialization.

You should be using a lexical var for the filehandle and the 3 arg form of open. The die statement should include the filename.

Try this adjusted version. If it fails, then explain exactly how it fails and post a reasonable sample of one of your data files so I can run a test.


Code
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

open my $out_fh, '>>', 'output/output.txt' or die "failed to open 'output/output.txt' <$!>";

my $path = 'C:/Users/input';

foreach my $file (<"$path/*">) {
    open my $checkbook_fh, '<', $file or die "failed to open '$file' <$!>";
    while ( my $record = <$checkbook_fh> ) {

        if ( $record =~ /^start\b$/ .. /^end\b$/ ) {
            print $out_fh "$file;$record";
        }
    }
    close $checkbook_fh;

}
close $out_fh;



Laurent_R
Veteran / Moderator

Nov 13, 2013, 10:45 AM

Post #7 of 10 (935 views)
Re: [ozosan] how to extract text between 2 strings on separate lines

I can't try right now, but I think you probably have an operator precedence problem on this line:


Code
     if ($record=~ /^start\b$/ .. /^end\b$/)  {


Try to add parens around the flip-flop expression:


Code
     if ($record=~ (/^start\b$/ .. /^end\b$/))  {



FishMonger
Veteran / Moderator

Nov 13, 2013, 11:07 AM

Post #8 of 10 (934 views)
Re: [Laurent_R] how to extract text between 2 strings on separate lines

The problem is that $record is being bound to the first regex, but not the second. The second is bound to $_.

The solution is:

Code
if ($record =~ /^start\b$/ .. $record =~ /^end\b$/)  {
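To see the fix in isolation, here is a small self-contained sketch. The sample lines and the helper name extract_block are made up for illustration. Each operand of .. is its own expression, so =~ applies only to the regex it is written next to, and the second regex needs its own binding to $record.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines from "start" through "end", inclusive.
sub extract_block {
    my @lines = @_;
    my @kept;
    for my $record (@lines) {
        # Both operands of the flip-flop are bound to $record explicitly.
        if ( $record =~ /^start$/ .. $record =~ /^end$/ ) {
            push @kept, $record;
        }
    }
    return @kept;
}

# Made-up sample data standing in for one input file.
my @sample = ("header\n", "start\n", "alpha\n", "beta\n", "end\n", "footer\n");
print extract_block(@sample);
```

With this sample, only the four lines from start through end are printed. The header and footer lines are skipped.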



Laurent_R
Veteran / Moderator

Nov 13, 2013, 2:52 PM

Post #9 of 10 (930 views)
Re: [FishMonger] how to extract text between 2 strings on separate lines


In Reply To
The problem is that $record is being bound to the first regex, but not the second. The second is bound to $_.

The solution is:

Code
if ($record =~ /^start\b$/ .. $record =~ /^end\b$/)  {



Even with the added parens I suggested? I'm surprised and don't see why, but again, I can't test right now, so I may be missing something.

An additional comment is that the \b anchors seem completely useless here when followed by an end of line anchor.
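A quick way to check this on a few made-up strings is sketched below. It compares the two patterns and counts how often they disagree. The candidate list is an assumption chosen for illustration, not data from the thread.

```perl
use strict;
use warnings;

# Made-up test strings: some should match, some should not.
my @candidates = ("start\n", "start", "starts\n", " start\n", "start \n", "start\nx");

my $disagreements = 0;
for my $text (@candidates) {
    my $with_b    = $text =~ /^start\b$/ ? 1 : 0;
    my $without_b = $text =~ /^start$/   ? 1 : 0;
    $disagreements++ if $with_b != $without_b;

    # Show newlines visibly so each result stays on one line.
    (my $shown = $text) =~ s/\n/\\n/g;
    printf "%-12s with \\b: %d  without: %d\n", "'$shown'", $with_b, $without_b;
}
print "disagreements: $disagreements\n";
```

Whenever $ can match right after "start", the next character is either a newline or the end of the string, and \b already holds in both cases.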

A final comment is that not naming the current record (using the default $_ variable) would probably make things much easier:

Code
if ( /^start$/ .. /^end$/)  { # ...

and avoid all these precedence problems in a very simple manner.
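As a rough sketch of that suggestion, the loop below reads from an in-memory filehandle (made-up sample data, standing in for one input file) and lets both regexes match against $_ implicitly.

```perl
use strict;
use warnings;

# In-memory stand-in for one input file (made-up sample data).
my $data = "header\nstart\nalpha\nend\nfooter\n";
open my $in_fh, '<', \$data or die "failed to open in-memory file <$!>";

my @kept;
while (<$in_fh>) {
    # With no =~ at all, both regexes match against $_.
    push @kept, $_ if /^start$/ .. /^end$/;
}
close $in_fh;

print @kept;
```

Because neither operand names a variable, there is nothing to bind inconsistently. The trade-off is that the loop body depends on $_ not being changed by anything else inside it.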


ozosan
New User

Nov 14, 2013, 1:34 AM

Post #10 of 10 (919 views)
Re: [Laurent_R] how to extract text between 2 strings on separate lines


Many thanks for your valuable input and help. Thank you, FishMonger, for the adjusted script; it works like a charm. And thank you very much, Laurent_R, for the hint: there was indeed a precedence issue, because =~ has higher precedence than the .. operator, so $record =~ /^start\b$/ .. $record =~ /^end\b$/ fixed the issue. As for the "\b" in the regex, yes, it's not needed, so thank you again.


 
 

