Match string in multi-line file

 



Stefanik
User

Dec 10, 2012, 8:05 AM

Post #1 of 35 (4273 views)
Match string in multi-line file

Hi,
I think this is not the first question of this kind, but I really don't understand how to do this or what to use...

I have a file like this:

Code
FirstLine 
newline1:FieldCostant1,value1:FieldCostant2,value2:FieldCostant3,value3
newline2:FieldCostant1,value1:FieldCostant2,value2:FieldCostant3,value3
newline3:FieldCostant1,value1:FieldCostant2,value2:FieldCostant3,value3
....


I need to print all the unique "value1" entries to a file.
So if value1 differs between newline1 and newline2, I write it to a specific file; otherwise I don't.

Can you help me?


Laurent_R
Veteran / Moderator

Dec 10, 2012, 10:17 AM

Post #2 of 35 (4262 views)
Re: [Stefanik] Match string in multi-line file

Read each line, extract the value you are interested in, and store it in a hash (as the key of the hash; the value can be anything, for example "1"). This will remove duplicates, because there cannot be duplicate keys in a hash. Then just print the keys of the hash.
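
The recipe above can be sketched roughly as follows. This is only a minimal sketch, assuming the line layout from the first post; the sub name unique_values and the sample values (abc, def) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the distinct "value1" entries from a list of lines.
# Assumes each data line looks like "name:FieldCostant1,value1:...".
sub unique_values {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        chomp $line;
        # Take the second colon-separated field, then the part after its comma.
        my (undef, $field) = split /:/, $line;
        next unless defined $field;      # skips header lines such as "FirstLine"
        my (undef, $value) = split /,/, $field;
        next unless defined $value;
        $seen{$value} = 1;               # a hash key can exist only once
    }
    return sort keys %seen;
}

my @sample = (
    "FirstLine\n",
    "newline1:FieldCostant1,abc:FieldCostant2,x:FieldCostant3,y\n",
    "newline2:FieldCostant1,abc:FieldCostant2,x:FieldCostant3,y\n",
    "newline3:FieldCostant1,def:FieldCostant2,x:FieldCostant3,y\n",
);
print "$_\n" for unique_values(@sample);   # prints "abc" then "def"
```

The keys are sorted only to make the output predictable; a plain "keys %seen" would do if the order does not matter.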


Stefanik
User

Dec 11, 2012, 12:46 AM

Post #3 of 35 (4250 views)
Re: [Laurent_R] Match string in multi-line file

Thanks Laurent.
Can you suggest how to get the substring?
I mean, I was thinking of using the substr function, but I would have to pass a specific offset and length. Is that right?
Or should I use some other function?

Thanks


Stefanik
User

Dec 11, 2012, 2:10 AM

Post #4 of 35 (4246 views)
Re: [Stefanik] Match string in multi-line file

I tried this code:


Code
 open (PARSEFILE, "<", $FileToParse); 
while ($FileTemp = <PARSEFILE>){
if ( $FileTemp =~ \.\s+\w+\s+/FieldCostant1,\s(T\d+)$/) {
print $FileTemp;
}
}
close (PARSEFILE);







I receive the following error:



Bareword found where operator expected at ./BatchParser.pl line 17, near "#if ( $FileTemp"
(Might be a runaway multi-line $$ string starting on line 16)
(Missing operator before FileTemp?)
syntax error at ./BatchParser.pl line 16, near "\."
syntax error at ./BatchParser.pl line 22, near "}"
Execution of ./BatchParser.pl aborted due to compilation errors.



rovf
Veteran

Dec 11, 2012, 2:56 AM

Post #5 of 35 (4241 views)
Re: [Stefanik] Match string in multi-line file

You forgot the delimiters around your regexp. For instance:


Code
$FileTemp =~ m[...] 



Laurent_R
Veteran / Moderator

Dec 11, 2012, 7:53 AM

Post #6 of 35 (4233 views)
Re: [Stefanik] Match string in multi-line file


In Reply To
Can you suggest how to get the substring?
I mean, I was thinking of using the substr function, but I would have to pass a specific offset and length. Is that right?


You can use the substr function if the format is constant, i.e. if the value you want to capture always starts at the same position in your input line and always has the same length.

The syntax is:


Code
substr EXPR, OFFSET [ , LEN ]


So, in your example, that would be:


Code
my $value = substr $line, 24, 30; # (if I counted the characters right...)


If the line format is not constant, then you should rather use either the split function or a regular expression.

(Split or regexes can also be used if the line format is constant, but the substr function is in that case better, clearer and faster.)
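
For the sample line from the first post, the offset can be worked out by counting. This is a small sketch on that placeholder line only; real data will have different offsets and lengths, and the variable names are made up.

```perl
use strict;
use warnings;

# Sample line from the first post; offsets are counted from 0.
my $line = "newline1:FieldCostant1,value1:FieldCostant2,value2:FieldCostant3,value3";

# "newline1:" is 9 characters and "FieldCostant1," is 14, so value1 starts at 23.
my $fixed = substr $line, 23, 6;
print "$fixed\n";    # prints "value1"

# index() can locate the start when only the label is known in advance.
my $label = "FieldCostant1,";
my $start = index($line, $label) + length $label;
my $found = substr $line, $start, 6;
print "$found\n";    # prints "value1"
```

Both calls still need a known length (6 here), which is why substr only fits fixed-width values.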


Stefanik
User

Dec 12, 2012, 8:11 AM

Post #7 of 35 (4223 views)
Re: [Laurent_R] Match string in multi-line file

Thank you for your answer. I'm going to try it.

Anyway, I can't use substr, because the length of the value I need to keep is variable.

Maybe I could split the line into several rows, one per field, and discard the parts I don't care about.

Ste


Laurent_R
Veteran / Moderator

Dec 12, 2012, 9:34 AM

Post #8 of 35 (4217 views)
Re: [Stefanik] Match string in multi-line file

One possible way:


Code
my $line = "newline1:FieldCostant1,value1:FieldCostant2,value2:FieldCostant3,value3"; 
my ($field_1, $field_2) = split /:/, $line ; #$field_2 is now "FieldCostant1,value1"
my ($cst1, $value1) = split /,/, $field_2;  # $value1 is now "value1"


You could also use a regular expression directly:


Code
my $value1 = $1 if $line =~ /^\w+:\w+,(\w+)/;
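
A variant of the same idea assigns the match in list context, which avoids combining "my" with a statement modifier (perlsyn describes the behaviour of that combination as undefined). This is just a sketch; the sub name extract_value1 is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value of the first field after the line name,
# or undef when the line does not have the expected shape.
sub extract_value1 {
    my ($line) = @_;
    # In list context a successful match returns its captured groups.
    my ($value) = $line =~ /^\w+:\w+,(\w+)/;
    return $value;
}

my $line = "newline1:FieldCostant1,value1:FieldCostant2,value2:FieldCostant3,value3";
my $value1 = extract_value1($line);
print defined($value1) ? "$value1\n" : "no match\n";   # prints "value1"
```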



Stefanik
User

Dec 27, 2012, 10:24 AM

Post #9 of 35 (4058 views)
Re: [Laurent_R] Match string in multi-line file

Thanks Laurent.

I tried your first solution and it works fine, but if the input file has more lines, the script handles only the first two lines.


BillKSmith
Veteran

Dec 27, 2012, 11:29 AM

Post #10 of 35 (4040 views)
Re: [Stefanik] Match string in multi-line file

Did you forget Laurent's first reply? It tells you what to do with the value after you extract it. All the later posts deal only with how to extract the value.
Good Luck,
Bill


Laurent_R
Veteran / Moderator

Dec 27, 2012, 1:40 PM

Post #11 of 35 (4030 views)
Re: [BillKSmith] Match string in multi-line file

Yes: read each line of your file and, for each line, extract what you need, store it in a hash (which removes duplicates), and then print the keys of the hash.

As Bill said, all my more recent posts dealt only with how to extract the data you need from a given line.


Stefanik
User

Dec 28, 2012, 12:18 AM

Post #12 of 35 (4014 views)
Re: [Laurent_R] Match string in multi-line file

Ok it works :)

In my "test" script there was a wrong line...

Anyway, here is the code as suggested by Laurent:


Code
open (PARSEFILE, "<", $FileToParse) or die "No file!";
while ($FileTemp = <PARSEFILE>){
    my ($field_1, $field_2) = split /:/, $FileTemp;
    my ($cst1, $value1) = split /,/, $field_2;
    print "$value1\n";
}
close (PARSEFILE);



Many thanks.


(This post was edited by Stefanik on Dec 28, 2012, 12:22 AM)


Stefanik
User

Dec 28, 2012, 2:20 AM

Post #13 of 35 (4004 views)
Re: [Stefanik] Match string in multi-line file

I tried something more as an exercise.

If the string to extract is not in the second field ($2) but can be anywhere in the line, I would use a regular expression, as you suggested:


Code
  $FileTemp =~ s/:/\n/g;  
my $value1 = $2 if $FileTemp =~ /^TRANSID,\w+/;
print $value1;



But no output is returned.

What am I doing wrong?!? :(


(This post was edited by Stefanik on Dec 28, 2012, 2:40 AM)


rovf
Veteran

Dec 28, 2012, 2:50 AM

Post #14 of 35 (3999 views)
Re: [Stefanik] Match string in multi-line file

May I ask you what the $2 is supposed to do?

You don't have any capturing groups in your regexp, so $2 will always be undef.
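
To illustrate the point about capturing groups, here is a small sketch. The helper name second_capture is hypothetical, and the test string is the one that appears later in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return $2 after matching $text against $re,
# or undef if the match fails.
sub second_capture {
    my ($text, $re) = @_;
    return $text =~ $re ? $2 : undef;
}

my $text = "TRANSID,valueNeeded";

# No parentheses in the pattern: the match succeeds, but $2 stays undefined.
my $no_groups = second_capture($text, qr/^TRANSID,\w+/);

# Two pairs of parentheses: $1 is "TRANSID" and $2 is the value.
my $two_groups = second_capture($text, qr/^(TRANSID),(\w+)/);

print defined($no_groups) ? "$no_groups\n" : "undef\n";   # prints "undef"
print "$two_groups\n";                                    # prints "valueNeeded"
```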


Stefanik
User

Dec 28, 2012, 3:37 AM

Post #15 of 35 (3995 views)
Re: [rovf] Match string in multi-line file

Wrong copy/paste.

Here is the code:




Code
 $FileTemp =~ s/:/\n/g; 
my $value1 = (/^STRING,\w+(\w+)/), $FileTemp;
print $value1;



rovf
Veteran

Dec 28, 2012, 4:04 AM

Post #16 of 35 (3992 views)
Re: [Stefanik] Match string in multi-line file

???

If *this* is really the actual code, it should not even compile!


rovf
Veteran

Dec 28, 2012, 4:08 AM

Post #17 of 35 (3991 views)
Re: [Stefanik] Match string in multi-line file

Do you have


Code
use strict; 
use warnings FATAL => qw(all);


at the beginning of your code? If not, please add it and run your program again....


Stefanik
User

Dec 28, 2012, 5:04 AM

Post #18 of 35 (3987 views)
Re: [rovf] Match string in multi-line file

Ah!

Ok I'm going to execute it.

Thanks again.

Ste


Stefanik
User

Dec 30, 2012, 7:27 AM

Post #19 of 35 (3919 views)
Re: [Stefanik] Match string in multi-line file

I've added those lines at the beginning, and now I get the following errors:


Code
Global symbol "$FileToParse" requires explicit package name at BatchParser.pl line 8. 
Global symbol "$FileParsed" requires explicit package name at BatchParser.pl line 11.
Global symbol "$FileToParse" requires explicit package name at BatchParser.pl line 29.
....
Execution of C:\Users\ADS\Desktop\Perl_Test\BatchParser.pl aborted due to compilation errors.


So I declared the input file explicitly in the same block as the code that uses it (I don't have separate functions, just a series of statements). Now I get the following errors:


Code
Global symbol "$FileTemp" requires explicit package name at BatchParser_2.pl line 7. 
Global symbol "$FileTemp" requires explicit package name at BatchParser_2.pl line 8.
syntax error at BatchParser_2.pl line 8, near "){"
Global symbol "$FileTemp" requires explicit package name at BatchParser_2.pl line 9.
Global symbol "$FileTemp" requires explicit package name at BatchParser_2.pl line 10.
syntax error at BatchParser_2.pl line 12, near "}"
Execution of BatchParser_2.pl aborted due to compilation errors.


And I don't understand the reason. :(

Here is the code:

Code
 
use strict;
use warnings FATAL => qw(all);

my $FileToParse="C:/Users/Me/Desktop/Perl_Test/temp/File.log";
open (PARSEFILE, "<", $FileToParse) or die "No file!!!";
$FileTemp="x"
while ($FileTemp = <PARSEFILE>){
$FileTemp =~ s/:/\n/g;
my $value1 = (/^STRING,\w+(\w+)/), $FileTemp;
print $value1;
}
close (PARSEFILE);



(This post was edited by Stefanik on Dec 30, 2012, 7:29 AM)


Stefanik
User

Dec 30, 2012, 7:49 AM

Post #20 of 35 (3915 views)
Re: [Stefanik] Match string in multi-line file

From what I've read, if I'm not mistaken, this is a local/global variable problem. But I don't understand why, because I don't have any other variable with the same name...

Anyway, I added "main::":

Code
use strict;  
use warnings FATAL => qw(all);
use diagnostics;

my $FileToParse="C:/Users/me/Desktop/Perl_Test/temp/File.log";
open (PARSEFILE, "<", $FileToParse) or die "No file!!!";
#$FileTemp="x"
while ($main::FileTemp = <PARSEFILE>){
$main::FileTemp =~ s/:/\n/g;
#my $value1 = (/^STRING,\w+(\w+)/), $main::FileTemp;
#print $value1;
print $main::FileTemp;
}
close (PARSEFILE);


Now I get this error:

Code
Uncaught exception from user code: 
Useless use of a variable in void context at BatchParser_2.pl line 11



Laurent_R
Veteran / Moderator

Dec 30, 2012, 10:42 AM

Post #21 of 35 (3899 views)
Re: [Stefanik] Match string in multi-line file

You have two errors on this line:


Code
$FileTemp="x"


First, you should use "my" to declare $FileTemp as a lexical (local) variable. Second, you forgot the closing semi-colon at the end. So it should be:


Code
my $FileTemp="x" ;


Once you have this correct, you no longer need the awkward "$main::" to prefix $FileTemp of your last attempt.

(Note: my comments are not really on your last post, but rather on the previous one. In the last post, you commented out important lines so that you are even further away from an actual solution.)
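
A common way to write such a loop under "use strict" is to declare the variable inside the while condition and to use a lexical filehandle. The following is only a sketch: the sub name read_lines is made up, and an in-memory string stands in for the real log file.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines from an already opened filehandle.
sub read_lines {
    my ($fh) = @_;
    my @lines;
    # Declaring the loop variable with "my" satisfies "use strict".
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    return @lines;
}

# An in-memory filehandle stands in for the real log file here.
my $data = "first\nsecond\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @lines = read_lines($fh);
close $fh;
print scalar(@lines), " lines read\n";   # prints "2 lines read"
```

With a real file, the open call would take the path instead of the reference to $data.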


(This post was edited by Laurent_R on Dec 30, 2012, 10:45 AM)


Stefanik
User

Dec 30, 2012, 1:12 PM

Post #22 of 35 (3885 views)
Re: [Laurent_R] Match string in multi-line file

Hi Laurent,
I tried what you suggested:

Code
 
use strict;
use warnings FATAL => qw(all);
#use diagnostics;

my $FileToParse="C:/Users/me/Desktop/Perl_Test/temp/File.log";
open (PARSEFILE, "<", $FileToParse) or die "No file!!!";
my $FileTemp="x";
while ($FileTemp = <PARSEFILE>){
$FileTemp =~ s/:/\n/g;
my $value1 = (/^TRANSID,\w+(\w+)/), $FileTemp;
print $value1;
}
close (PARSEFILE);


but I have the same error:


Code
Useless use of private variable in void context at BatchParser_2.pl line 11



:-/


Laurent_R
Veteran / Moderator

Dec 30, 2012, 2:40 PM

Post #23 of 35 (3879 views)
Re: [Stefanik] Match string in multi-line file

Yes, you have another error in the script. This line:


Code
my $value1 = (/^TRANSID,\w+(\w+)/), $FileTemp;


is wrong and does not really make sense to me. What are you trying to do with it?

Maybe what you are trying to do is capture part of the filename, in which case you could do it as follows:


Code
my $value1 = $1 if $FileTemp =~  /^TRANSID,\w+(\w+)/;


but I am a bit surprised that a filename should have a comma in it; this is quite uncommon, as many operating systems don't like it much.

Or maybe I completely misunderstood what you are trying to do in that line.
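
One possible reading of the warning: assignment binds more tightly than the comma, so the original line is parsed as "(my $value1 = /.../), $FileTemp;". The bare match is then tested against $_, and $FileTemp is left alone in void context. Note also that in "\w+(\w+)" the greedy \w+ leaves only the last character for the group. The sketch below uses invented sub names to show both points.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value that follows "TRANSID," at the start of $text.
sub transid_value {
    my ($text) = @_;
    # Bind the match to $text with =~ ; a bare /.../ would test $_ instead.
    my ($value) = $text =~ /^TRANSID,(\w+)/;
    return $value;
}

# With "\w+(\w+)" the greedy \w+ takes everything but the last character,
# so the group captures a single character.
sub last_char_only {
    my ($text) = @_;
    my ($value) = $text =~ /^TRANSID,\w+(\w+)/;
    return $value;
}

print transid_value("TRANSID,valueNeeded"), "\n";    # prints "valueNeeded"
print last_char_only("TRANSID,valueNeeded"), "\n";   # prints "d"
```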


Stefanik
User

Dec 31, 2012, 6:53 AM

Post #24 of 35 (3818 views)
Re: [Laurent_R] Match string in multi-line file

I'm trying to extract the value after the field "TRANSID,".
I split the whole line at each ":", so that I get one line containing "TRANSID," and its value. These lines are in the variable $FileTemp.
Then I'd like to extract just the value from these lines and store it in the variable $value1.

TRANSID is what I called FieldCostant2 in my first post.
I'm trying to extract the value when TRANSID is not in a fixed position (for example, when it can be any one of the FieldCostantx fields).
I hope that explains my point.

Thanks.


(This post was edited by Stefanik on Dec 31, 2012, 7:09 AM)


Stefanik
User

Dec 31, 2012, 7:37 AM

Post #25 of 35 (3811 views)
Re: [Stefanik] Match string in multi-line file

Here is an example of what $FileTemp contains:


Code
use strict; 
use warnings FATAL => qw(all);
#use diagnostics;

my $FileToParse="C:/Users/me/Desktop/Perl_Test/temp/File.log";
open (PARSEFILE, "<", $FileToParse) or die "No file!!!";
my $FileTemp="x";
my $value1="x";
while ($FileTemp = <PARSEFILE>){
$FileTemp =~ s/:/\n/g;
print $FileTemp;
#print my $value1 = $2 if $FileTemp =~ /^(TRANSID),(\w+)/;
#print $value1;
}
close (PARSEFILE);


Output:

Code
LOG 
T1,ValueT
TRANSID,valueNeeded
OTHER,parameters;


Anyway, the regular expression doesn't match "TRANSID,valueNeeded".


Laurent_R
Veteran / Moderator

Dec 31, 2012, 7:59 AM

Post #26 of 35 (1815 views)
Re: [Stefanik] Match string in multi-line file

It should match. If it doesn't match, then you probably don't have what you think in your variable and you should try to print it.

Test of the regex under the debugger:



Code
  DB<1> $c = "TRANSID,valueNeeded" 

DB<2> print $2 if $c =~ /^(TRANSID),(\w+)/;
valueNeeded
DB<3> print "$1\n$2" if $c =~ /^(TRANSID),(\w+)/;
TRANSID
valueNeeded


As you can see, the regex does match the expression.


Stefanik
User

Jan 3, 2013, 1:32 PM

Post #27 of 35 (1779 views)
Re: [Laurent_R] Match string in multi-line file

Thanks Laurent, I'll check.


Chris Charley
User

Jan 3, 2013, 3:43 PM

Post #28 of 35 (1771 views)
Re: [Stefanik] Match string in multi-line file

Try changing this line

my $value1 = $2 if $FileTemp =~ /^(TRANSID),(\w+)/;

to

my $value1;
$value1 = $2 if $FileTemp =~ /^(TRANSID),(\w+)/m;



(This post was edited by Chris Charley on Jan 3, 2013, 3:45 PM)


BillKSmith
Veteran

Jan 3, 2013, 8:09 PM

Post #29 of 35 (1762 views)
Re: [Chris Charley] Match string in multi-line file

Chris is certainly correct, but some explanation is needed. Matching against multi-line strings is a bit tricky. Finding Perl's documentation of this flag is challenging, and the result is disappointing ("m Treat string as multiple lines." Refer: perldoc perlop). The book "Perl Best Practices" has a detailed explanation of this issue. The meat of it is that /m changes the meaning of the meta-characters ^ and $ so that they refer to the start/end of each line rather than the start/end of the whole string. (The book recommends always using the flags /xms.)
Good Luck,
Bill
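
The effect of /m can be seen in a short sketch like this one. The string mimics the output shown in post 25 after the colons were turned into newlines; the variable names are made up.

```perl
use strict;
use warnings;

# One input line after the colons have been turned into newlines.
my $text = "LOG\nT1,ValueT\nTRANSID,valueNeeded\nOTHER,parameters";

# Without /m, ^ matches only at the very start of the string ("LOG").
my ($without_m) = $text =~ /^TRANSID,(\w+)/;

# With /m, ^ also matches right after each embedded newline.
my ($with_m) = $text =~ /^TRANSID,(\w+)/m;

print defined($without_m) ? "without /m: $without_m\n" : "without /m: no match\n";
print defined($with_m)    ? "with /m: $with_m\n"       : "with /m: no match\n";
# prints "without /m: no match" and then "with /m: valueNeeded"
```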


Laurent_R
Veteran / Moderator

Jan 3, 2013, 11:50 PM

Post #30 of 35 (1751 views)
Re: [Chris Charley] Match string in multi-line file

Hmmm, granted, the /m flag will not do any harm here, but I do not see how it will be useful here. So far, except perhaps for a post title which might be misunderstood, there is nothing in this post to tell us that we have a multiline string at hand.


BillKSmith
Veteran

Jan 4, 2013, 5:17 AM

Post #31 of 35 (1746 views)
Re: [Laurent_R] Match string in multi-line file

Note the code in post 25.

Code
while ($FileTemp = <PARSEFILE>){  
$FileTemp =~ s/:/\n/g;

The string is made into a multi-line string by changing the colons to newlines. We know that there are multiple colons from the file format in post 1.

UPDATE: The following code and output was added to this post to demonstrate the solution:


Code
use strict; 
use warnings;
*PARSEFILE = *DATA;
while (my $FileTemp = <PARSEFILE>){
$FileTemp =~ s/:/\n/g;
my $value1 = $2 if $FileTemp =~ /^(TRANSID),(\w+)/m;
print "The value of $1 is '$value1'\n" if defined $value1;
}
__DATA__
newline1:FieldCostant1,value1:FieldCostant2,value2:FieldCostant3,value3
newline4:FieldCostant1,value1:TRANSID,valueNeeded:FieldCostant3,value3
newline2:FieldCostant1,value1:FieldCostant2,value2:FieldCostant3,value3
newline3:FieldCostant1,value1:FieldCostant2,value2:FieldCostant3,value3

Output:

Code
The value of TRANSID is 'valueNeeded'

Good Luck,
Bill

(This post was edited by BillKSmith on Jan 4, 2013, 6:07 AM)


Laurent_R
Veteran / Moderator

Jan 4, 2013, 9:54 AM

Post #32 of 35 (1735 views)
Re: [BillKSmith] Match string in multi-line file


In Reply To
Note the code in post 25.

Code
while ($FileTemp = <PARSEFILE>){  
$FileTemp =~ s/:/\n/g;

The string is made into a multi-line string by changing the colons to newlines. We know that there are multiple colons from the file format in post 1.


Right, I forgot that.


Stefanik
User

Jan 4, 2013, 1:08 PM

Post #33 of 35 (1723 views)
Re: [Chris Charley] Match string in multi-line file

Chris, what you suggested works fine.

But I'm not sure I understand the "/string/m" issue that BillK talks about.


Laurent_R
Veteran / Moderator

Jan 4, 2013, 4:10 PM

Post #34 of 35 (1717 views)
Re: [Stefanik] Match string in multi-line file

If it works fine for you, forget about the rest of the discussion.

;)


BillKSmith
Veteran

Jan 4, 2013, 5:38 PM

Post #35 of 35 (1712 views)
Re: [Stefanik] Match string in multi-line file

In post 29, I tried to explain why Chris's fix works. I consider Perl's documentation of this issue very poor. I guess that I did not do any better. Sorry about that.

The update in my later post is the key part of your code with Chris's fix added. It is a complete program that anyone can run to prove that the fix works. It does not introduce any new ideas.
Good Luck,
Bill
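
Putting the pieces of the thread together, the original goal (unique values of a field that may sit anywhere in the line) could be sketched without /m by splitting each line instead. This is one possible approach under the line format from the first post, not the code anyone in the thread actually ran; the sub name unique_field_values and the sample values idA and idB are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the distinct values of the field named $label,
# wherever that field appears among the colon-separated parts of each line.
sub unique_field_values {
    my ($label, @lines) = @_;
    my %seen;
    for my $line (@lines) {
        chomp $line;
        for my $field (split /:/, $line) {
            # Split "NAME,value" into its two halves.
            my ($name, $value) = split /,/, $field, 2;
            next unless defined($value) && $name eq $label;
            $seen{$value} = 1;   # duplicate values collapse into one key
        }
    }
    return sort keys %seen;
}

my @log = (
    "newline1:FieldCostant1,value1:TRANSID,idA:FieldCostant3,value3\n",
    "newline2:TRANSID,idA:FieldCostant2,value2\n",
    "newline3:FieldCostant1,value1:FieldCostant2,value2:TRANSID,idB\n",
);
print "$_\n" for unique_field_values("TRANSID", @log);   # prints "idA" then "idB"
```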

 
 

