Match characters in middle and end string



Stefanik
User

Jan 10, 2013, 1:54 AM



Hi,

I have a file like the following:


Code
 anystring1 
anystring2SUB:anystring3:anystring4;
anystring5SUB:
:anystring6:
anystring7;



I have to perform two kinds of matching:

1) Find all the lines that contain "SUB:" and have ";" as the last character, and print them.

2) Find all the lines that contain "SUB" but do not end with ";". For these, remove the "\n" at the end and keep joining the following lines until I reach the line that ends with ";".

The second point is meant to "normalize" those lines so that they look like the ones in point 1.



I have started writing a regexp for point 1:


Code
 if ($qpar =~ /^\w+SUB:\w+\;$/) {  

print $qpar;

}



Stefanik
User

Jan 10, 2013, 4:43 AM


Re: [Stefanik] Match characters in middle and end string

I modified the regexp, and now it works:

Quote
if ($qpar =~ (/^.*SUB:.*\;$/m)){print $qpar;}



What's the difference between "\w" and "."?

Don't both of them represent any alphanumeric character?


(This post was edited by Stefanik on Jan 10, 2013, 5:05 AM)


Stefanik
User

Jan 10, 2013, 6:07 AM


Re: [Stefanik] Match characters in middle and end string

Now I am trying to match the second point:

Code
($qpar =~ (/^.*SUB:.*[^;]$/m))


I want to find all the lines that contain "SUB" but don't end with ";".
But the code doesn't seem to honor the "[^;]": it prints all the lines that contain SUB, including the ones ending with ";".
Any suggestions?


(This post was edited by Stefanik on Jan 10, 2013, 6:08 AM)


BillKSmith
Veteran

Jan 10, 2013, 6:18 AM


Re: [Stefanik] Match characters in middle and end string

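We usually think of /./ as meaning "match any character" (any character except a newline, by default). /\w/ means "match any word character" (/[a-zA-Z_0-9]/).

In your example this does make a difference: the text after "SUB:" contains ":" characters, which . matches but \w does not, so the \w version fails before it can reach the ";". Note also that in your second pattern you use .* rather than \w+. The "+" requires at least one match; the "*" does not. That is a second difference, although not the one that matters here.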
Good Luck,
Bill
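
The difference between \w and . can be checked directly on the sample line from the first post. The following is a minimal sketch; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Sample line taken from the first post.
my $line = 'anystring2SUB:anystring3:anystring4;';

# \w matches only letters, digits and underscore, so it stops at the ':'
# between anystring3 and anystring4 and never reaches the ';'.
my $with_w_plus = $line =~ /^\w+SUB:\w+;$/ ? 1 : 0;
my $with_w_star = $line =~ /^\w*SUB:\w*;$/ ? 1 : 0;

# . matches any character except a newline, including ':'.
my $with_dot = $line =~ /^.*SUB:.*;$/ ? 1 : 0;

print "\\w+ : $with_w_plus\n";   # prints "\w+ : 0"
print "\\w* : $with_w_star\n";   # prints "\w* : 0"
print ".*  : $with_dot\n";       # prints ".*  : 1"
```

Both \w variants fail for the same reason, which suggests that the quantifier is not what decides the outcome here.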


Stefanik
User

Jan 10, 2013, 8:08 AM


Re: [BillKSmith] Match characters in middle and end string

I also tried "\w*", but again I didn't get any printout.
Anyway, I solved it with ".*".

Can you help me with:


Code
($qpar =~ (/^.*SUB:.*[^;]$/m))


Thanks again


rovf
Veteran

Jan 10, 2013, 11:07 PM


Re: [Stefanik] Match characters in middle and end string

The pattern ^.* at the beginning of a regexp is redundant, so you basically match

SUB:.*[^;]$

Since you are using the m-modifier for your regexp, the $ changes its meaning from matching at the end of the string to matching at the end of any line. Keep in mind that the dot does not match a newline, but the negated class [^;] does. That is, your pattern matches if $qpar contains the text SUB: and the rest of that line ends in a character other than a semicolon, where this last character may be the newline itself. For instance, the following string would match:

"xxxxSUB:yyyy\n\nSUB:\nbbbbbb"

In this case, the matched substring would be

SUB:yyyy\n

If you had used .*? instead of .*, the matched substring would be

SUB:yyyy

Does this answer your question?
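
To see exactly which substring a pattern matched, it helps to capture the match and print it with the newlines made visible. Here is a small sketch along those lines, using the example string from this post. The helper name `visible` is made up for the illustration.

```perl
use strict;
use warnings;

# Replace real newlines with a literal "\n" so they show up in the output.
sub visible {
    my ($text) = @_;
    $text =~ s/\n/\\n/g;
    return $text;
}

my $string = "xxxxSUB:yyyy\n\nSUB:\nbbbbbb";

# Greedy: .* takes the rest of the line, then [^;] takes the newline,
# and $ matches just before the second newline.
my ($greedy) = $string =~ /(SUB:.*[^;])$/m;

# Non-greedy: the match stops as soon as [^;] is followed by a line end.
my ($lazy) = $string =~ /(SUB:.*?[^;])$/m;

print "greedy: ", visible($greedy), "\n";   # prints "greedy: SUB:yyyy\n"
print "lazy:   ", visible($lazy),   "\n";   # prints "lazy:   SUB:yyyy"
```

Neither match runs past the first line, because the dot does not match a newline unless the s modifier is given.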


Stefanik
User

Jan 11, 2013, 6:15 AM


Re: [rovf] Match characters in middle and end string

Hi rovf, thanks, you're right about my question.
I tried changing the regexp the way you suggested:


Code
if ($qpar =~ (/SUB:.?[^;]$/m)){print $qpar;}


But no lines are printed.

The file I'm checking contains the following lines:





Code
SET:TESTSUB:TRANSID,t1:NUM,428:PARAMETERS,other; 
GET:TESTSUB:TRANSID,t2:

SET:TESTSUB:TRANSID,t3:NUM,428:PARAMETERS,other;no
NUM,327
:PARAMETERS,other;

<?xml version='1.0' encoding='ISO-8859-1' standalone='no'?><Request MO="OSUB" O
peration="get"> <num>456</num></Request>
<?xml version='1.0' encoding='ISO-8859-1' standalone='no'?>
<Response>
<errorid>051</errorid>
</Response>



(This post was edited by Stefanik on Jan 11, 2013, 6:26 AM)


rovf
Veteran

Jan 11, 2013, 6:18 AM


Re: [Stefanik] Match characters in middle and end string

You wrote .?, while I suggested .*?


Stefanik
User

Jan 11, 2013, 6:23 AM


Re: [rovf] Match characters in middle and end string

Sorry..

Code
if ($qpar =~ (/SUB:.*?[^;]$/m)){print $qpar;}


In this way I match:

Code
SET:TESTSUB:TRANSID,t1:NUM,428:PARAMETERS,other;  
GET:TESTSUB:TRANSID,t2:
SET:TESTSUB:TRANSID,t3:NUM,428:PARAMETERS,other;no

while the first line shouldn't be printed out


rovf
Veteran

Jan 11, 2013, 6:32 AM


Re: [Stefanik] Match characters in middle and end string

Write the print statement like this:


Code
print "FOUND: <$qpar>\n";



Stefanik
User

Jan 11, 2013, 12:32 PM


Re: [rovf] Match characters in middle and end string

The output:


Code
FOUND: <SET:TESTSUB:TRANSID,t1:NUM,458:PARAMETERS,other; 
>
FOUND: <GET:TESTSUB:TRANSID,t2:
>
FOUND: <SET:TESTSUB:TRANSID,t3:NUM,458:PARAMETERS,other;NO
>



rovf
Veteran

Jan 12, 2013, 12:46 AM


Re: [Stefanik] Match characters in middle and end string

I see, my way to modify the print statement was not wise (I wanted to verify that there is no white space before the semicolon), so maybe you better do:


Code
use Data::Dumper qw(Dumper);


and then


Code
print(Dumper($qpar),"\n");



Stefanik
User

Jan 12, 2013, 7:22 AM


Re: [rovf] Match characters in middle and end string

The new code:

Code
if ($qpar =~ (/SUB:.*?[^;]$/m)){print(Dumper($qpar),"\n");}


here the new output:

Code
$VAR1 = 'SET:TESTSUB:TRANSID,t1:NUM,458:PARAMETERS,other; 
';

$VAR1 = 'GET:TESTSUB:TRANSID,t2:
';

$VAR1 = 'SET:TESTSUB:TRANSID,t3:NUM,458:PARAMETERS,other;NO
';



(This post was edited by Stefanik on Jan 12, 2013, 7:27 AM)


rovf
Veteran

Jan 12, 2013, 8:41 AM


Re: [Stefanik] Match characters in middle and end string

In the input file, there *MUST* be a space after the semicolon in the first line, otherwise your regexp wouldn't have matched. Maybe you should hexdump your input?


Stefanik
User

Jan 13, 2013, 12:45 PM


Re: [rovf] Match characters in middle and end string

I've just run hexdump on the log file, but no space is present. :(


rovf
Veteran

Jan 14, 2013, 1:25 AM


Re: [Stefanik] Match characters in middle and end string

Oops, I must have been absent-minded. Forget my silly argument with the space. This is not relevant.

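Of course the reason is that [^;] is allowed to match the newline itself. In your t1 line, .*? runs up to and including the semicolon, [^;] then consumes the "\n", and $ matches at the end of the string. Hence the line matches even though it ends in a semicolon.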
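
One way to sidestep the trailing newline, sketched here under the assumption that $qpar holds one line at a time, is to strip the newline before matching, so that the last character the pattern can see is the real end of the text. The two sample lines are copied from the log excerpt; the array names are illustrative.

```perl
use strict;
use warnings;

my @lines = (
    "SET:TESTSUB:TRANSID,t1:NUM,428:PARAMETERS,other;\n",
    "GET:TESTSUB:TRANSID,t2:\n",
);

my (@raw_hits, @chomped_hits);
for my $qpar (@lines) {
    # Unchomped: [^;] may consume the trailing "\n", so both lines match.
    push @raw_hits, $qpar if $qpar =~ /SUB:.*?[^;]$/m;

    # Copy without the newline: the last character really is ';' or ':'.
    (my $text = $qpar) =~ s/\n\z//;
    push @chomped_hits, $text if $text =~ /SUB:.*[^;]$/;
}

print scalar(@raw_hits), " raw match(es)\n";    # prints "2 raw match(es)"
print "unterminated: $_\n" for @chomped_hits;   # prints only the t2 line
```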


Stefanik
User

Jan 14, 2013, 5:44 AM


Re: [rovf] Match characters in middle and end string

My intention is to select just the lines that contain "SUB:" but don't end with ";".

So, do you mean there is a problem with the regexp? Is it wrong?


rovf
Veteran

Jan 14, 2013, 6:03 AM


Re: [rovf] Match characters in middle and end string

If you want to select only lines containing SUB and not ending with a semicolon, you can equally well match against


Code
/(SUB.*[^;]$)/m


since the dot doesn't match a newline.


Stefanik
User

Jan 15, 2013, 5:27 AM


Re: [rovf] Match characters in middle and end string

Thanks a lot for your support, rovf :)


Stefanik
User

Jan 16, 2013, 2:02 PM


Re: [rovf] Match characters in middle and end string

I tried the solution but it doesn't work.
It seems the regex still matches the "\n".

It works if I write:

Code
 
/SUB.*[^;]\n/m



(This post was edited by Stefanik on Jan 16, 2013, 2:16 PM)


rovf
Veteran

Jan 16, 2013, 11:37 PM


Re: [Stefanik] Match characters in middle and end string

Of course it matches the newline. After all, you wrote the newline into the regexp.

Your regexp says: Line containing SUB and which has no semicolon in front of the newline.


Stefanik
User

Jan 17, 2013, 12:04 AM


Re: [rovf] Match characters in middle and end string

OK, but this way the regexp doesn't match the last line if it has no "\n".

So what I need is to match "....SUB....[^;]", regardless of whether there is a \n at the end. I don't know if that's possible in just one regexp or whether I should check with two regexps (one with "\n" at the end, another without "\n").


FishMonger
Veteran / Moderator

Jan 17, 2013, 8:29 AM


Re: [Stefanik] Match characters in middle and end string

/SUB.*[^;]\n?/m


(This post was edited by FishMonger on Jan 17, 2013, 8:29 AM)


Stefanik
User

Jan 17, 2013, 1:37 PM


Re: [FishMonger] Match characters in middle and end string

It doesn't work :(

Input:

Code
meSUBstring1; 
youSUBstring2;
ISUBstring3
string4;
noSUBstring5;


code:

Code
#!/usr/bin/perl 

use strict;
use warnings FATAL => qw(all);
use diagnostics;

my $qnso="C:/Users/me/Desktop/Perl_Test/temp/test.log";
my $qpar="x";

open (NSOFILE, "<", $qnso) or die "No file!";
while ($qpar = <NSOFILE>){
if ($qpar =~ /SUB.*[^;]\n?/m){print $qpar;}
}
close (NSOFILE);


Output:

Code
meSUBstring1; 
youSUBstring2;
ISUBstring3
noSUBstring5;
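
The optional \n? does not change the result, because [^;] can still match the newline and \n? then simply matches nothing. A possible alternative, offered as a sketch and not as the thread's accepted answer, is to exclude the newline from the character class and let $ (without the m modifier) match either at the very end of the string or just before a final newline. The input below is adapted from this post: trailing whitespace is assumed to be absent, and the last line is deliberately left without a semicolon or newline to cover the end-of-file case.

```perl
use strict;
use warnings;

my @lines = (
    "meSUBstring1;\n",
    "youSUBstring2;\n",
    "ISUBstring3\n",
    "string4;\n",
    "noSUBstring5",      # last line of the file: no ';' and no "\n"
);

# [^;\n] cannot swallow the newline, so it has to be the last visible
# character of the line, and that character must not be a semicolon.
my @unterminated = grep { /SUB.*[^;\n]$/ } @lines;

print @unterminated;     # prints "ISUBstring3" and "noSUBstring5"
```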



(This post was edited by Stefanik on Jan 17, 2013, 1:38 PM)


Laurent_R
Veteran / Moderator

Jan 18, 2013, 3:44 PM


Re: [Stefanik] Match characters in middle and end string

Just a quick try, syntax could be cleaner, but it works and might show you the way:


Code
#!/usr/bin/perl  

use strict;
use warnings FATAL => qw(all);
use diagnostics;

my $qnso="C:/Users/me/Desktop/Perl_Test/temp/test.log";

# open (NSOFILE, "<", $qnso) or die "No file!";
while (my $qpar = <DATA>){
chomp $qpar;
$qpar .= <DATA> and chomp $qpar while ($qpar !~ /;\s*$/);
if ($qpar =~ /SUB.*[^;]\n?/m){print $qpar, "\n";}
}
# close (NSOFILE);

__DATA__
meSUBstring1;
youSUBstring2;
ISUBstring3
string4;
noSUBstring5;


This prints out this:


Code
$ perl  qpar.pl 
meSUBstring1;
youSUBstring2;
ISUBstring3 string4;
noSUBstring5;
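
The same joining idea can also be written with an explicit buffer, which avoids reading past the end of the input when the last record never gets its semicolon (the loop above has no end-of-file check). This is only a sketch: `join_records` is a hypothetical helper name, and the input is held in an array instead of being read from a file.

```perl
use strict;
use warnings;

# Join physical lines into logical records that end with ';'.
sub join_records {
    my (@physical) = @_;
    my (@records, $buffer);
    for my $line (@physical) {
        $line =~ s/\r?\n\z//;                        # strip the line ending
        $buffer = defined($buffer) ? $buffer . $line : $line;
        if ($buffer =~ /;\s*\z/) {                   # record is complete
            push @records, $buffer;
            undef $buffer;
        }
    }
    # Keep an unterminated tail instead of looping or dying on it.
    push @records, $buffer if defined($buffer) && length($buffer);
    return @records;
}

my @input = (
    "meSUBstring1;\n",
    "youSUBstring2;\n",
    "ISUBstring3\n",
    "string4;\n",
    "noSUBstring5;",     # no newline at end of file
);

my @sub_records = grep { /SUB/ } join_records(@input);
print "$_\n" for @sub_records;
```

Once the records are joined, a plain /SUB/ test is enough, since every complete record already ends with a semicolon.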



rovf
Veteran

Jan 19, 2013, 1:48 PM


Re: [Stefanik] Match characters in middle and end string


Quote
So, what I need is to match "....SUB....[^;]"


This would mean matching already if *any* non-semicolon character comes somewhere after the SUB. I don't think that you really mean that.

You said that it should match even if there is no newline "at the end". But what, then, is the end? Do you mean "the end of the string"?

I think the main problem here is that I still don't understand exactly under what conditions you want your string to be matched....


Stefanik
User

Jan 20, 2013, 6:41 AM


Re: [rovf] Match characters in middle and end string

For Laurent:
yes, that is exactly what I need, but I don't understand a few parts of the code:

Why do you execute:

Code
chomp $qpar;

and then repeat chomp in the next statement?


Code
$qpar .= <DATA> and chomp $qpar while ($qpar !~ /;\s*$/);

what is ".=" ?

for rovf:
sorry, maybe my explanation was confusing.
I have a log file where all the relevant lines contain the word "SUB" near the beginning of the line (but not as the first characters) and end with a semicolon.
In some cases such a line is split across several lines, so the string starts on one line and ends on a following line (the one with the semicolon).
I have to "normalize" these cases before printing them.
The last problem is that the ";" may or may not be followed by a "\n" (it is not if it's at the end of the file, without any other line after it).


rovf
Veteran

Jan 21, 2013, 1:50 AM


Re: [Stefanik] Match characters in middle and end string

In this case, I would ignore the \n completely. Just find SUB, followed by any text, up to the next semicolon. You just need to make sure that the dot matches the newline, otherwise your pattern will fail. I.e. you need something like


Code
/(SUB.+?;)/ms


(Note the 's' modifier to the regexp!)
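
A minimal sketch of this approach, assuming the whole log has been read into one string (a literal stands in for the file contents here). Note that the capture starts at SUB, so any text before it on the line is not part of the result. The m modifier is left out because the pattern uses neither ^ nor $.

```perl
use strict;
use warnings;

# Stand-in for the slurped file contents.
my $log = "meSUBstring1;\nyouSUBstring2;\nISUBstring3\nstring4;\nnoSUBstring5;";

my @records;
# /s lets the dot cross line breaks; /g walks through all matches.
while ($log =~ /(SUB.+?;)/sg) {
    (my $record = $1) =~ tr/\n//d;   # drop embedded newlines to normalize
    push @records, $record;
}

print "$_\n" for @records;
```

With this input the third record comes out as SUBstring3string4; on a single line.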


Laurent_R
Veteran / Moderator

Jan 21, 2013, 4:49 AM


Re: [rovf] Match characters in middle and end string

Hi,


Code
$qpar .= <DATA> and chomp $qpar while ($qpar !~ /;\s*$/);


This says: as long as $qpar does not end with a semi-colon (;), possibly followed by some whitespace, get the next line of input, concatenate it to the current $qpar, and chomp the new $qpar. The chomp is needed again because the line just appended brought its own newline character with it. All of this is repeated until $qpar ends with a semi-colon.

$c .= "foo" : this takes $c and concatenates "foo" at the end of $c.

This is equivalent to $c = $c . "foo";
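
Both points can be tried out in isolation. The following is a small sketch with made-up values, showing the concatenation operator and why a second chomp is needed after appending a freshly read line.

```perl
use strict;
use warnings;

my $c = "bar";
$c .= "foo";                       # same as $c = $c . "foo"
print "$c\n";                      # prints "barfoo"

# chomp removes a trailing newline and returns the number of
# characters it removed.
my $qpar   = "ISUBstring3\n";
my $first  = chomp($qpar);         # 1: the newline is removed
my $second = chomp($qpar);         # 0: nothing left to remove

$qpar .= "string4;\n";             # the appended line brings a new "\n"
my $third = chomp($qpar);          # 1: so chomp is needed again
print "$qpar\n";                   # prints "ISUBstring3string4;"
```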


Stefanik
User

Jan 21, 2013, 1:24 PM


Re: [Laurent_R] Match characters in middle and end string

Hi,
thanks to all of you for your help and explanations.

rovf, just one more question... what is the "s" at the end of the regexp?

Ste


(This post was edited by Stefanik on Jan 21, 2013, 1:25 PM)


Laurent_R
Veteran / Moderator

Jan 22, 2013, 1:57 PM


Re: [Stefanik] Match characters in middle and end string

Perl documentation on Regex modifiers:

- m :
Treat string as multiple lines. That is, change "^" and "$" from matching the start or end of the string to matching the start or end of any line anywhere within the string.

- s :
Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.

Used together, as /ms, they let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string.
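
The effect of each modifier can be seen on a two-line string. This is an illustrative sketch; the string and variable names are invented for the example.

```perl
use strict;
use warnings;

my $text = "first\nsecond";

# Without /m, ^ and $ anchor only at the ends of the whole string.
my $anchors_plain = $text =~ /^second$/  ? 1 : 0;   # 0
my $anchors_m     = $text =~ /^second$/m ? 1 : 0;   # 1

# Without /s, the dot refuses to match the newline.
my $dot_plain = $text =~ /first.second/  ? 1 : 0;   # 0
my $dot_s     = $text =~ /first.second/s ? 1 : 0;   # 1

# With both, $ and ^ match around the newline and the dot matches it.
my $both = $text =~ /^first$.^second$/ms ? 1 : 0;   # 1

print "anchors: $anchors_plain $anchors_m\n";       # prints "anchors: 0 1"
print "dot:     $dot_plain $dot_s\n";               # prints "dot:     0 1"
print "both:    $both\n";                           # prints "both:    1"
```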