Match characters in middle and end string

 



Stefanik
User

Jan 10, 2013, 1:54 AM

Post #1 of 31 (15494 views)
Match characters in middle and end string

Hi,

I have a file like the following:


Code
 anystring1 
anystring2SUB:anystring3:anystring4;
anystring5SUB:
:anystring6:
anystring7;



I have to perform two kinds of matching:

1) Find all the lines that contain "SUB:" and have ";" as the last character, and print them.

2) Find all the lines that contain "SUB" but do not end with ";", and from there remove the "\n" at the end of each line until I reach the line that ends with ";".

The second point is meant to "normalize" those lines so that they look like the ones in point 1.



I started by writing a regexp for point 1:


Code
if ($qpar =~ /^\w+SUB:\w+\;$/) {
    print $qpar;
}



Stefanik
User

Jan 10, 2013, 4:43 AM

Post #2 of 31 (15486 views)
Re: [Stefanik] Match characters in middle and end string

I modified the regexp, and now it works:

Quote
if ($qpar =~ (/^.*SUB:.*\;$/m)){print $qpar;}



What's the difference between "\w" and "."?

Do both of them represent any alphanumeric character?


(This post was edited by Stefanik on Jan 10, 2013, 5:05 AM)


Stefanik
User

Jan 10, 2013, 6:07 AM

Post #3 of 31 (15476 views)
Re: [Stefanik] Match characters in middle and end string

I am now trying to match the second point:

Code
($qpar =~ (/^.*SUB:.*[^;]$/m))


I want to find all the lines that contain "SUB" but don't end with ";".
But the code doesn't seem to handle the "[^;]" and prints all the lines that contain SUB, including the ones ending with ";".
Any suggestions?


(This post was edited by Stefanik on Jan 10, 2013, 6:08 AM)


BillKSmith
Veteran

Jan 10, 2013, 6:18 AM

Post #4 of 31 (15474 views)
Re: [Stefanik] Match characters in middle and end string

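We usually think of /./ as meaning "match any character" (strictly, any character except a newline, unless the s modifier is used). /\w/ means match any word character (/[a-zA-Z_0-9]/).

In your example, this does make a difference: your lines contain ":" characters after "SUB:", and \w cannot match a ":", so \w+ stops there while .* does not. Note also that in your second case you use .* rather than \w+. The "+" requires at least one match. The "*" does not.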
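
To check the two character types on a concrete line, here is a minimal sketch. The sample line is an assumption modeled on the data in the first post, and the variable names are only illustrative.

```perl
use strict;
use warnings;

# Assumed sample line, shaped like the second line of the file in post #1.
my $line = "anystring2SUB:anystring3:anystring4;\n";

# \w matches only letters, digits and underscore, so \w+ cannot cross a ':'.
my $word_only = ($line =~ /^\w+SUB:\w+;$/) ? "match" : "no match";

# . matches any character except a newline, so the ':' is no obstacle.
my $any_char = ($line =~ /^.*SUB:.*;$/) ? "match" : "no match";

print "\\w+ version: $word_only\n";
print ".*  version: $any_char\n";
```

Changing only the quantifier (\w* instead of \w+) should not change the result here, since the obstacle is the ":" and not the number of repetitions.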
Good Luck,
Bill


Stefanik
User

Jan 10, 2013, 8:08 AM

Post #5 of 31 (15469 views)
Re: [BillKSmith] Match characters in middle and end string

I also tried "\w*", but again I didn't get any output.
Anyway, I solved it with ".*".

Can you help me with:


Code
($qpar =~ (/^.*SUB:.*[^;]$/m))


Thanks again


rovf
Veteran

Jan 10, 2013, 11:07 PM

Post #6 of 31 (15395 views)
Re: [Stefanik] Match characters in middle and end string

The pattern ^.* at the beginning of a regexp is redundant, so you basically match

SUB:.*[^;]$

Since you are using the m-modifier for your regexp, the $ changes its meaning from matching only at the end of the string to matching at the end of any line. Keep in mind that the dot does not match a \n, but the class [^;] does. That is, your pattern matches if $qpar contains the text SUB: and, later on the same line, either a non-semicolon character directly before a line end, or a \n that is itself directly followed by a line end. For instance, the following string would match:

"xxxxSUB:yyyy\n\nSUB:\nbbbbbb"

In this case, the matched substring would be

SUB:yyyy\n

If you had used .*? instead of .*, the matched substring would be

SUB:yyyy
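
To check what each variant actually captures, a small script along these lines prints the matched substring with newlines made visible. It is only a sketch: the variable names are illustrative, and it relies on the fact that . stops at a newline while a negated class such as [^;] does not.

```perl
use strict;
use warnings;

my $text = "xxxxSUB:yyyy\n\nSUB:\nbbbbbb";

# Capture the matched substring for the greedy and the non-greedy variant.
my ($greedy) = $text =~ /(SUB:.*[^;]$)/m;
my ($lazy)   = $text =~ /(SUB:.*?[^;]$)/m;

# Print each result with "\n" shown as a visible escape.
for my $pair (["greedy", $greedy], ["lazy", $lazy]) {
    (my $shown = $pair->[1]) =~ s/\n/\\n/g;
    print "$pair->[0]: <$shown>\n";
}
```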

Does this answer your question?


Stefanik
User

Jan 11, 2013, 6:15 AM

Post #7 of 31 (15385 views)
Re: [rovf] Match characters in middle and end string

Hi rovf, thanks, you're right about my question.
I tried to change the regexp the way you suggested:


Code
if ($qpar =~ (/SUB:.?[^;]$/m)){print $qpar;}


But no lines are printed.

The file I'm checking contains the following lines:





Code
SET:TESTSUB:TRANSID,t1:NUM,428:PARAMETERS,other; 
GET:TESTSUB:TRANSID,t2:

SET:TESTSUB:TRANSID,t3:NUM,428:PARAMETERS,other;no
NUM,327
:PARAMETERS,other;

<?xml version='1.0' encoding='ISO-8859-1' standalone='no'?><Request MO="OSUB" O
peration="get"> <num>456</num></Request>
<?xml version='1.0' encoding='ISO-8859-1' standalone='no'?>
<Response>
<errorid>051</errorid>
</Response>



(This post was edited by Stefanik on Jan 11, 2013, 6:26 AM)


rovf
Veteran

Jan 11, 2013, 6:18 AM

Post #8 of 31 (15383 views)
Re: [Stefanik] Match characters in middle and end string

You wrote .?, while I suggested .*?


Stefanik
User

Jan 11, 2013, 6:23 AM

Post #9 of 31 (15379 views)
Re: [rovf] Match characters in middle and end string

Sorry..

Code
if ($qpar =~ (/SUB:.*?[^;]$/m)){print $qpar;}


This way I match:

Code
SET:TESTSUB:TRANSID,t1:NUM,428:PARAMETERS,other;  
GET:TESTSUB:TRANSID,t2:
SET:TESTSUB:TRANSID,t3:NUM,428:PARAMETERS,other;no

but the first line shouldn't be printed.


rovf
Veteran

Jan 11, 2013, 6:32 AM

Post #10 of 31 (15374 views)
Re: [Stefanik] Match characters in middle and end string

Write the print statement like this:


Code
print "FOUND: <$qpar>\n";



Stefanik
User

Jan 11, 2013, 12:32 PM

Post #11 of 31 (15357 views)
Re: [rovf] Match characters in middle and end string

The output:


Code
FOUND: <SET:TESTSUB:TRANSID,t1:NUM,458:PARAMETERS,other; 
>
FOUND: <GET:TESTSUB:TRANSID,t2:
>
FOUND: <SET:TESTSUB:TRANSID,t3:NUM,458:PARAMETERS,other;NO
>



rovf
Veteran

Jan 12, 2013, 12:46 AM

Post #12 of 31 (15338 views)
Re: [Stefanik] Match characters in middle and end string

I see, my change to the print statement was not wise (I wanted to verify that there is no white space after the semicolon), so maybe you'd better do:


Code
use Data::Dumper qw(Dumper);


and then


Code
print(Dumper($qpar),"\n");
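
If the goal is to spot invisible characters, Data::Dumper's Useqq setting may help, because it prints newlines and tabs as escapes instead of literally. A minimal sketch, with an assumed sample value standing in for a line read from the log:

```perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

$Data::Dumper::Useqq = 1;   # show "\n", "\t" etc. as escapes in a double-quoted string
$Data::Dumper::Terse = 1;   # leave out the "$VAR1 = " prefix

# Assumed sample value; in the real script this would be the line read from the file.
my $qpar = "SET:TESTSUB:TRANSID,t1;\n";

my $dumped = Dumper($qpar);
print $dumped;
```

A trailing space, if there were one, would show up between the semicolon and the \n escape.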



Stefanik
User

Jan 12, 2013, 7:22 AM

Post #13 of 31 (15321 views)
Re: [rovf] Match characters in middle and end string

The new code:

Code
if ($qpar =~ (/SUB:.*?[^;]$/m)){print(Dumper($qpar),"\n");}


Here is the new output:

Code
$VAR1 = 'SET:TESTSUB:TRANSID,t1:NUM,458:PARAMETERS,other; 
';

$VAR1 = 'GET:TESTSUB:TRANSID,t2:
';

$VAR1 = 'SET:TESTSUB:TRANSID,t3:NUM,458:PARAMETERS,other;NO
';



(This post was edited by Stefanik on Jan 12, 2013, 7:27 AM)


rovf
Veteran

Jan 12, 2013, 8:41 AM

Post #14 of 31 (15311 views)
Re: [Stefanik] Match characters in middle and end string

In the input file, there *MUST* be a space after the semicolon in the first line, otherwise your regexp wouldn't have matched. Maybe you should hexdump your input?


Stefanik
User

Jan 13, 2013, 12:45 PM

Post #15 of 31 (15252 views)
Re: [rovf] Match characters in middle and end string

I've just run hexdump on the log file, but no space is present.

:(


rovf
Veteran

Jan 14, 2013, 1:25 AM

Post #16 of 31 (15235 views)
Re: [Stefanik] Match characters in middle and end string

Oops, I must have been absent-minded. Forget my silly argument about the space. It is not relevant.

Of course the reason is the newline. Your regexp requires a character other than a semicolon just before the end of the line, but [^;] also matches the \n itself. Your t1 line ends in ";\n": the \n satisfies [^;], the $ then matches at the end of the string, and hence the line matches.


Stefanik
User

Jan 14, 2013, 5:44 AM

Post #17 of 31 (15225 views)
Re: [rovf] Match characters in middle and end string

My intention is to select just the lines that contain "SUB:" but don't end with ";".

So, do you mean there is a problem with the regexp? Is it wrong?


rovf
Veteran

Jan 14, 2013, 6:03 AM

Post #18 of 31 (15223 views)
Re: [rovf] Match characters in middle and end string

If you want to select only lines containing SUB and not ending with a semicolon, you can equally well match against


Code
/(SUB.*[^;]$)/m


since the dot doesn't match a newline.


Stefanik
User

Jan 15, 2013, 5:27 AM

Post #19 of 31 (15190 views)
Re: [rovf] Match characters in middle and end string

Thanks a lot for your support, rovf :)


Stefanik
User

Jan 16, 2013, 2:02 PM

Post #20 of 31 (15108 views)
Re: [rovf] Match characters in middle and end string

I tried the solution but it doesn't work.
It seems the regex still matches the "\n".

It works if I write:

Code
 
/SUB.*[^;]\n/m
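
A likely explanation for this behaviour is that the negated class [^;] is allowed to match the newline itself, so with $ the pattern can be satisfied by ";\n". The sketch below compares the two variants on two assumed sample lines (shortened from the log excerpt earlier in the thread); the helper name is hypothetical.

```perl
use strict;
use warnings;

# Assumed sample lines, shortened from the log excerpt earlier in the thread.
my $ends_with_semicolon = "SET:TESTSUB:TRANSID,t1;\n";
my $no_semicolon        = "GET:TESTSUB:TRANSID,t2:\n";

# Hypothetical helper: report which of the two lines a pattern accepts,
# as "1" or "0" for each line in the order above.
sub accepted {
    my ($re) = @_;
    my @flags;
    for my $line ($ends_with_semicolon, $no_semicolon) {
        push @flags, ($line =~ $re) ? 1 : 0;
    }
    return join ",", @flags;
}

my $with_dollar  = accepted(qr/SUB.*[^;]$/m);   # [^;] may consume the "\n"
my $with_newline = accepted(qr/SUB.*[^;]\n/m);  # "\n" must follow a non-semicolon

print "with \$  : $with_dollar\n";
print "with \\n : $with_newline\n";
```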



(This post was edited by Stefanik on Jan 16, 2013, 2:16 PM)


rovf
Veteran

Jan 16, 2013, 11:37 PM

Post #21 of 31 (15082 views)
Re: [Stefanik] Match characters in middle and end string

Of course it matches the newline. After all, you wrote the newline into the regexp.

Your regexp says: a line containing SUB that has no semicolon directly in front of the newline.


Stefanik
User

Jan 17, 2013, 12:04 AM

Post #22 of 31 (15080 views)
Re: [rovf] Match characters in middle and end string

OK, but this way the regexp doesn't match the last line if it has no "\n".

So, what I need is to match "....SUB....[^;]", regardless of whether there is a \n at the end. I don't know if that's possible in just one regexp or if I should check with two regexps (one with "\n" at the end, another without "\n").
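
One regexp may be enough here if the character class is also forbidden from matching the newline. A minimal sketch, assuming each line is tested on its own (as in a while loop over a filehandle) and using made-up sample lines; the helper name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line contains SUB and its last visible
# character is not a semicolon, whether or not a "\n" follows.
sub is_unfinished {
    my ($line) = @_;
    # [^;\n] cannot match the newline, and $ (without /m) matches at the very
    # end of the string or just before a final newline.
    return $line =~ /SUB.*[^;\n]$/ ? 1 : 0;
}

for my $line ("meSUBstring1;\n", "ISUBstring3\n", "ISUBstring3", "noSUBstring5;") {
    (my $shown = $line) =~ s/\n/\\n/;
    printf "%-16s %s\n", $shown, is_unfinished($line) ? "unfinished" : "complete";
}
```

Note that a line with trailing spaces after the semicolon would count as unfinished with this pattern, so the class may need to exclude whitespace as well if the log can contain such lines.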


FishMonger
Veteran / Moderator

Jan 17, 2013, 8:29 AM

Post #23 of 31 (15073 views)
Re: [Stefanik] Match characters in middle and end string

/SUB.*[^;]\n?/m


(This post was edited by FishMonger on Jan 17, 2013, 8:29 AM)


Stefanik
User

Jan 17, 2013, 1:37 PM

Post #24 of 31 (15059 views)
Re: [FishMonger] Match characters in middle and end string

It doesn't work :(

Input:

Code
meSUBstring1; 
youSUBstring2;
ISUBstring3
string4;
noSUBstring5;


code:

Code
#!/usr/bin/perl 

use strict;
use warnings FATAL => qw(all);
use diagnostics;

my $qnso="C:/Users/me/Desktop/Perl_Test/temp/test.log";
my $qpar="x";

open (NSOFILE, "<", $qnso) or die "No file!";
while ($qpar = <NSOFILE>){
if ($qpar =~ /SUB.*[^;]\n?/m){print $qpar;}
}
close (NSOFILE);


Output:

Code
meSUBstring1; 
youSUBstring2;
ISUBstring3
noSUBstring5;



(This post was edited by Stefanik on Jan 17, 2013, 1:38 PM)


Laurent_R
Veteran / Moderator

Jan 18, 2013, 3:44 PM

Post #25 of 31 (14988 views)
Re: [Stefanik] Match characters in middle and end string

Just a quick try, syntax could be cleaner, but it works and might show you the way:


Code
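#!/usr/bin/perl

use strict;
use warnings FATAL => qw(all);
use diagnostics;

my $qnso="C:/Users/me/Desktop/Perl_Test/temp/test.log";

# For this test the input comes from the __DATA__ section instead of the file.
# open (NSOFILE, "<", $qnso) or die "No file!";
while (my $qpar = <DATA>){
chomp $qpar;
# Keep appending the next line until the record ends with a semicolon.
# Note: this assumes the last record in the input is terminated by ';'.
$qpar .= <DATA> and chomp $qpar while ($qpar !~ /;\s*$/);
if ($qpar =~ /SUB.*[^;]\n?/m){print $qpar, "\n";}
}
# close (NSOFILE);

__DATA__
meSUBstring1;
youSUBstring2;
ISUBstring3
string4;
noSUBstring5;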


This prints out this:


Code
$ perl  qpar.pl 
meSUBstring1;
youSUBstring2;
ISUBstring3string4;
noSUBstring5;



rovf
Veteran

Jan 19, 2013, 1:48 PM

Post #26 of 31 (9379 views)
Re: [Stefanik] Match characters in middle and end string


Quote
So, what I need is to match "....SUB....[^;]"


This would already match if *any* non-semicolon character comes somewhere after the SUB. I don't think that you really mean that.

You said, that it should match even if there is no newline "at the end". But what, then, is the end? Do you mean "the end of the string"?

I think the main problem here is that I still don't understand exactly under what conditions you want your string to be matched....


Stefanik
User

Jan 20, 2013, 6:41 AM

Post #27 of 31 (9359 views)
Re: [rovf] Match characters in middle and end string

For Laurent:
yes, that is exactly what I need, but I don't understand a few parts of the code:

Why do you execute:

Code
chomp $qpar;

and then repeat chomp in the next instruction?


Code
$qpar .= <DATA> and chomp $qpar while ($qpar !~ /;\s*$/);

What is ".="?

For rovf:
sorry, maybe my explanation was confusing.
I have a log file where all the relevant lines have the word "SUB" near the beginning of the line (but not as the first characters) and end with a semicolon.
In some cases such a line is split over several lines, so the string starts on one line and ends on a later line (where the semicolon is).
I have to "normalize" this situation before printing them.
The last problem is that the ";" may or may not be followed by a "\n" (it is not if it's at the end of the file, without any other line after it).


rovf
Veteran

Jan 21, 2013, 1:50 AM

Post #28 of 31 (9312 views)
Re: [Stefanik] Match characters in middle and end string

In this case, I would ignore the \n completely. Just find SUB, followed by any text, up to the next semicolon. You just need to make sure that the dot matches the newline, otherwise your pattern will fail. I.e. you need something like


Code
/(SUB.+?;)/ms


(Note the 's' modifier to the regexp!)
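
This approach only helps when the string being matched holds more than one line, for example the whole file read at once. A minimal sketch of that idea, assuming the log has been slurped into one string (the sample text is taken from the test input posted earlier in the thread):

```perl
use strict;
use warnings;

# Assumed input: the whole log in one string.
my $text = "meSUBstring1;\nyouSUBstring2;\nISUBstring3\nstring4;\nnoSUBstring5;";

# /s lets . cross newlines; /g in list context returns every captured record.
my @records = $text =~ /(SUB.+?;)/gs;

# Remove the line breaks that ended up inside a record.
s/\n//g for @records;

print "$_\n" for @records;
```

Because the pattern starts at SUB, whatever precedes SUB on the line is not part of the captured record; the pattern would have to be extended if that prefix is needed.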


Laurent_R
Veteran / Moderator

Jan 21, 2013, 4:49 AM

Post #29 of 31 (9310 views)
Re: [rovf] Match characters in middle and end string

Hi,


Code
$qpar .= <DATA> and chomp $qpar while ($qpar !~ /;\s*$/);


This says: if $qpar does not end with a semicolon (;), possibly followed by some whitespace, then get the next line of input, concatenate it to the current $qpar, and chomp the new $qpar (this is needed because the appended line brings its own newline character, which has to be removed again). All this is repeated for as long as $qpar still does not end with a semicolon.

$c .= "foo" : this takes $c and concatenates "foo" at the end of $c.

This is equivalent to $c = $c . "foo";
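
The one-line statement can also be unrolled into an explicit loop, which may be easier to follow. The sketch below is not the code from the earlier post: it assumes the lines are already in an array, the helper name is hypothetical, and it adds a guard for a last record that never gets its semicolon.

```perl
use strict;
use warnings;

# Hypothetical helper: join physical lines into records that end with ';'.
sub join_records {
    my @lines = @_;
    my @records;
    my $record = '';
    for my $line (@lines) {
        chomp $line;                    # drop the newline of this physical line
        $record .= $line;               # same as $record = $record . $line
        if ($record =~ /;\s*$/) {       # record is complete
            push @records, $record;
            $record = '';
        }
    }
    # Keep an unterminated tail instead of waiting for more input at end of file.
    push @records, $record if length $record;
    return @records;
}

my @records = join_records("meSUBstring1;\n", "ISUBstring3\n", "string4;\n", "noSUBstring5;");
print "$_\n" for @records;
```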


Stefanik
User

Jan 21, 2013, 1:24 PM

Post #30 of 31 (9289 views)
Re: [Laurent_R] Match characters in middle and end string

Hi,
thanks to all of you for your help and explanations.

rovf, just one more question... what is the "s" at the end of the regexp?

Ste


(This post was edited by Stefanik on Jan 21, 2013, 1:25 PM)


Laurent_R
Veteran / Moderator

Jan 22, 2013, 1:57 PM

Post #31 of 31 (9269 views)
Re: [Stefanik] Match characters in middle and end string

Perl documentation on Regex modifiers:

- m :
Treat string as multiple lines. That is, change "^" and "$" from matching the start or end of the string to matching the start or end of any line anywhere within the string.

- s :
Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.

Used together, as /ms, they let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string.
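
The effect of the two modifiers can be tried on a tiny two-line string. This is only an illustrative sketch with made-up variable names:

```perl
use strict;
use warnings;

my $text = "a\nb";   # two lines: "a" and "b"

# Without /m, ^ and $ anchor only at the start and end of the whole string.
my $anchor_plain = ($text =~ /^b$/)  ? 1 : 0;
# With /m, ^ and $ also anchor at the start and end of each line.
my $anchor_m     = ($text =~ /^b$/m) ? 1 : 0;

# Without /s, . does not match the newline between "a" and "b".
my $dot_plain = ($text =~ /a.b/)  ? 1 : 0;
# With /s, . matches any character, including the newline.
my $dot_s     = ($text =~ /a.b/s) ? 1 : 0;

print "^b\$ plain: $anchor_plain, with /m: $anchor_m\n";
print "a.b plain: $dot_plain, with /s: $dot_s\n";
```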

 
 

