regex help

 



robertico
Novice

Mar 22, 2013, 12:41 AM

Post #1 of 15 (748 views)
regex help

I use this regex to extract all lines starting with ' { ' and ending with ' } '


Code
push (@matches,$&) while($line =~ /{(.*?)}/g );


Now I'd like to exclude lines that contain a ' * '
I've already tried several options (also used the Perl Regular Expression Quick Reference Card)
I assume I need to use ' ^ ' but can't get it done.
Please help


(This post was edited by robertico on Mar 22, 2013, 12:41 AM)


rovf
Veteran

Mar 22, 2013, 2:16 AM

Post #2 of 15 (744 views)
Re: [robertico] regex help

First, curly braces have special meaning in a Perl regexp; you need to escape them with \

Second, you didn't anchor your regexp, so the pattern is searched anywhere in the line. With ^ and $ you can denote the start and end of the line.
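
A minimal sketch of both points, assuming a made-up sample line (the variable names are illustrative only): the braces are escaped, and the anchored pattern only succeeds when the whole line is a single {...} group, whereas the unanchored one finds groups anywhere in the line.

```perl
use strict;
use warnings;

# Hypothetical sample line, not taken from the thread.
my $line = 'text {a} more {b} end';

# Unanchored, braces escaped: finds every {...} group anywhere in the line.
my @anywhere = $line =~ /(\{.*?\})/g;

# Anchored with ^ and $: matches only if the whole line is one {...} group.
my @whole_line = $line =~ /^(\{.*\})$/;

print "anywhere: @anywhere\n";                        # the two groups
print "whole-line matches: ", scalar(@whole_line), "\n";   # none for this line
```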


robertico
Novice

Mar 22, 2013, 2:46 AM

Post #3 of 15 (743 views)
Re: [rovf] regex help


Quote
First, curly braces have special meaning in a Perl regexp; you need to escape them with \


OK, thank you. It works as expected, so I didn't notice.


Quote
Second, you didn't anchor your regexp, so the pattern is searched anywhere in the line. With ^ and $ you can denote the start and end of the line.


That's exactly what I want.

In one line I need to extract multiple substrings between ' { ' and ' } '.
But some of them contain the character ' * ' somewhere between the curly braces. I need to exclude these substrings.


rovf
Veteran

Mar 22, 2013, 4:20 AM

Post #4 of 15 (741 views)
Re: [robertico] regex help

Sorry, but I don't understand your description. It's confusing to me. Which part of each line exactly do you want to extract?


robertico
Novice

Mar 22, 2013, 5:36 AM

Post #5 of 15 (740 views)
Re: [rovf] regex help

I'll give you an example;


Code
some useless text {"to_id": 0, "message": "This is a sample", "message_id": 1000, "from_id": 999} more useless text {"to_id": 1, "message": "This is a sample", "message_id": 1000, "from_id": 999} and more 
some useless text {"to_id": 3, "message": "*This is a sample", "message_id": 1000, "from_id": 999} more useless text {"to_id": 4, "message": "This is a sample", "message_id": 1000, "from_id": 999} and more


For line one I need both parts between the curly braces;
Expected result;


Code
{"to_id": 0, "message": "This is a sample", "message_id": 1000, "from_id": 999}  
{"to_id": 1, "message": "This is a sample", "message_id": 1000, "from_id": 999}


For line two I only need one part (Not the part with the ' * ')

Expected result;


Code
{"to_id": 4, "message": "This is a sample", "message_id": 1000, "from_id": 999}



(This post was edited by robertico on Mar 22, 2013, 5:50 AM)


rovf
Veteran

Mar 22, 2013, 5:55 AM

Post #6 of 15 (734 views)
Re: [robertico] regex help

Well, this makes it clearer. From your examples, I see that you do NOT need to extract lines starting and ending with {} (since none of the example lines have a curly brace in the beginning).

I would take the following approach:

First, collect all the {...} substrings, without worrying that they might contain a '*'. Then drop those which have an asterisk.

IMO this is easiest if you also drop the idea of using a statement modifier. Use a normal while loop, and then an if inside it to choose the desired substrings.


robertico
Novice

Mar 22, 2013, 6:08 AM

Post #7 of 15 (733 views)
Re: [rovf] regex help

I'm lost now.

I'm reading a text file line by line.
If a line matches a pattern, I use the regex to extract the substring between the curly braces (sometimes more than one per line)
But if there's an asterisk between the curly braces, I need to ignore that one.
When it matches the regex I need to do some further processing (JSON and write results to another file)

So I can't simply ignore the one with an asterisk.

Do I need to use a second regex before further processing or is it possible to create a suitable regex to do both at the same time ?

Sorry, it seems a little bit confusing.


BillKSmith
Veteran

Mar 22, 2013, 6:33 AM

Post #8 of 15 (725 views)
Re: [robertico] regex help

Here is my version of the rovf solution. Use a module to match parens and grep to filter unwanted results.


Code
use strict;
use warnings;
use Regexp::Common qw /balanced/;
my $BALANCED = qr/$RE{balanced}{-parens=>'{}'}/;
while (my $line = <DATA>) {
    (my @first_pass) = $line =~ /($BALANCED)/g;
    my @required = grep {!/[*]/} @first_pass;
    print join( "\n", @required), "\n\n";
}
__DATA__
some useless text {"to_id": 0, "message": "This is a sample", "message_id": 1000, "from_id": 999} more useless text {"to_id": 1, "message": "This is a sample", "message_id": 1000, "from_id": 999} and more
some useless text {"to_id": 3, "message": "*This is a sample", "message_id": 1000, "from_id": 999} more useless text {"to_id": 4, "message": "This is a sample", "message_id": 1000, "from_id": 999} and more

Good Luck,
Bill


Kenosis
User

Mar 22, 2013, 6:41 AM

Post #9 of 15 (723 views)
Re: [robertico] regex help

I think rovf is right on target. Also don't use $& as it has a performance cost within Perl. Use the results of your capture that's contained in $1.

Consider the following:


Code
use warnings; 
use strict;
use Data::Dumper;

my @matches;

while ( my $line = <DATA> ) {
    while ( $line =~ /\{(.*?)\}/g ) {
        push @matches, $1 if $1 and $1 !~ /\*/;
    }
}
print Dumper \@matches;

__DATA__
some useless text {"to_id": 0, "message": "This is a sample", "message_id": 1000, "from_id": 999} more useless text {"to_id": 1, "message": "This is a sample", "message_id": 1000, "from_id": 999} and more
some useless text {"to_id": 3, "message": "*This is a sample", "message_id": 1000, "from_id": 999} more useless text {"to_id": 4, "message": "This is a sample", "message_id": 1000, "from_id": 999} and more


Output:


Code
$VAR1 = [ 
'"to_id": 0, "message": "This is a sample", "message_id": 1000, "from_id": 999',
'"to_id": 1, "message": "This is a sample", "message_id": 1000, "from_id": 999',
'"to_id": 4, "message": "This is a sample", "message_id": 1000, "from_id": 999'
];


The second regex just checks whether "*" is contained in your capture. The $1 and $1 !~ /\*/ notation just makes sure that you did, in fact, capture something, since you could 'capture' an empty string between "{}" and it would be true that "*" isn't within "".

Hope this helps!
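
One caveat with the truth test on $1, sketched here on a made-up input line: in Perl the string "0" is false, so a group containing only 0 would be skipped along with the empty one. If that matters for your data, testing with length is one possible alternative (the variable names below are illustrative, not from the posts above).

```perl
use strict;
use warnings;

# Hypothetical input: one group holds only "0", one is empty, one has an asterisk.
my $line = 'a {0} b {} c {x*y} d {ok}';

my ( @truthy, @by_length );
while ( $line =~ /\{(.*?)\}/g ) {
    my $inner = $1;                 # copy the capture before matching again
    next if $inner =~ /\*/;         # skip anything containing an asterisk
    push @truthy,    $inner if $inner;           # drops "" and also "0"
    push @by_length, $inner if length $inner;    # drops only ""
}

print "truth test:  @truthy\n";
print "length test: @by_length\n";
```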


robertico
Novice

Mar 22, 2013, 7:03 AM

Post #10 of 15 (721 views)
Re: [BillKSmith] regex help

Works almost as expected (unwanted results are filtered out), but the others are printed twice and I get a lot of empty lines.

Already tried to change it this way, but no luck.


Code
print OUTFILE join( "\n", @required), "\n\n";



Code
foreach (@required) {
    print OUTFILE "$_";
}



robertico
Novice

Mar 22, 2013, 7:11 AM

Post #11 of 15 (717 views)
Re: [Kenosis] regex help

This one works excellently. Thank you very much!


Quote
Also don't use $& as it has a performance cost within Perl. Use the results of your capture that's contained in $1.


When I use $1 the curly braces are omitted from the result, and with $& they're included (as desired)


(This post was edited by robertico on Mar 22, 2013, 7:18 AM)


BillKSmith
Veteran

Mar 22, 2013, 7:33 AM

Post #12 of 15 (713 views)
Re: [robertico] regex help

My print was designed to duplicate your sample output. It puts a newline after each result and leaves a blank line after the results for each input line. Your first print sends this to OUTFILE.

Your second print sends all the results to a single long line in OUTFILE.

You probably want only the first. (Remove one of the two newlines at the end to eliminate the blank lines.)
Good Luck,
Bill
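
To make the difference concrete, here is a small sketch that builds the output strings instead of writing to OUTFILE. The @required contents are shortened, made-up stand-ins for real results. The last case shows one possible source of extra empty lines: an input line with no surviving results still produces the two trailing newlines unless the print is guarded.

```perl
use strict;
use warnings;

# Shortened, hypothetical results for one input line.
my @required = ( '{"to_id": 0}', '{"to_id": 1}' );

# One result per line, followed by a blank line (the original print).
my $with_blank = join( "\n", @required ) . "\n\n";

# One result per line, no blank line afterwards.
my $no_blank = join( "\n", @required ) . "\n";

# No separators at all: everything runs together on one line.
my $run_together = join( '', @required );

# With nothing to print, the original form still emits two newlines.
my @nothing    = ();
my $empty_case = join( "\n", @nothing ) . "\n\n";

print $no_blank if @required;    # guard: print only when there are results
```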


Kenosis
User

Mar 22, 2013, 7:41 AM

Post #13 of 15 (712 views)
Re: [robertico] regex help

Ah! Then try the following--and there would be no need to check the returned capture for emptiness, since--at the very least--it would contain "{}":


Code
while ( $line =~ /(\{.*?\})/g ) {
    push @matches, $1 if $1 !~ /\*/;
}


This just moves the capture to include the braces...
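
On the earlier question of doing both steps with one regex: a negated character class (the ' ^ ' inside square brackets) is one possible way. The sketch below is an alternative to the two-step filter, not the method used above, and it assumes the groups never contain nested braces. The sample lines are shortened versions of the ones in this thread.

```perl
use strict;
use warnings;

# Shortened sample lines; the first group on the second line contains '*'.
my @lines = (
    'text {"to_id": 0, "message": "This is a sample"} more {"to_id": 1} end',
    'text {"to_id": 3, "message": "*This is a sample"} more {"to_id": 4} end',
);

my @matches;
for my $line (@lines) {
    # [^{}*]* accepts any run of characters except braces and asterisks,
    # so a group that contains '*' can never match.
    push @matches, $line =~ /(\{[^{}*]*\})/g;
}

print "$_\n" for @matches;
```

Whether this is preferable to collecting everything and filtering afterwards is a judgment call; the two-step version is arguably easier to read and to extend with further conditions.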


rovf
Veteran

Mar 22, 2013, 8:00 AM

Post #14 of 15 (707 views)
Re: [BillKSmith] regex help

May I ask why you put parentheses around my @first_pass? It's list context anyway.


BillKSmith
Veteran

Mar 22, 2013, 8:17 AM

Post #15 of 15 (703 views)
Re: [rovf] regex help

Bad habit! By the time I noticed it, the OP had already used the code. It seemed to be too late to fix it.
Good Luck,
Bill
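
For anyone wondering about the context point, a brief sketch on a made-up line: assigning a match to an array is already list context, so the parentheses change nothing there. They only make a difference when the left-hand side is a scalar.

```perl
use strict;
use warnings;

# Hypothetical sample line.
my $line = 'x {a} y {b} z';

my @plain    = $line =~ /(\{.*?\})/g;    # array assignment: list context
(my @parens) = $line =~ /(\{.*?\})/g;    # same result, parentheses redundant

# With a scalar on the left, the parentheses do change the meaning.
my $flag    = $line =~ /(\{.*?\})/;      # scalar context: success flag
my ($first) = $line =~ /(\{.*?\})/;      # list context: the first capture

print "plain: @plain\nparens: @parens\nflag: $flag\nfirst: $first\n";
```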

 
 

