Problem extracting an expression 2



kepler
Novice

Jul 24, 2014, 3:52 AM



Hi,

I'm having some trouble extracting an expression from $line. The expression looks like, for example:

KP_what_I_want (space)
or
KP_what_I_want(
or
KP_what_I_want (
or
KP_what_I_want +
etc...

I'm using:


Code
my ($exp) =  
$line =~ m/
KP_ # Required
(.*) # Capture Desired Output
(?:\s)? # Optional - Do not capture
(?:\()? # Optional - Do not capture
/xi;


It's not working: it extracts the whole expression (only without the KP_).

Any ideas?

Thanks,

Kepler


(This post was edited by kepler on Jul 24, 2014, 3:53 AM)


BillKSmith
Veteran

Jul 24, 2014, 6:43 AM


Re: [kepler] Problem extracting an expression 2

First, let me explain what goes wrong. The '*' in '(.*)' is 'greedy': it matches as much as possible while still letting the rest of the pattern match. In your case, it matches to the end of the string. The remaining two groups are optional, so they both match the empty string that is left.
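A minimal sketch of that behaviour, using your original pattern on the first sample line (the variable name $greedy is just for illustration):

```perl
use strict;
use warnings;

# Sample line taken from the question's examples.
my $line = 'KP_what_I_want (space)';

# The original pattern: (.*) is greedy, so it runs to the end of the
# string; the two optional groups then match the empty string there.
my ($greedy) = $line =~ m/
    KP_        # Required
    (.*)       # Greedy: grabs everything up to the end of the string
    (?:\s)?    # Optional: matches the empty string at the end
    (?:\()?    # Optional: matches the empty string at the end
/xi;

print ">>$greedy<<\n";    # prints >>what_I_want (space)<<
```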

Unfortunately, the non-greedy form '(.*?)' does not fix the problem. It now matches the empty string, and again, the other two groups are optional, so they match as well.
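The same sketch with the lazy quantifier shows the opposite failure: the engine first tries a zero-length capture, both optional groups succeed without consuming anything, and the match is already complete.

```perl
use strict;
use warnings;

my $line = 'KP_what_I_want (space)';

# Lazy .*? tries the shortest capture first (zero characters). Nothing
# after it is required, so that first attempt already succeeds.
my ($lazy) = $line =~ m/
    KP_ (.*?) (?:\s)? (?:\()?
/xi;

print ">>$lazy<<\n";    # prints >><<
```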

Whenever you use '.' with '*' (or any of its relatives), you must specify what comes after it. This leaves you with two options: either specify the captured field in a way that cannot match anything else, or specify what must come next.
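A rough sketch of the two options, under assumptions about the data that the question does not confirm. Option 1 assumes the wanted part contains only word characters (letters, digits, underscore), so \w+ stops by itself. Option 2 assumes the wanted part ends at whitespace, '(', '+' or the end of the string, and uses a lookahead (?=...) to state what follows without consuming it. The helper names extract_word and extract_until, and the third sample line, are hypothetical.

```perl
use strict;
use warnings;

# Option 1: restrict the capture itself. \w+ cannot match a space,
# '(' or '+', so it stops at the first of them.
sub extract_word {
    my ($line) = @_;
    my ($name) = $line =~ m/KP_(\w+)/;
    return $name;
}

# Option 2: say what must come next. The lookahead requires whitespace,
# '(' , '+' or end of string after the capture, without consuming it.
sub extract_until {
    my ($line) = @_;
    my ($name) = $line =~ m/KP_(.+?)(?=[\s(+]|\z)/;
    return $name;
}

for my $line ('KP_what_I_want(', 'KP_what_I_want +', 'x = KP_what_I_want') {
    printf ">>%s<< >>%s<<\n", extract_word($line), extract_until($line);
}
```

Option 1 is simpler but breaks if the name may contain characters outside \w; option 2 tolerates any characters in the name but depends on the list of terminators being complete.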


The following code works for your sample data. A real solution to your problem requires more knowledge about your data.



Code
use strict; 
use warnings;
my @expressions = (
'KP_what_I_want (space)',
'KP_what_I_want(',
'KP_what_I_want (',
'KP_what_I_want +',
);

foreach my $line (@expressions) {
    my ($exp1) =
        $line =~ m/
            KP_                # Required
            (.+?)              # Capture Desired Output
            (?:\s\(|\(|\s+)    # One required - no capture
        /xi;
    print ">>$exp1<<\n";

    my ($exp2) =
        $line =~ m/
            KP_                # Required
            ([^ +(]+)          # Capture Desired Output
        /xi;
    print ">>$exp2<<\n";

    print "\n";
}
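One caveat with the first pattern: because it requires one of the terminators, a line such as KP_what_I_want with nothing after it does not match, $exp1 is undef, and printing it triggers an "uninitialized value" warning under use warnings. A minimal sketch that checks whether the match succeeded before using the capture (the helper name extract_exp1 and the bare sample line are hypothetical):

```perl
use strict;
use warnings;

# Same pattern as $exp1, with a non-capturing (?: ... ) group.
sub extract_exp1 {
    my ($line) = @_;
    # A list assignment in boolean context is true only when the match
    # succeeds, so a failed match falls through to return undef.
    if (my ($exp) = $line =~ m/KP_(.+?)(?:\s\(|\(|\s+)/i) {
        return $exp;
    }
    return undef;
}

for my $line ('KP_what_I_want +', 'KP_what_I_want') {
    my $exp = extract_exp1($line);
    my $out = defined $exp ? ">>$exp<<" : "no match: $line";
    print "$out\n";
}
```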

Good Luck,
Bill