


Chris Charley
User

Nov 25, 2015, 5:41 PM


Re: [iThunder] positive and negative lookahead assertions

Using the Perl regex debugger, I hope to show why your second regex, /(line.*)(?!.*fox)/, matched the first line, line1 brown fox, when you really meant for it to fail. The output from the debugger is kind of dense, but I'll try to describe it so it makes some sense. The numbers in the output are the node numbers of the bytecode tree that the regex compiler prepares for the regex engine; see a good explanation of them in the last post on the web page linked here. Below is the output from the debugger for your regex. The compilation phase starts at the line "Compiling REx" and the execution phase starts at the line "Matching REx".

Code
Compiling REx "(line.*)(?!.*fox)" 
Final program:
1: OPEN1 (3)
3: EXACT <line> (5)
5: STAR (7)
6: REG_ANY (0)
7: CLOSE1 (9)
9: UNLESSM[0] (17)
11: STAR (13)
12: REG_ANY (0)
13: EXACT <fox> (15)
15: SUCCEED (0)
16: TAIL (17)
17: END (0)
anchored "line" at 0 (checking anchored) minlen 4
Guessing start of match in sv for REx "(line.*)(?!.*fox)" against "line1 brown fox"
Found anchored substr "line" at offset 0...
Guessed: match at offset 0
Matching REx "(line.*)(?!.*fox)" against "line1 brown fox"
0 <> <line1 brow> | 1:OPEN1(3)
0 <> <line1 brow> | 3:EXACT <line>(5)
4 <line> <1 brown fo> | 5:STAR(7)
REG_ANY can match 11 times out of 2147483647...
15 <e1 brown fox> <> | 7: CLOSE1(9)
15 <e1 brown fox> <> | 9: UNLESSM[0](17)
15 <e1 brown fox> <> | 11: STAR(13)
REG_ANY can match 0 times out of 2147483647...
failed...
15 <e1 brown fox> <> | 17: END(0)
Match successful!
Freeing REx: "(line.*)(?!.*fox)"

In the compilation phase, each node of the bytecode tree is listed by its number, with a description to the right of it. The indentation below shows how the nodes nest.

OPEN1 - opening parenthesis of capture group 1
----EXACT <line> - the exact text to be matched
----STAR - the asterisk
--------REG_ANY - indicates the '.' token
CLOSE1 - closing parenthesis of capture group 1
UNLESSM[0] - unless match (the negative lookahead)
----STAR - the asterisk
--------REG_ANY - indicates the '.' token
----EXACT <fox> - the exact text to be matched
----SUCCEED - marks the end of the sub-pattern inside the lookahead
TAIL - closing parenthesis of the lookahead
END

Now the execution phase:

0 <> <line1 brow> | 1:OPEN1(3)
beginning with the opening parenthesis

0 <> <line1 brow> | 3:EXACT <line>(5)
Made the exact match (<line> appears in the left bracket on the next line).
Note that the unmatched text is in the bracket to the right, <1 brown fo>.


4 <line> <1 brown fo> | 5:STAR(7)
REG_ANY can match 11 times out of 2147483647...
Now the STAR (*) quantifier consumes the rest of the string (see the next line, <e1 brown fox>). NOTE: all the text has been consumed, so there is no text left for the lookahead to examine. The negative lookahead therefore cannot find 'fox', and the entire match will succeed.

15 <e1 brown fox> <> | 7: CLOSE1(9)
Closing the capturing parenthesis

15 <e1 brown fox> <> | 9: UNLESSM[0](17)
15 <e1 brown fox> <> | 11: STAR(13)
REG_ANY can match 0 times out of 2147483647...
failed...
The pattern inside the lookahead failed to find 'fox', and so the negative lookahead 'succeeded'.

15 <e1 brown fox> <> | 17: END(0)
Match successful!
Freeing REx: "(line.*)(?!.*fox)"
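
To confirm what the trace shows, namely that the first .* runs to the end of the string before the lookahead is tried, one can look at Perl's @- and @+ offset arrays after the match. This is only a small sketch and was not part of the debug session above; the variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'line1 brown fox';

# The offsets are only meaningful when the match succeeds.
my ($captured, $start, $end);
if ($text =~ /(line.*)(?!.*fox)/) {
    $captured = $1;
    $start    = $-[1];   # offset where capture group 1 begins
    $end      = $+[1];   # offset just past the end of capture group 1
}

printf "captured '%s' from offset %d to %d of %d\n",
    $captured, $start, $end, length $text;
```

Group 1 ends at offset 15, which is the length of the string, so the lookahead is evaluated at the very end, where no 'fox' can follow.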

------------------------------------------------------------------------------

This was a learning experience for me, and I now know somewhat how to read the debugger output. :-)
(At least for this fairly simple case.)

I hope my explanation makes sense. There was no backtracking, and the regex succeeded (where you wanted it to fail) because the first '.*' ate the entire line, leaving no 'fox' for the lookahead to detect. A negative lookahead that finds nothing succeeds, so the whole match succeeded.

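The right regex would have been ^(line.*$)(?<!fox). It anchors the match to the end of the line and then uses a negative lookbehind to reject a line whose last three characters are fox.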
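
The difference can be checked side by side on the three sample lines. The sketch below compares the original regex, the lookbehind regex, and a third variant, ^(?!.*fox)(line.*), which is my own assumption and not from the thread: it keeps a lookahead but tests it at the start of the line, before '.*' has consumed anything. The hash and variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = ('line1 brown fox', 'line2 black owl', 'line3 red dear');

my %regex = (
    original   => qr/(line.*)(?!.*fox)/,   # lookahead runs after .* ate the line
    lookbehind => qr/^(line.*$)(?<!fox)/,  # rejects lines that end in "fox"
    lookahead  => qr/^(?!.*fox)(line.*)/,  # rejects lines containing "fox" anywhere
);

# Collect, for each regex, the lines it matches.
my %matched;
for my $name (sort keys %regex) {
    $matched{$name} = [ grep { $_ =~ $regex{$name} } @lines ];
    print "$name: ", join(' | ', @{ $matched{$name} }), "\n";
}
```

The original regex matches all three lines, while the other two skip line1 brown fox. Note that the two corrected forms are not identical in general: the lookbehind only rejects lines that end in fox, whereas the leading lookahead rejects fox anywhere in the line.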

The debug program is below.


Code
#!/usr/bin/perl 
use strict;
use warnings;
use re qw/debug/;

$_ = 'line1 brown fox';
/(line.*)(?!.*fox)/;
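
If the full output is too much, the re pragma can be narrowed. The sketch below is a variation on the program above, not what I actually ran: it assumes you only want the execution trace, so it uses the pragma's Debug EXECUTE option, and it puts the pragma inside a block because its effect is lexically scoped. The variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'line1 brown fox';
my $matched;

{
    # Only regexes inside this block are traced.
    # "Debug EXECUTE" requests the run-time trace without the compile dump.
    use re qw(Debug EXECUTE);
    $matched = $text =~ /(line.*)(?!.*fox)/ ? 1 : 0;
}

print "matched: $matched\n";
```

The trace itself goes to STDERR, so it can be redirected to a file separately from the program's normal output.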


A program with working regexes is below. The first regex matches lines that end in fox; the second matches lines that do not end in fox.

Code
#!/usr/bin/perl 
use strict;
use warnings;

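my $re1 = '^(line.*fox)$';      # lines ending in "fox": no look-around assertions needed
my $re2 = '^(line.*$)(?<!fox)'; # lines NOT ending in "fox": negative lookbehind

my $data_start = tell DATA;     # remember where the __DATA__ section begins

for my $re ($re1, $re2) {
    print "\tUsing regular expression $re\n";
    while (<DATA>) {
        print "$1\n" if /$re/;
    }
    # Rewind to the start of __DATA__; seeking to 0 would rewind to the top of the script file.
    seek DATA, $data_start, 0;
    print "\n";
}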

__DATA__
line1 brown fox
line2 black owl
line3 red dear



(This post was edited by Chris Charley on Nov 25, 2015, 5:44 PM)

