PCRE backtracking



hwnd
User

May 21, 2014, 12:32 PM


PCRE backtracking

I am curious: is there a way to find the length of backtracking in PCRE? I want to match strings that start with letters followed by numbers, but fail if the run of numbers is longer than the preceding run of letters.

For example, these should match:

foobar12345
foob123
foo12

These should fail, because there are more numbers than letters:

foo1234
fo123


FishMonger
Veteran / Moderator

May 21, 2014, 12:42 PM


Re: [recruiter] PCRE backtracking

Please post your code that demonstrates the problem.

Also, post an example string to be matched and what part of it you need to match.


(This post was edited by FishMonger on May 21, 2014, 12:45 PM)


hwnd
User

May 21, 2014, 12:58 PM


Re: [FishMonger] PCRE backtracking

FishMonger,

I have no regex code for this yet. I could simply use


Code
([a-zA-Z]+)[0-9]+


But this is not what I am asking.

I am wondering whether PCRE can use backtracking to check the length. I want to match a string that starts with letters, followed by numbers, but only if the length of the letters is greater than the length of the numbers.

For example, this would pass:

foo12

This is simply because the length of the numbers is 2 and the length of the letters is 3.

But this would fail:

foo1234

Because the length of the numbers is greater than the length of the letters.
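To see why the plain pattern above is not enough on its own, here is a minimal Perl sketch that runs it against the sample strings from the first post. The ^ and $ anchors and the sub name plain_match are additions for illustration only. Every sample matches, because nothing in the pattern ties the number of digits to the number of letters; it only checks the shape (letters, then digits).

```perl
use strict;
use warnings;

# Sample strings from the first post: by the stated rule, the first three
# should pass and the last two should fail.
my @samples = qw(foobar12345 foob123 foo12 foo1234 fo123);

# The pattern from the post, anchored so it must cover the whole string.
# It checks the shape only; the lengths of the two runs are never compared.
sub plain_match {
    my ($s) = @_;
    return $s =~ /^([a-zA-Z]+)[0-9]+$/ ? 1 : 0;
}

for my $s (@samples) {
    printf "%-12s %s\n", $s, plain_match($s) ? 'matches' : 'no match';
}
```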


FishMonger
Veteran / Moderator

May 21, 2014, 2:04 PM


Re: [recruiter] PCRE backtracking

As far as I know, the answer would be no, but you could read over the man page to see if I'm wrong.
http://www.pcre.org/pcre.txt

To me, it sounds like you have an XY problem.


BillKSmith
Veteran

May 22, 2014, 5:19 AM


Re: [recruiter] PCRE backtracking

I agree with FishMonger that it is probably not possible to do this with a regular expression. Note, however, that in native Perl very little additional code is required.


Code
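use strict;
use warnings;

while (my $case = <DATA>) {
    chomp $case;
    # Capture the letters and the digits, then compare the lengths
    # numerically (>), not as strings (gt). Lines that do not have the
    # letters-then-digits shape at all are reported as failures.
    if ($case =~ /^([a-zA-Z]+)([0-9]+)\s*$/ and length($1) > length($2)) {
        print "Pass: $case\n";
    }
    else {
        print "Fail: $case\n";
    }
}
__DATA__
foobar12345
foob123
foo12
foo1234
fo123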


OUTPUT:

Code
Pass: foobar12345
Pass: foob123
Pass: foo12
Fail: foo1234
Fail: fo123


UPDATE:
A review of the PCRE documentation (http://www.pcre.org/pcre.txt) shows that capturing parentheses are available. You should be able to write C code equivalent to my Perl code.
Good Luck,
Bill

(This post was edited by BillKSmith on May 22, 2014, 7:47 AM)


Laurent_R
Veteran / Moderator

May 22, 2014, 2:38 PM


Re: [recruiter] PCRE backtracking

PCRE is not used by Perl; it is a library that emulates Perl's built-in regular expressions. So this is not really a Perl question, and it is therefore somewhat off-topic here.

But don't get me wrong: this is really not meant to say that I don't want to help you with your question. But, by definition, Perl developers don't use PCRE; they have the original built-in version and don't need a copy. Bill has given you an answer using Perl's original built-in REs. There is a reasonable chance that this will work under PCRE, but we Perl users don't really know PCRE. Try Bill's solution, and if it does not work, you would do better to ask your question on forums about languages that use PCRE, such as, I would think, Python, Ruby, PHP, possibly JavaScript, Scala and some others.


hwnd
User

May 23, 2014, 9:48 PM


Re: [BillKSmith] PCRE backtracking

Bill, the PCRE man page was very interesting; it shows that something like the following can be done:


Code
(?| (?=[\x00-\x7f])(\C) |
    (?=[\x80-\x{7ff}])(\C)(\C) |
    (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) |
    (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))


The issue, I suppose, is that I'm not sure how you would account for the length of the first group here.
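For anyone unfamiliar with the (?| ... ) construct in that man-page excerpt: it is a branch reset group, which Perl also supports (since 5.10). Each alternative numbers its captures from the same starting point, so $1 refers to whichever alternative actually matched. The minimal Perl sketch below uses two made-up input formats, "id=42" and "42:id" (hypothetical, purely for illustration), and a hypothetical sub name, extract_id. Note that branch reset only renumbers groups; it gives the pattern no way to compare the lengths of two groups, so by itself it does not solve the original problem.

```perl
use strict;
use warnings;

# Hypothetical formats "id=42" and "42:id". Inside (?| ... ) both
# alternatives store their digits in capture group 1, so $1 works for
# either form. Without the branch reset, the second form would use $2.
sub extract_id {
    my ($s) = @_;
    return $s =~ /^(?|id=([0-9]+)|([0-9]+):id)$/ ? $1 : undef;
}

for my $s ('id=42', '7:id', 'id:7') {
    my $id = extract_id($s);
    printf "%-6s => %s\n", $s, defined $id ? $id : '(no match)';
}
```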


(This post was edited by recruiter on May 23, 2014, 9:55 PM)


BillKSmith
Veteran

May 24, 2014, 1:30 PM


Re: [recruiter] PCRE backtracking

My advice remains the same. Capture the two substrings with parentheses, then compute their lengths and compare them using whatever language you use to call PCRE.
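That advice, sketched in Perl: let the regex do the shape check, then compare the capture lengths in the host language. In Perl specifically, the match-offset arrays @- and @+ hold the start and end offsets of each capture group, so the lengths can be read without copying the substrings. The sub name letters_outnumber_digits is hypothetical, and the strict > follows the second post's wording (letters must outnumber digits); this is a minimal sketch, not the only way to do it.

```perl
use strict;
use warnings;

# Returns 1 when $s is letters followed by digits and there are more
# letters than digits, else 0. $-[n] and $+[n] are the start and end
# offsets of capture group n in the last successful match.
sub letters_outnumber_digits {
    my ($s) = @_;
    return 0 unless $s =~ /^([a-zA-Z]+)([0-9]+)$/;
    my $letters = $+[1] - $-[1];
    my $digits  = $+[2] - $-[2];
    return $letters > $digits ? 1 : 0;
}

for my $s (qw(foobar12345 foob123 foo12 foo1234 fo123)) {
    printf "%s: %s\n", letters_outnumber_digits($s) ? 'Pass' : 'Fail', $s;
}
```

When calling PCRE from C, pcre_exec() reports the same information as pairs of start/end offsets in its ovector argument, so the same subtraction should apply there.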
Good Luck,
Bill


Zhris
Enthusiast

May 24, 2014, 2:12 PM


Re: [recruiter] PCRE backtracking

It could also be done with an "irregular" extended expression, using Perl's experimental postponed subexpression feature, (??{ ... }). E.g.:


Code
m/^([a-zA-Z]+)(??{my $len = length($1); qr([0-9]{0,$len})})$/


I also noticed that the Perl regex documentation has some information with regard to PCRE support: http://perldoc.perl.org/perlre.html#PCRE%2fPython-Support
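Here is a small Perl harness for that postponed-subexpression approach, run against the samples from the first post plus "foo". As an assumption, it uses {1,$len} rather than {0,$len} so that a string with no digits at all is rejected; it still accepts equal lengths (for example "ab12"), which follows the first post's wording rather than the strict "greater than" of the second. The sub name postponed_match is hypothetical. qr{...} is used inside the code block so that a slash does not end the surrounding m/.../ early.

```perl
use strict;
use warnings;

# (??{ ... }) runs Perl code during the match and splices the returned
# qr{} in as the next part of the pattern. Here it allows between 1 and
# length($1) digits, which must then reach the end of the string.
sub postponed_match {
    my ($s) = @_;
    return $s =~ m/^([a-zA-Z]+)(??{ my $len = length($1); qr{[0-9]{1,$len}} })$/ ? 1 : 0;
}

for my $s (qw(foobar12345 foob123 foo12 foo1234 fo123 foo)) {
    printf "%-12s %s\n", $s, postponed_match($s) ? 'Pass' : 'Fail';
}
```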

Chris


(This post was edited by Zhris on May 25, 2014, 3:51 PM)


hwnd
User

May 24, 2014, 8:41 PM


Re: [Zhris] PCRE backtracking

I suppose that only Perl can run code like this inside a regular expression?


Zhris
Enthusiast

May 25, 2014, 3:55 PM


Re: [recruiter] PCRE backtracking

I'm familiar with only a few programming languages, and I have not come across this notation in the others. The regex above is inevitably not portable across languages, since it uses Perl-specific syntax.
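A possibly more portable option, offered as a sketch rather than something proposed in this thread, is regex recursion: both Perl (5.10 and later) and PCRE document calling a numbered group with (?1). The group ([a-zA-Z](?1)?[0-9]) matches k letters followed by exactly k digits, and the extra [a-zA-Z]+ in front demands at least one more letter, so the whole pattern should match only when the letters strictly outnumber the digits. Because it contains no embedded Perl code, the same pattern string should in principle work with PCRE too. The variable name $more_letters_than_digits is hypothetical.

```perl
use strict;
use warnings;

# ^ [a-zA-Z]+   at least one "extra" letter
# ( ... )       group 1: one letter, an optional recursive call to
#               group 1, then one digit, i.e. k letters then k digits
# The /x flag allows the spaces and line breaks in the pattern.
my $more_letters_than_digits = qr/
    ^ [a-zA-Z]+
    ( [a-zA-Z] (?1)? [0-9] )
    $
/x;

for my $s (qw(foobar12345 foob123 foo12 foo1234 fo123 ab12)) {
    printf "%-12s %s\n", $s, $s =~ $more_letters_than_digits ? 'Pass' : 'Fail';
}
```

One caveat: the PCRE (version 1) documentation says a recursive call is treated as atomic, unlike in Perl. That should not matter here, because starting from any given letter the group can match in only one way.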

Chris


(This post was edited by Zhris on May 25, 2014, 3:56 PM)