May 22, 2002, 12:42 PM
Post #6 of 7
Sure, Dennis, I'll explain it!
Re: [fashimpaur] Splitting a scalar
I think the loop isn't that interesting, so I'll limit my explanations to the regex I'm using.
The regex looks a lot more complicated than it actually is. The only "advanced" feature that I've used is called a "lookahead assertion". But I'll start at the beginning!
This part matches at least one digit, but as few digits as possible. Since there's no anchor (like ^) at the beginning of the regex, any non-digit characters (especially the dollar sign) are skipped.
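The pattern snippet for this step is missing from this copy of the post. Here's a minimal sketch, assuming the piece in question is a non-greedy, capturing (\d+?). The helper name first_digits and the sample string are made up for illustration:

```perl
use strict;
use warnings;

# Assumption: the first part of the regex is a non-greedy digit match, (\d+?).
# first_digits() is a hypothetical helper, not part of the original post.
sub first_digits {
    my ($str) = @_;
    # No ^ anchor, so the engine skips leading non-digits such as '$'.
    # +? means "one or more, but as few as possible".
    return $str =~ /(\d+?)/ ? $1 : undef;
}

print first_digits('$1234567'), "\n";   # prints 1
```

On its own, the non-greedy quantifier stops after a single digit. It only takes more digits when whatever follows it in the pattern forces it to.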
Next comes the so-called zero-width positive lookahead assertion. This means:
a) it doesn't contribute any characters to the match (zero-width)
b) the assertion must be successful (positive)
c) the expression is looking "ahead" (to the right) from exactly this position
Here's a quick example:
$a = 'bar';
$a =~ s/b(?=ar)/c/;
print "$a\n"; # prints 'car'
$a =~ s/c(?=r)/b/;
print "$a\n"; # prints 'car' again
In the first regex, a 'b' is replaced by a 'c' if it's immediately followed by 'ar'. Note that the 'ar', although it is required for the expression to match, is not part of the matched string. Only 'b' is matched and replaced by 'c'.
In the second regex (note that $a is now 'car'), the 'c' is not matched because it's not followed immediately by an 'r', which causes the lookahead assertion to fail.
But back to the original regex, let's have a closer look at the pattern inside the lookahead:
This part will match exactly three digits. This subpattern is grouped using non-capturing parentheses, so we can later quantify that pattern again. I used the non-capturing parens because they're faster than the capturing ones and because I don't need the captured content. But we could also have written
This would have worked in exactly the same way. The next step is the key to this solution:
The lookahead asserts one or more sequences of three digits immediately followed by the end of the string.
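The snippet for the lookahead body is missing here as well. Going by the description, it is probably something like (?:\d{3})+$, but that's an assumption. A small sketch that tests this condition on its own, with a hypothetical helper name:

```perl
use strict;
use warnings;

# Assumption: the lookahead body is (?:\d{3})+$ (reconstructed from the prose).
# groups_of_three_follow() is a hypothetical helper for illustration only.
sub groups_of_three_follow {
    my ($rest) = @_;
    # The leading ^ is added only so the check covers the whole of $rest,
    # the way the lookahead starts exactly at the current position.
    return $rest =~ /^(?:\d{3})+$/ ? 1 : 0;
}

print groups_of_three_follow('234567'), "\n";   # prints 1 (two groups of three)
print groups_of_three_follow('34567'),  "\n";   # prints 0 (five digits left)
```

The + after the group is what allows any number of three-digit groups, and the $ makes sure nothing else follows them.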
So, since this is a global search-and-replace operation, the regex engine will stop at every position in the string where a multiple of three digits is left before the end of the string, and replace the matched characters with themselves plus a comma.
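Putting the pieces together, the whole substitution might look like the sketch below. This is a reconstruction from the explanation, not necessarily the exact regex from the earlier post, and commify is a hypothetical name:

```perl
use strict;
use warnings;

# Assumption: reconstructed from the prose; the original regex is not shown here.
# commify() is a hypothetical name chosen for this example.
sub commify {
    my ($str) = @_;
    # (\d+?)            as few digits as possible, captured in $1
    # (?=(?:\d{3})+$)   ...followed by one or more groups of three digits, then the end
    $str =~ s/(\d+?)(?=(?:\d{3})+$)/$1,/g;
    return $str;
}

print commify('$1234567'), "\n";   # prints $1,234,567
print commify('123'), "\n";        # prints 123
```

In the second call nothing is replaced, because after any digit of '123' fewer than three digits remain, so the lookahead never succeeds.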
And that's all the "magic" there is about this regex!
But wait, there's still room for improvement! We also could have solved the problem using an additional positive lookbehind assertion:
This makes it even nicer to read (at least for me) because you can more easily see that there's only a comma being inserted. (And it's about 30% faster, too.) But I'll leave the explanation of exactly how this works as an exercise for the reader.
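The lookbehind version isn't preserved in this copy either. One way it could be written, assuming a (?<=\d) lookbehind in front of the same lookahead (commify_lb is a hypothetical name):

```perl
use strict;
use warnings;

# Assumption: a guess at the lookbehind variant; the original snippet is missing.
# commify_lb() is a hypothetical name chosen for this example.
sub commify_lb {
    my ($str) = @_;
    # (?<=\d)           a digit sits immediately to the left
    # (?=(?:\d{3})+$)   groups of three digits run to the end on the right
    # Both assertions are zero-width, so only a comma is inserted.
    $str =~ s/(?<=\d)(?=(?:\d{3})+$)/,/g;
    return $str;
}

print commify_lb('$1234567'), "\n";   # prints $1,234,567
print commify_lb('123456'), "\n";     # prints 123,456
```

The lookbehind is what keeps a comma from being placed at the very front of a number whose length is a multiple of three.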