Capture the separator after a split


Jul 4, 2013, 9:08 AM



For once I have a contribution to make, albeit a minor tidbit, instead of a question. Note that my serendipitous discovery is actually documented at the very end of the perldoc page for the split function. (The moral equivalent of the fine print; almost nobody reads that far. ;-) ) See: http://perldoc.perl.org/functions/split.html

I made an interesting mistake:

I have two numbers separated by either a + or - sign, for example "2-3". I needed to split the string to capture the two numbers, but I also needed to capture the intervening sign so that I could apply it later.

$nstring = "2-3"; 
@numbers = split(/[+-]/, $nstring);

This gives me @numbers = (2, 3) but does not tell me the sign. So how do I know whether the second number is to be added or subtracted? I tried surrounding the pattern with parentheses, as I would in a =~ match:

@numbers = split(/([+-])/, $nstring);

I thought that would leave the sign in $1. I was wrong; split did not put it there. (In my case, $1 retained the value from a previous matching operation, which had me scratching my head for a while.) A bigger surprise: the array now contains 3 elements instead of 2. When I examined the @numbers array I found (2, -, 3); the separator matched by the parenthesized pattern was included in the array. Once I realized this, I was able to use the sign; I just had to know that the second number was in $numbers[2] rather than in $numbers[1].
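
A minimal sketch of using the three-element result, assuming the input always has exactly two numbers and one sign. The names $left, $sign, $right and $result are illustrative only and not part of the original script:

```perl
use strict;
use warnings;

my $nstring = "2-3";

# The capture group makes split return the separator as its own element,
# so the list is (left operand, sign, right operand).
my ($left, $sign, $right) = split /([+-])/, $nstring;

# Apply the captured sign to the two numbers.
my $result = $sign eq '-' ? $left - $right : $left + $right;

print "$left $sign $right = $result\n";   # prints "2 - 3 = -1"
```

Unpacking into named scalars avoids having to remember which array slot holds the sign.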

In retrospect, my idea of trying to capture the separator in $1 was wrongheaded anyway. Consider a string like the one below, in which I have used 2 different separators, the colon and the semicolon:

my $pwline = "Rasputin:unused;1000:513;U-Maxwell"; 
print "$pwline\n";
my @pwparts1 = split(/[:;]/, $pwline); # May be split by either : or ;
my $partcount = @pwparts1;
print "Split $partcount components: ";
print "[", join("] [", @pwparts1), "]\n\n";

my @pwparts2 = split(/([:;])/, $pwline); # Same pattern in a capture group: separators are returned too
$partcount = @pwparts2;
my $sep = (defined($1)) ? $1 : "(undefined)";

print "Split $partcount components: ";
print "[", join("] [", @pwparts2), "]\n\n";

Here's the output:

Split 5 components: [Rasputin] [unused] [1000] [513] [U-Maxwell]

Split 9 components: [Rasputin] [:] [unused] [;] [1000] [:] [513] [;] [U-Maxwell]

If I could capture it in $1, which separator would go there? The : or ;?
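
Since the fields and separators alternate in the returned list, one way to pull them apart is by index parity. This is only a sketch; the array names @parts, @fields and @seps are made up for illustration, and it assumes the string does not begin with a separator:

```perl
use strict;
use warnings;

my $pwline = "Rasputin:unused;1000:513;U-Maxwell";
my @parts  = split /([:;])/, $pwline;

# Even indexes hold the fields, odd indexes hold the captured separators.
my @fields = @parts[ grep { $_ % 2 == 0 } 0 .. $#parts ];
my @seps   = @parts[ grep { $_ % 2 == 1 } 0 .. $#parts ];

print "Fields:     @fields\n";   # Rasputin unused 1000 513 U-Maxwell
print "Separators: @seps\n";     # : ; : ;
```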

BTW, the parenthesized pattern is called a "capture group".
-- Rasputin Paskudniak (In perpetual pursuit of undomesticated, semi-aquatic avians)


Jul 4, 2013, 10:53 AM

Re: [rpaskudniak] Capture the separator after a split

In Reply To
If I could capture it in $1, which separator would go there? The : or ;?

The question is a bit rhetorical, since you can't capture it in $1. But the logic in this kind of situation (a list of matches collapsed into a scalar) suggests that you would probably get the last match.

A somewhat similar example under the Perl debugger:

DB<1> $_ = "foo bar baz too"
DB<2> print $1 if (@d = /(.oo)/g)
DB<3> x @d
0 'foo'
1 'too'
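
The same experiment as a standalone script, for anyone who wants to run it outside the debugger. This is a sketch; the variable name $text is illustrative, and an explicit variable is used in place of $_:

```perl
use strict;
use warnings;

my $text = "foo bar baz too";

# In list context, /g returns every captured match at once.
my @d = $text =~ /(.oo)/g;

print "All matches: @d\n";   # foo too

# $1 still refers to the last successful match.
print "\$1 is: $1\n";        # too
```

Here the list gets both matches, while the single scalar $1 is left holding only the last one.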