
rpaskudniak
User

Jul 4, 2013, 9:08 AM
Post #1 of 2
Capture the separator after a split
Greetings. For once I have a contribution, albeit a minor tidbit, to make instead of asking a question. Note that my serendipitous discovery is actually documented at the very end of the perldoc page for the split function. (The moral equivalent of the fine print; almost nobody reads that far.) See: http://perldoc.perl.org/functions/split.html

I made an interesting mistake. I have two numbers separated by either a + or - sign, for example "2-3". I needed to split the string to capture the two numbers, but I also needed to capture the intervening sign so that I could apply it later.

    $nstring = "2-3";
    @numbers = split(/[+-]/, $nstring);

This gives me @numbers = (2, 3) but does not tell me the sign. So how do I know if the latter number is + or -? So I tried surrounding the pattern with parentheses, as I would in a =~ match operation:

    @numbers = split(/([+-])/, $nstring);

I thought that would leave the sign in $1. I was wrong; $1 remains undefined. (Actually, in my case, it retained the value from a previous matching operation, which had me scratching my head for a while.)

A bigger surprise: The array now contains 3 elements instead of 2. When I examined the @numbers array I found (2, -, 3); the separator matched by the pattern was included in the array. Once I realized this, I was able to use the sign; I just had to know that the second number was in $numbers[2] rather than in $numbers[1].

In retrospect, my idea of trying to capture the separator in $1 was wrong-headed anyway. Consider a string like this: "Rasputin:unused;1000:513;U-Maxwell". I have used 2 separators here, both the colon and semicolon.
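
Before moving on to that string, here is a minimal sketch of how the captured sign from the "2-3" case might be applied afterwards. The apply_sign helper is a hypothetical name made up for illustration; it is not from my original script, and it assumes the input is always exactly two numbers with one + or - between them.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): evaluate a string of the
# form "A+B" or "A-B" using the separator that split hands back.
sub apply_sign {
    my ($nstring) = @_;

    # The capture group makes split return the separator as its own
    # list element, so we get (left, sign, right) rather than (left, right).
    my ($left, $sign, $right) = split(/([+-])/, $nstring);

    return $sign eq '-' ? $left - $right : $left + $right;
}

print apply_sign("2-3"), "\n";   # prints -1
print apply_sign("2+3"), "\n";   # prints 5
```

Assigning the split result to a list of three scalars avoids having to remember that the second number lives in slot [2].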

    my $pwline = "Rasputin:unused;1000:513;U-Maxwell";
    print "$pwline\n";
    my @pwparts1 = split(/[:;]/, $pwline); # May be split by either : or ;
    my $partcount = @pwparts1;
    print "Split $partcount components: ";
    print "[", join("] [", @pwparts1), "]\n\n";
    my @pwparts2 = split(/([:;])/, $pwline); # May be split by either : or ;
    $partcount = @pwparts2;
    my $sep = (defined($1)) ? $1 : "(undefined)";
    print "Split $partcount components: ";
    print "[", join("] [", @pwparts2), "]\n\n";

Here's the output:

    Rasputin:unused;1000:513;U-Maxwell
    Split 5 components: [Rasputin] [unused] [1000] [513] [U-Maxwell]

    Split 9 components: [Rasputin] [:] [unused] [;] [1000] [:] [513] [;] [U-Maxwell]

If I could capture it in $1, which separator would go there? The : or the ;? BTW, the parenthesized pattern is called a "capture group".

--------------------
-- Rasputin Paskudniak (In perpetual pursuit of undomesticated, semi-aquatic avians)
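
A follow-up sketch, not part of the test script above: since the split with a capture group interleaves fields and separators, one way to pull them apart again is by index parity. This assumes a single capture group in the pattern and an input like the one shown; the @fields and @seps names are just for illustration.

```perl
use strict;
use warnings;

my $pwline = "Rasputin:unused;1000:513;U-Maxwell";
my @parts  = split(/([:;])/, $pwline);

# With one capture group, the fields land at even indexes and the
# separators that followed them land at odd indexes.
my @fields = @parts[ grep { $_ % 2 == 0 } 0 .. $#parts ];
my @seps   = @parts[ grep { $_ % 2 == 1 } 0 .. $#parts ];

print "Fields:     [", join("] [", @fields), "]\n";
print "Separators: [", join("] [", @seps),   "]\n";
```

For the string above this prints the five fields on one line and the four separators (: ; : ;) on the next, so each separator stays available without relying on $1.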