
BillKSmith
Veteran
Aug 25, 2012, 7:44 AM
Post #12 of 13
Re: [GeneticsGirl] Help outputting first and last positions of blocks of the same type
Here is the code I have been describing. Although it looks quite different, it works the same way as Laurent's (except that the last position is updated correctly). Every line, even the first, is assumed to be the last line of a block until proven otherwise. Data for the current block is stored in the hash %block, and in each section of the code a single assignment statement using a hash slice updates the block.

The do block around the initialization is not strictly necessary. Its purpose is to limit the scope of the two temporary variables ($position and $class) used in the initialization. Note that the two 'my' variables with the same names inside the loop are different variables; their scope is limited to the loop. Refer to perldoc -f undef for an example of this use of undef as a placeholder in a list assignment.

The print statements are idiomatic. Hash slice notation is used to force the order of the values. Because the slice is interpolated into a double-quoted string, the values are separated implicitly by the special variable $LIST_SEPARATOR ($", default: a single space). (Thanks to FishMonger for suggesting this idiom in a recent unrelated post.) The print statement after the loop is needed to print the final block.
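The three idioms mentioned above (undef placeholders in a list assignment, hash-slice assignment, and interpolating a slice so that $" supplies the separator) can be tried in isolation on a single record. This is a minimal sketch, not part of the program itself; the sample record and the @fields, $report and $tsv names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample record in the same four-column layout as the __DATA__ lines.
my $line = "3562 A G DIFF\n";

# undef placeholders: the 2nd and 3rd fields are read but thrown away.
my ($position, undef, undef, $class) = split ' ', $line;

# Hash-slice assignment: four keys set in one statement, in a fixed order.
my @fields = qw(first_position last_position occur class);
my %block;
@block{@fields} = ($position, $position, 1, $class);

# Interpolating the slice joins the values with $" ($LIST_SEPARATOR), a space by default.
my $report = "@block{@fields}";
print "$report\n";    # prints "3562 3562 1 DIFF"

# Localizing $" changes the separator used by interpolation, e.g. to a tab.
my $tsv = do { local $" = "\t"; "@block{@fields}" };
print "$tsv\n";
```

Since $" is only consulted when an array or slice is interpolated into a string, changing it with local keeps the effect confined to that one do block.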
```perl
use strict;
use warnings;

my %block;

do {    # Initialize first block
    $_ = <DATA>;
    my ($position, undef, undef, $class) = split;
    @block{'first_position', 'last_position', 'occur', 'class'}
        = ($position, $position, 1, $class);
};

while (<DATA>) {
    my ($position, undef, undef, $class) = split;
    if ($class eq $block{class}) {
        # Update current block
        @block{'last_position', 'occur'} = ($position, $block{occur} + 1);
    }
    else {
        # Print previous block
        print "@block{'first_position', 'last_position', 'occur', 'class'}\n";
        # Initialize new block
        @block{'first_position', 'last_position', 'occur', 'class'}
            = ($position, $position, 1, $class);
    }
}

# Print final block
print "@block{'first_position', 'last_position', 'occur', 'class'}\n";

__DATA__
1457 G G SAME
1979 G G SAME
2056 T T SAME
3091 A A SAME
3562 A G DIFF
3778 A A SAME
4124 T T SAME
4229 C T DIFF
4571 A G DIFF
5019 A C DIFF
5114 C C SAME
6291 T T SAME
6414 C C SAME
6553 C C SAME
6941 G G SAME
```

Good Luck,
Bill
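If the blocks are wanted as data rather than printed lines (for instance to test them or pass them to later code), the same "assume every line ends a block" logic can be wrapped in a subroutine. This is a sketch of a variation, not the code from the post: the name collapse_blocks, the array-of-arrays return value, and the skipping of blank lines are assumptions. Instead of a separate do block for the first record, it treats an empty %block as "no block yet", which also avoids the uninitialized-value warnings the posted version would emit if the input were empty.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse consecutive records of the same class into blocks.
# Input: lines "position ref alt class". Output: list of
# [first_position, last_position, occur, class] array refs.
sub collapse_blocks {
    my @lines = @_;
    my (@blocks, %block);
    for my $line (@lines) {
        my ($position, undef, undef, $class) = split ' ', $line;
        next unless defined $class;    # skip blank or short lines
        if (%block && $class eq $block{class}) {
            # Same class: extend the current block
            @block{qw(last_position occur)} = ($position, $block{occur} + 1);
        }
        else {
            # First record or class changed: save the previous block, start a new one
            push @blocks, [ @block{qw(first_position last_position occur class)} ] if %block;
            @block{qw(first_position last_position occur class)} = ($position, $position, 1, $class);
        }
    }
    # Save the final block, if any record was seen
    push @blocks, [ @block{qw(first_position last_position occur class)} ] if %block;
    return @blocks;
}

my @input = (
    "1457 G G SAME", "1979 G G SAME", "2056 T T SAME", "3091 A A SAME",
    "3562 A G DIFF", "3778 A A SAME", "4124 T T SAME", "4229 C T DIFF",
    "4571 A G DIFF", "5019 A C DIFF", "5114 C C SAME", "6291 T T SAME",
    "6414 C C SAME", "6553 C C SAME", "6941 G G SAME",
);

my @blocks = collapse_blocks(@input);
print "@$_\n" for @blocks;    # one line per block, fields separated by $"
```

With the sample data from the post this yields five blocks, the first being "1457 3091 4 SAME" and the last "5114 6941 5 SAME", the same lines the original program prints.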