Hash and sorting too hard for me

 



puchu
New User

May 14, 2013, 3:24 PM

Post #1 of 2 (397 views)

Hi gurus, I have taken some Perl lessons at school, but I was not able to do this job.

I have a file like this with many lines:

Code
chr12	exonerate:est2genome	exon	300	500	.	-	.	match=1 
chr12 exonerate:est2genome exon 50 100 . - . match=1
chr12 exonerate:est2genome exon 130 200 . - . match=1
chr12 exonerate:est2genome exon 600 650 . - . match=1


chr12 exonerate:est2genome exon 10 80 . - . match=2
chr12 exonerate:est2genome exon 600 700 . - . match=2
chr12 exonerate:est2genome exon 100 200 . - . match=2
chr12 exonerate:est2genome exon 300 550 . - . match=2

chr12 exonerate:est2genome exon 200 300 . - . match=3
chr12 exonerate:est2genome exon 90 130 . - . match=3
chr12 exonerate:est2genome exon 301 550 . - . match=3


The number of lines with the same match id varies (it is not always 3 or 4 lines per match id), so this is just an example to explain the problem.

I would like to produce a file that lists, for each "match" identifier (match=1, match=2, ...), a single line holding only the lowest value of the fourth column and the highest value of the fifth column, so the result should look like this:

Code
chr12	exonerate:est2genome	exon	50	650	.	-	.	match=1 
chr12 exonerate:est2genome exon 10 700 . - . match=2
chr12 exonerate:est2genome exon 90 550 . - . match=3



I was thinking of hashing the data by the "match" id and then checking for the lowest and highest values with a regular expression, but I spent two days without understanding how to do this in Perl. Any idea how to do it, or how to plan the algorithm? Thanks in advance!!


Chris Charley
User

May 14, 2013, 5:16 PM

Post #2 of 2 (383 views)
Re: [puchu] Hash and sorting too hard for me

Here is a possible solution.

Code
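#!/usr/bin/perl
use strict;
use warnings;

# Seed the running min/max with the first record.
# split ' ' accepts tabs or spaces between the fields.
chomp(my $line = <DATA>);
my ($min, $max, $match) = (split ' ', $line)[3, 4, 8];

# Assumes all lines sharing a match id are adjacent in the input.
while (<DATA>) {
    next unless /\S/;    # skip blank separator lines
    chomp;
    my ($newmin, $newmax, $newmatch) = (split ' ')[3, 4, 8];

    if ($match eq $newmatch) {
        # Same group: widen the range if this record extends it.
        $min = $newmin if $newmin < $min;
        $max = $newmax if $newmax > $max;
    }
    else {
        # New group: print the finished one, then start over.
        $line =~ s/\d+(\s+)\d+/$min$1$max/;
        print $line, "\n";
        $min = $newmin;
        $max = $newmax;
        $match = $newmatch;
        $line = $_;
    }
}

# Print the last group.
$line =~ s/\d+(\s+)\d+/$min$1$max/;
print $line, "\n";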

__DATA__
chr12 exonerate:est2genome exon 300 500 . - . match=1
chr12 exonerate:est2genome exon 50 100 . - . match=1
chr12 exonerate:est2genome exon 130 200 . - . match=1
chr12 exonerate:est2genome exon 600 650 . - . match=1

chr12 exonerate:est2genome exon 10 80 . - . match=2
chr12 exonerate:est2genome exon 600 700 . - . match=2
chr12 exonerate:est2genome exon 100 200 . - . match=2
chr12 exonerate:est2genome exon 300 550 . - . match=2

chr12 exonerate:est2genome exon 200 300 . - . match=3
chr12 exonerate:est2genome exon 90 130 . - . match=3
chr12 exonerate:est2genome exon 301 550 . - . match=3
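
The question also mentions hashing the data by match id. A minimal sketch of that variant follows, assuming whitespace-separated fields with the two coordinates in columns 4 and 5 and the match id in column 9, as in the sample. The helper name collapse_by_match is hypothetical and not part of the thread. Unlike a single running min/max, a hash keyed on the id does not need lines with the same id to be adjacent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collapse records to one line per match id,
# keeping the lowest column-4 value and the highest column-5 value.
sub collapse_by_match {
    my @lines = @_;
    my (%record, @order);

    for my $line (@lines) {
        next unless $line =~ /\S/;        # skip blank lines
        my @f = split ' ', $line;         # tabs or spaces
        my ($start, $end, $id) = @f[3, 4, 8];

        if (!exists $record{$id}) {
            push @order, $id;             # remember first-seen order
            $record{$id} = [@f];          # copy of the first record
            next;
        }
        my $r = $record{$id};
        $r->[3] = $start if $start < $r->[3];
        $r->[4] = $end   if $end   > $r->[4];
    }

    # One tab-joined line per id, in order of first appearance.
    return map { join "\t", @{ $record{$_} } } @order;
}

my @input = (
    "chr12 exonerate:est2genome exon 300 500 . - . match=1",
    "chr12 exonerate:est2genome exon 10 80 . - . match=2",
    "chr12 exonerate:est2genome exon 50 100 . - . match=1",
    "chr12 exonerate:est2genome exon 600 700 . - . match=2",
);
print "$_\n" for collapse_by_match(@input);
```

To run it on a real file, the lines could be read with `my @lines = <$fh>; chomp @lines;` and passed to the helper. The hash holds one record per match id in memory, which should be fine unless the number of distinct ids is very large.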


 
 

