CGI/Perl Guide Forums > Need a Custom or Prewritten Perl Program? > I need a program that...
Can find overlapping coordinates

 



oxydeepu
New User

Jul 11, 2013, 4:30 AM

Post #1 of 4
Can find overlapping coordinates

I have a file of exons like the one below, where column 1 is the chromosome name, column 2 is the orientation, and columns 3 and 4 are the start and stop positions.

Contig0 + 127874 130761
Contig0 + 129936 129984
Contig0 + 130573 130607
Contig0 + 130630 130761
Contig0 + 130732 130767
Contig0 + 130784 130818
Contig0 + 130832 130866
Contig0 + 130832 130867
Contig0 + 130893 130928
Contig0 + 130970 131004
Contig0 + 130982 131017


The question is how to find the overlapping coordinate pairs, if there are any. Where coordinates overlap, I want to take the smallest start position and the largest stop position of the overlapping group; where a pair does not overlap anything, it should be reported as it is.
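The merging rule in the question can be stated precisely for a single pair of exons. The following is a minimal editorial sketch, not code from this thread; the sub names overlaps and merge_pair are hypothetical. It assumes closed intervals, so two exons that merely touch (one stops exactly where the other starts) are treated as overlapping.

```perl
use strict;
use warnings;

# Two closed intervals [s1, e1] and [s2, e2] overlap when each one
# starts no later than the other one stops (touching endpoints count).
sub overlaps {
    my ($s1, $e1, $s2, $e2) = @_;
    return $s1 <= $e2 && $s2 <= $e1;
}

# Merge two overlapping intervals: smallest start, largest stop.
sub merge_pair {
    my ($s1, $e1, $s2, $e2) = @_;
    my $start = $s1 < $s2 ? $s1 : $s2;
    my $end   = $e1 > $e2 ? $e1 : $e2;
    return ($start, $end);
}

# Pairs taken from the sample data above.
my @pair = (130630, 130761, 130732, 130767);
printf "%s\n", overlaps(@pair) ? 'overlap' : 'separate';    # overlap
printf "%d %d\n", merge_pair(@pair);                        # 130630 130767
printf "%s\n", overlaps(130732, 130767, 130784, 130818)
    ? 'overlap' : 'separate';                               # separate
```

Applying this pairwise rule repeatedly to exons sorted by start position is what collapses a whole block into one range.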

For example, take this block:

Contig0 + 127874 130761
Contig0 + 129936 129984
Contig0 + 130573 130607
Contig0 + 130630 130761
Contig0 + 130732 130767

For this block, the result will be

Contig0 + 127874 130767

So, in the end, the result for the full example data will be

Contig0 + 127874 130767
Contig0 + 130784 130818
Contig0 + 130832 130867
Contig0 + 130893 130928
Contig0 + 130970 131017

I hope the question is clear.
Can anyone please help me with this.
Thank you in advance,
Deepak


BillKSmith
Veteran

Jul 11, 2013, 8:28 PM

Post #2 of 4
Re: [oxydeepu] Can find overlapping coordinates

The following code demonstrates my method for a single chromosome. It assumes that the exons are sorted by start position. (We can sort them first if necessary.)

If you provide a more realistic sample of your data, I can generalize the code to process it.

Note: I assume that orientation should be ignored in forming ranges.


Code
use strict;
use warnings;

# Assumes a single chromosome and exons sorted by start position.
my ($range_start, $range_end);    # current merged range
my ($chrom, $orientation);        # taken from the input lines
my @ranges;                       # completed merged ranges

while (my $exon = <DATA>) {
    chomp $exon;
    my @fields = split ' ', $exon;
    next unless @fields == 4;     # skip blank or malformed lines
    my ($start_position, $end_position);
    ($chrom, $orientation, $start_position, $end_position) = @fields;

    # The first exon opens the first range.
    if (!defined $range_start) {
        ($range_start, $range_end) = ($start_position, $end_position);
        next;
    }

    if ($start_position > $range_end) {
        # No overlap: save the current range and start a new one.
        push @ranges, [$range_start, $range_end];
        ($range_start, $range_end) = ($start_position, $end_position);
    }
    elsif ($end_position > $range_end) {
        # Overlap that reaches past the current end: extend the range.
        $range_end = $end_position;
    }
}
push @ranges, [$range_start, $range_end];

print "$chrom $orientation $_->[0] $_->[1]\n" foreach @ranges;


__DATA__
Contig0 + 127874 130761
Contig0 + 129936 129984
Contig0 + 130573 130607
Contig0 + 130630 130761
Contig0 + 130732 130767
Contig0 + 130784 130818
Contig0 + 130832 130866
Contig0 + 130832 130867
Contig0 + 130893 130928
Contig0 + 130970 131004
Contig0 + 130982 131017


OUTPUT:

Code
Contig0 + 127874 130767
Contig0 + 130784 130818
Contig0 + 130832 130867
Contig0 + 130893 130928
Contig0 + 130970 131017

Good Luck,
Bill


oxydeepu
New User

Jul 11, 2013, 10:41 PM

Post #3 of 4
Re: [BillKSmith] Can find overlapping coordinates

Thank you, Bill,

The file is not always sorted, but you don't need to worry about the orientation. We just need the overlapping regions.
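Because the input is not always sorted, one option is to sort it before running the merge loop above. This is a minimal editorial sketch, not the attached range_sort.pl (whose contents are not shown here); the sub name sort_exons is hypothetical. It assumes whitespace-separated lines of chromosome, orientation, start and stop. Chromosome names are compared as strings (so Contig10 sorts before Contig2), which is enough to keep each chromosome's exons together; note that the single-chromosome loop above would still need to restart its range whenever the chromosome changes.

```perl
use strict;
use warnings;

# Sort exon lines by chromosome name, then numerically by start and stop,
# so that overlapping exons end up next to each other.
sub sort_exons {
    my @records = map { [ split ' ', $_ ] } @_;    # [chrom, strand, start, stop]
    return map { join ' ', @$_ }
           sort {
                  $a->[0] cmp $b->[0]              # chromosome (string order)
               or $a->[2] <=> $b->[2]              # start (numeric)
               or $a->[3] <=> $b->[3]              # stop (numeric)
           } @records;
}

my @unsorted = (
    'Contig0 + 130784 130818',
    'Contig0 + 127874 130761',
    'Contig0 + 129936 129984',
);
print "$_\n" for sort_exons(@unsorted);
# Contig0 + 127874 130761
# Contig0 + 129936 129984
# Contig0 + 130784 130818
```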

I am attaching a bigger part of my data.

Can you take a look at it?

Thanks once again.

Regards,
Deepak
Attachments: test.bed.txt (5.39 KB)


BillKSmith
Veteran

Jul 13, 2013, 7:09 AM

Post #4 of 4
Re: [BillKSmith] Can find overlapping coordinates

Try the attached program. Please report any errors that you find. I know that there are minor errors in the documentation that are not worth fixing until the program is finalized.

Perhaps in the future, I will have time to do the sorting in the main program.
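For readers without the attachments, here is one way the sorting could be folded into the main program. This is a minimal editorial sketch, not the attached ranges.pl; the sub name merge_exons and the @demo data are hypothetical. It assumes whitespace-separated lines of chromosome, orientation, start and stop in any order, groups exons by chromosome, ignores orientation when forming ranges (as agreed above), and, as an assumption, prints the orientation of the first exon in each merged range.

```perl
use strict;
use warnings;

# Merge overlapping exons per chromosome, ignoring orientation.
# Input: lines "chrom strand start stop" in any order.
# Output: lines "chrom strand start stop", one per merged range;
# the strand shown is that of the first exon in the range (assumption).
sub merge_exons {
    my %by_chrom;
    for my $line (@_) {
        my ($chrom, $strand, $start, $stop) = split ' ', $line;
        next unless defined $stop;                 # skip blank/short lines
        push @{ $by_chrom{$chrom} }, [ $strand, $start, $stop ];
    }

    my @out;
    for my $chrom (sort keys %by_chrom) {
        # Sorting by start makes overlapping exons adjacent.
        my @exons = sort { $a->[1] <=> $b->[1] } @{ $by_chrom{$chrom} };
        my ($strand, $range_start, $range_end) = @{ shift @exons };
        for my $exon (@exons) {
            my (undef, $start, $stop) = @$exon;
            if ($start > $range_end) {             # gap: close current range
                push @out, "$chrom $strand $range_start $range_end";
                ($strand, $range_start, $range_end) = @$exon;
            }
            elsif ($stop > $range_end) {           # overlap: extend the range
                $range_end = $stop;
            }
        }
        push @out, "$chrom $strand $range_start $range_end";
    }
    return @out;
}

my @demo = (
    'Contig0 + 130970 131004',
    'Contig0 + 127874 130761',
    'Contig0 + 130784 130818',
    'Contig0 + 130732 130767',
    'Contig0 + 130982 131017',
);
print "$_\n" for merge_exons(@demo);
# Contig0 + 127874 130767
# Contig0 + 130784 130818
# Contig0 + 130970 131017
```

To process a whole file instead of @demo, the call could be `print "$_\n" for merge_exons(<>);`, which reads every line from the files named on the command line (or from STDIN). Grouping in a hash and sorting each group avoids a separate sorting step, at the cost of holding the whole file in memory.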
Good Luck,
Bill
Attachments: ranges.pl (4.73 KB)
  range_sort.pl (0.20 KB)
  ranges.html (5.84 KB)
  test.bed.txt (5.39 KB)
  test.bed.sorted.txt (5.61 KB)

 
 

