plotting Histogram using perl

 



kapab07
Novice

Mar 29, 2013, 10:56 AM

Post #1 of 10 (871 views)

Hi,
I have a Perl script that reads N-N distances from a PDB file and writes them to a text file. Now I want to plot a histogram of the distance distribution, but I do not know how. Does anyone have any ideas to share?
Here is my code:

#!/usr/bin/perl

use strict;
use Data::Dumper;

my @points = ();
my $atname;
my $X;
my $Y;
my $Z;
my $resid;
my $dist;
open(FILE,$ARGV[0]);

my @file = <FILE>;

foreach my $l (@file){
chomp($l);
if ($l=~/^ATOM/ && $l=~/H/){
my @line = split (/\s+/,$l);
$atname = $line[-1];
$resid = $line[5];
$X = $line[6];
$Y = $line[7];
$Z = $line[8];

print "$atname $resid $X $Y $Z \n";
push @points, [$X, $Y, $Z];
}
}
print '@points:', Dumper\@points;

for my $i1 (0 .. $#points - 1){
my ($X1, $Y1, $Z1) = @{ $points[$i1]};
for my $i2 (1 .. $#points){
my ($X2, $Y2, $Z2) = @{$points[$i2]};
$dist = sqrt(($X2 - $X1)**2 + ($Y2 - $Y1)**2 + ($Z2 - $Z1)**2);
print "$dist \n";
}
}

close(FILE);

Here is the output:

2.2921031826687
4.74372343207316
3.47935367561276
4.19582530618232
5.19770228081601
5.58571973876241
6.88522323530617
4.92587271455526
6.05613655724506
7.89389143578755
8.45062388229413
7.61263587727667
5.50034417104966
4.49569616411074
12.0976795295627
10.7667785804297
10.1328385953789
17.5065141304601
16.7422111144257
16.6237397717842
8.16091906588957
6.90542236217308
12.4176714403305
13.6157767681466
15.6084002383332
14.7195439127712
13.1809257262151
8.88198024091475
9.38677132990892
12.2769785370831
12.4627652629743
10.9972631140662
4.27179189568031
4.71936446992601
6.44866141148688
9.98399183693576
10.1244012662478
10.7751108578984
16.2077179454728
16.5092491349546
16.9843134097319
0
4.98644653034603
4.14613277645567
3.53970916884424
3.92124776059866
4.76923411042067
5.0739666928351
2.90939873513412
3.81765752785658
5.60310637057695
6.31530403068609

Now I would like a script that can plot a histogram of this distribution. I cannot use MATLAB because I have hundreds of these.
Please help.


Zhris
User

Mar 29, 2013, 12:29 PM

Post #2 of 10 (863 views)
Re: [kapab07] plotting Histogram using perl

Hey,

GD::Graph::histogram ( http://search.cpan.org/~whizdog/GDGraph-histogram-1.1/lib/GD/Graph/histogram.pm ) could be a suitable option.

Chris


(This post was edited by Zhris on Mar 29, 2013, 12:31 PM)


Chris Charley
User

Mar 29, 2013, 5:51 PM

Post #3 of 10 (844 views)
Re: [kapab07] plotting Histogram using perl

A search for "perl histogram" gives some helpful results. In particular, this result can easily be modified for your purposes. It can produce output like the following from your data (if you round your input to the nearest integer):

Code
007               x  x
006               x  x  x
005               x  x  x
004               x  x  x     x           x              x
003               x  x  x     x     x  x  x              x
002            x  x  x  x  x  x  x  x  x  x           x  x
001   x     x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x
      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
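
For readers without the attachments, here is a minimal sketch of how a vertical text histogram like the one above could be produced with core Perl only. It is not the attached hist_leg.pl or hist_horiz.pl; the sub names `bin_counts` and `render_vertical` and the sample values are made up for illustration, and the rounding assumes non-negative distances.

```perl
use strict;
use warnings;

# Count how many values fall on each integer after rounding to the
# nearest integer (int($v + 0.5) is only valid for non-negative values).
sub bin_counts {
    my @values = @_;
    my %count;
    $count{ int( $_ + 0.5 ) }++ for @values;
    return %count;
}

# Render the counts as text rows, tallest level first, axis row last.
# Each bin occupies three characters, right-aligned.
sub render_vertical {
    my %count = @_;
    my ($max_bin)   = sort { $b <=> $a } keys %count;
    my ($max_count) = sort { $b <=> $a } values %count;
    my @rows;
    for my $level ( reverse 1 .. $max_count ) {
        my $row = sprintf '%03d ', $level;
        for my $bin ( 0 .. $max_bin ) {
            $row .= ( $count{$bin} || 0 ) >= $level ? '  x' : '   ';
        }
        $row =~ s/\s+$//;    # drop trailing padding
        push @rows, $row;
    }
    push @rows, '    ' . join '', map { sprintf '%3d', $_ } 0 .. $max_bin;
    return @rows;
}

my @sample = ( 2.29, 4.74, 3.48, 4.20, 5.20, 5.59 );    # made-up values
print "$_\n" for render_vertical( bin_counts(@sample) );
```

To use it on real data, the distances could be read from the text file one per line and passed to `bin_counts` instead of `@sample`.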



(This post was edited by Chris Charley on Mar 30, 2013, 9:28 PM)
Attachments: hist_horiz.pl (1.17 KB)
  hist_leg.pl (1.48 KB)


BillKSmith
Veteran

Mar 29, 2013, 6:32 PM

Post #4 of 10 (842 views)
Re: [kapab07] plotting Histogram using perl

Chris probably has the best solution for the actual histogram. I can make several comments on the code that you posted.

Your code could not have produced the output that you posted. I have no way of knowing which (if either) is correct. The nested loops would have produced one zero for every valid data point (the distance from the point to itself). You would have had two of every other distance (the distance from point I1 to I2 is the same as the distance from I2 to I1). By the way, your code ignores the last data point.
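
One common way to avoid both the zeros and the duplicates is to start the inner loop one past the outer index, so each unordered pair is visited exactly once. A minimal sketch, assuming the points are held in an array; the sub name `index_pairs` is made up for illustration:

```perl
use strict;
use warnings;

# Return every unordered pair of indices [i, j] with i < j for n items.
# Starting the inner loop at $i + 1 skips self-pairs (distance 0) and
# never visits the same pair twice.
sub index_pairs {
    my ($n) = @_;
    my @pairs;
    for my $i ( 0 .. $n - 2 ) {
        for my $j ( $i + 1 .. $n - 1 ) {
            push @pairs, [ $i, $j ];
        }
    }
    return @pairs;
}

# Four points give six pairs.
print join( ' ', map { "($_->[0],$_->[1])" } index_pairs(4) ), "\n";
```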

Declaring most of your variables with file scope negates most of the advantage of using strict.

There is no reason to slurp the entire file. It should be processed line-by-line.

You should use lexical file handles and the three argument form of open. You should always verify the success of open. Close should be done as soon as possible.
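
A minimal sketch of what these two points could look like together: a lexical file handle, three-argument open with an error check, and line-by-line reading. It keeps the original script's assumption that the coordinates are whitespace-separated fields 6 to 8 of each matching ATOM line; the sub name `read_points` is made up for illustration.

```perl
use strict;
use warnings;

# Read [x, y, z] triples from a file, one line at a time.
sub read_points {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    my @points;
    while ( my $line = <$fh> ) {
        next unless $line =~ /^ATOM.*H/;    # same filter as the posted code
        my @field = split ' ', $line;
        push @points, [ @field[ 6 .. 8 ] ];
    }
    close $fh;    # closed as soon as reading is done
    return @points;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my @points = read_points( $ARGV[0] );
    print scalar(@points), " points read\n";
}
```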

The vector arithmetic should be in a subroutine, perhaps even a module.
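
As a rough illustration of moving the arithmetic into a subroutine, here is a sketch that takes two points as array references. The name `distance` and the sample points are made up:

```perl
use strict;
use warnings;

# Euclidean distance between two points given as [x, y, z] array refs.
sub distance {
    my ( $p, $q ) = @_;
    my $sum = 0;
    $sum += ( $p->[$_] - $q->[$_] )**2 for 0 .. 2;
    return sqrt $sum;
}

# The points differ by 3 in y and 4 in z, so the distance is 5.
print distance( [ 1, 2, 3 ], [ 1, 5, 7 ] ), "\n";
```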

Most of us do not know what a pdb file is. How about giving us a specification, a non-trivial sample, or a link.

Please post your code with code tags. The forum editors provide tools to insert them.
Good Luck,
Bill


BillKSmith
Veteran

Mar 30, 2013, 6:33 AM

Post #5 of 10 (828 views)
Re: [BillKSmith] plotting Histogram using perl

The use of CPAN modules can greatly improve your existing code. The following example uses the subsets function to avoid the messy details necessary to get all the pairs of points and nothing else. Storing the points as vector objects completely eliminates the need for the individual components in the code. The dist method clearly computes the distance between the two vectors.


Code
use strict; 
use warnings;
use Algorithm::Combinatorics qw(subsets);
use Math::Vector::Real;
my @points;
foreach my $line (<DATA>) {
next if $line !~ /^ATOM.*H/;
chomp $line;
my $vector = V((split /\s+/, $line)[6 .. 8]);
push @points, $vector;
}
my $iter = subsets( \@points, 2 );
while (my $pair = $iter->next) {
my $distance = $pair->[0]->dist( $pair->[1] );
print $distance, "\n";
}
__DATA__
ATOM fie fii foo fum H 1. 2. 3. aname
# This is a comment line!
ATOM fie fii foo fum H 1. 5. 7. bname
ATOM fie fii foo fum H 1. 11. 15. cname

Good Luck,
Bill


kapab07
Novice

Apr 1, 2013, 6:56 AM

Post #6 of 10 (795 views)
Re: [Zhris] plotting Histogram using perl

Thanks for your responses and the website



kapab07
Novice

Apr 1, 2013, 6:59 AM

Post #7 of 10 (792 views)
Re: [BillKSmith] plotting Histogram using perl

Good morning Bill. I really appreciate your responses to my questions. Thanks a lot



kapab07
Novice

Apr 1, 2013, 7:01 AM

Post #8 of 10 (791 views)
Re: [Chris Charley] plotting Histogram using perl

Good morning Chris. Thanks for the attachments. Have a wonderful week.



kapab07
Novice

Apr 1, 2013, 7:12 AM

Post #9 of 10 (790 views)
Re: [BillKSmith] plotting Histogram using perl

Hi Bill,
Here is a link showing a pdb file.
http://www.pitt.edu/~mcs2/teaching/biocomp/tutorials/pdb_file.html
Thanks for your responses. I appreciate it.
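
A note for anyone parsing such files: the PDB format lays out ATOM records in fixed columns, so splitting on whitespace can misalign fields when neighbouring values run together. Here is a minimal sketch that reads the fields by column position instead. It assumes the standard layout (atom name in columns 13-16, residue number in 23-26, x, y, z in 31-38, 39-46, 47-54); the sub name `parse_atom` and the sample record are made up for illustration.

```perl
use strict;
use warnings;

# Extract name, residue number and coordinates from one ATOM record by
# column position (substr offsets are 0-based; PDB columns are 1-based).
sub parse_atom {
    my ($line) = @_;
    return unless $line =~ /^ATOM/ && length($line) >= 54;
    my %atom = (
        name  => substr( $line, 12, 4 ),    # columns 13-16
        resid => substr( $line, 22, 4 ),    # columns 23-26
        x     => substr( $line, 30, 8 ),    # columns 31-38
        y     => substr( $line, 38, 8 ),    # columns 39-46
        z     => substr( $line, 46, 8 ),    # columns 47-54
    );
    s/^\s+|\s+$//g for values %atom;         # trim the column padding
    return \%atom;
}

# Made-up record, built with sprintf so the column widths are explicit.
my $record = sprintf '%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f',
    'ATOM', 1, ' N', '', 'MET', 'A', 1, '', 27.340, 24.430, 2.614;
my $atom = parse_atom($record);
print "$atom->{name} $atom->{resid} $atom->{x} $atom->{y} $atom->{z}\n";
```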



Chris Charley
User

Apr 1, 2013, 11:49 AM

Post #10 of 10 (774 views)
Re: [kapab07] plotting Histogram using perl

There are 3288 lines beginning with ATOM (in the sample PDB file from the Univ. of Pittsburgh). Calculating all the distances means 3288 choose 2 combinations, a total of (3288*3287/2) = 5,403,828 distances, and you want to create a histogram from these 5 million? That is not as simple a task as the sample distances you provided (52 distances).
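
With that many distances, one option is to tally each distance into a bin as soon as it is computed, so that only the bin counts stay in memory. The following is a sketch under that assumption and is not the attached t5.pl or t6.pl; the sub names `pair_count` and `bin_pair_distances`, the bin width and the three sample points are made up for illustration.

```perl
use strict;
use warnings;

# Number of unordered pairs among n points: n * (n - 1) / 2.
sub pair_count {
    my ($n) = @_;
    return $n * ( $n - 1 ) / 2;
}

# Tally pairwise distances into fixed-width bins as they are computed.
# Bin k covers distances from k * width up to (k + 1) * width.
sub bin_pair_distances {
    my ( $points, $bin_width ) = @_;
    my %count;
    for my $i ( 0 .. $#$points - 1 ) {
        my ( $x1, $y1, $z1 ) = @{ $points->[$i] };
        for my $j ( $i + 1 .. $#$points ) {
            my ( $x2, $y2, $z2 ) = @{ $points->[$j] };
            my $d = sqrt( ( $x2 - $x1 )**2 + ( $y2 - $y1 )**2 + ( $z2 - $z1 )**2 );
            $count{ int( $d / $bin_width ) }++;
        }
    }
    return \%count;
}

print pair_count(3288), "\n";    # 5403828, the figure quoted above

# Three made-up points whose pairwise distances are 5, 15 and 10.
my $bins = bin_pair_distances( [ [ 1, 2, 3 ], [ 1, 5, 7 ], [ 1, 11, 15 ] ], 5 );
printf "%3d-%3d: %d\n", $_ * 5, $_ * 5 + 5, $bins->{$_}
    for sort { $a <=> $b } keys %$bins;
```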

Well, after running a modified program I was able to create a histogram from the University sample. Each 'x' on the histogram represents 1500 points. The program and the histogram are attached.
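
The scaling idea can be sketched as follows: each bin's count is divided by a fixed number of points per mark before the bar is drawn. This is an illustration only and not the attached program; the sub name `scaled_bars` and the counts are made up, and rounding up (so that every non-empty bin shows at least one mark) is an assumption.

```perl
use strict;
use warnings;

# Draw one text bar per bin, one 'x' per $per_mark counts, rounded up.
sub scaled_bars {
    my ( $count, $per_mark ) = @_;
    my @lines;
    for my $bin ( sort { $a <=> $b } keys %$count ) {
        my $marks = int( ( $count->{$bin} + $per_mark - 1 ) / $per_mark );
        push @lines, sprintf '%3d %s', $bin, 'x' x $marks;
    }
    return @lines;
}

my %count = ( 0 => 1200, 1 => 4500, 2 => 3100 );    # made-up bin counts
print "$_\n" for scaled_bars( \%count, 1500 );
```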


Or, without Algorithm::Combinatorics, (t5.pl).


(This post was edited by Chris Charley on Apr 1, 2013, 4:53 PM)
Attachments: t6.pl (0.72 KB)
  o44.txt (3.98 KB)
  t5.pl (0.62 KB)

 
 

