string to less memory



May 11, 2010, 8:58 AM

Post #1 of 4 (408 views)


Does anyone know if there is a way to convert a string containing only four different characters (ACGT) into a more compact variable that uses fewer bits of memory, e.g. a packed string or a bit representation?

I have a lot of DNA sequences stored in memory, so I want to pack them so that they use less memory.




May 11, 2010, 8:50 PM

Post #2 of 4 (396 views)
Re: [marten] string to less memory

It is almost certainly worth your effort to check out the DNA modules on CPAN. Even if you cannot use any of them, you should adopt a standard data structure. This will make it easier to incorporate modules in the future, and other people will be more likely to want to use your code. If you still insist on going off on your own, you can reduce your storage by a factor of four by translating the four symbols to the integers 0 through 3 and packing them, two bits per symbol, with the vec function.

Refer to perldoc perlop for the tr operator

Refer to perldoc -f vec for the vec function

The following code shows how it could work.

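use strict;
use warnings;

my $packed_string = '';
my $packing_index = 0;

# Pack each symbol into its own 2-bit field of $packed_string
while ( my $input_string = <DATA> ) {
    chomp $input_string;                  # drop the newline so only ACGT remain
    $input_string =~ tr [ACGT] [0-3];     # A=>0, C=>1, G=>2, T=>3
    foreach ( split //, $input_string ) {
        vec( $packed_string, $packing_index++, 2 ) = $_;
    }
}

# Fetch symbol #14 (offsets count from 0)
my $symbol = vec( $packed_string, 14, 2 );
$symbol =~ tr [0-3] [ACGT];
print "Symbol 14 = $symbol\n";

# The line after __DATA__ is made-up sample input, one sequence per line
__DATA__
ACGTACGTACGTACGTACGT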
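
To check the factor-of-four claim, here is a minimal sketch that compares the byte length of a plain string with its packed form. The sample sequence is made up for illustration, and it assumes the input contains only uppercase A, C, G and T. Note that length only counts the string's bytes, not Perl's fixed per-scalar overhead.

```perl
use strict;
use warnings;

# Made-up sample sequence, for illustration only: 1000 symbols
my $sequence = 'ACGT' x 250;

# Translate a copy of the sequence to the digits 0 through 3
( my $digits = $sequence ) =~ tr/ACGT/0-3/;

# Store each digit in its own 2-bit field
my $packed = '';
my $i      = 0;
vec( $packed, $i++, 2 ) = $_ for split //, $digits;

printf "plain:  %d bytes\n", length($sequence);    # 1000
printf "packed: %d bytes\n", length($packed);      # 250
```

The saving matters most for long sequences; for many very short ones the per-scalar overhead will dominate.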

Good Luck,


May 12, 2010, 8:47 AM

Post #3 of 4 (378 views)
Re: [BillKSmith] string to less memory

Thanks for your reply, BillKSmith. Short question: how can I retrieve the original sequence from the bit vector?

You mention that I can get symbol 14, but how do I return the full sequence at once? Or should I loop through each symbol? If so, how do I know the size of the bit vector ($packed_string)?



May 12, 2010, 12:57 PM

Post #4 of 4 (373 views)
Re: [marten] string to less memory

It looks like you will have to loop through each symbol (much as I did when creating the packed string). The documentation for vec suggests using unpack, but that does not seem to apply to your case because there is no 2-bit field specifier.
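
A minimal sketch of that loop follows. The helper names pack_dna and unpack_dna are hypothetical, and the input is assumed to contain only uppercase A, C, G and T. On the size question: length($packed_string) * 4 is only an upper bound on the number of symbols, because the last byte is padded with zero bits and those would decode as extra 'A's. The sketch therefore keeps the symbol count alongside the packed string.

```perl
use strict;
use warnings;

# Pack a sequence; returns the packed string and the symbol count.
sub pack_dna {
    my ($sequence) = @_;
    ( my $digits = $sequence ) =~ tr/ACGT/0-3/;
    my $packed = '';
    my $i      = 0;
    vec( $packed, $i++, 2 ) = $_ for split //, $digits;
    return ( $packed, length($sequence) );
}

# Rebuild the original sequence by reading each 2-bit field in turn.
sub unpack_dna {
    my ( $packed, $count ) = @_;
    my $digits = join '', map { vec( $packed, $_, 2 ) } 0 .. $count - 1;
    $digits =~ tr/0-3/ACGT/;
    return $digits;
}

my ( $packed, $count ) = pack_dna('GATTACA');
print unpack_dna( $packed, $count ), "\n";    # prints GATTACA
```

Here the seven symbols occupy two bytes, so decoding all eight fields without the stored count would yield a spurious trailing 'A'.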

Implement this structure as a class. Design your interface to imitate Perl strings. Develop and test your application on small cases using plain strings. Add the packing module when compression becomes necessary.
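
One possible shape for such a class is sketched below. The class name PackedDNA and the methods len, substring and as_string are hypothetical, chosen to mirror Perl's length and substr without clashing with the built-ins. It assumes uppercase ACGT input and non-negative offsets, and does no validation.

```perl
use strict;
use warnings;

package PackedDNA;    # hypothetical class name

# Build an object from a plain ACGT string.
sub new {
    my ( $class, $sequence ) = @_;
    ( my $digits = $sequence ) =~ tr/ACGT/0-3/;
    my $bits = '';
    my $i    = 0;
    vec( $bits, $i++, 2 ) = $_ for split //, $digits;
    return bless { bits => $bits, count => length($sequence) }, $class;
}

# Number of symbols stored (the counterpart of length).
sub len {
    my ($self) = @_;
    return $self->{count};
}

# Counterpart of substr($string, $offset, $length), clipped at the end.
sub substring {
    my ( $self, $offset, $length ) = @_;
    my $last = $offset + $length - 1;
    $last = $self->{count} - 1 if $last > $self->{count} - 1;
    my $digits = join '', map { vec( $self->{bits}, $_, 2 ) } $offset .. $last;
    $digits =~ tr/0-3/ACGT/;
    return $digits;
}

# The whole sequence as an ordinary string.
sub as_string {
    my ($self) = @_;
    return $self->substring( 0, $self->{count} );
}

package main;

my $dna = PackedDNA->new('GATTACA');
print $dna->len, "\n";                  # 7
print $dna->substring( 2, 3 ), "\n";    # TTA
print $dna->as_string, "\n";            # GATTACA
```

Because callers only see the methods, the same interface could first be backed by a plain string and switched to the packed storage later, as suggested above.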

Again, I strongly recommend studying existing DNA modules before attempting your own. Sorry, but I cannot offer more specific help.
Good Luck,

