string to less memory

 



marten
New User

May 11, 2010, 8:58 AM

Post #1 of 4 (406 views)
string to less memory

Hello,

Does anyone know of a way to convert a string that contains only four different characters (ACGT) into a more compact variable that uses less memory, e.g. a packed string or a bit representation?

I have a lot of DNA sequences stored in memory, so I want to pack them so that they use less memory.

Cheers,

Marten


BillKSmith
Veteran

May 11, 2010, 8:50 PM

Post #2 of 4 (394 views)
Re: [marten] string to less memory

It is almost certainly worth your effort to check out the DNA modules on CPAN. Even if you cannot use any of them, you should adopt a standard data structure. This will make it easier to incorporate modules in the future, and other people will be more apt to use your code. If you still insist on going off on your own, you can reduce your storage by a factor of four by translating the four symbols to the integers 0 through 3 and packing them, two bits per symbol, with the vec function.

Refer to perldoc perlop for the tr operator

Refer to perldoc -f vec for the vec function
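
As a quick check of the factor-of-four figure, here is a minimal sketch that packs a made-up 1000-symbol sequence and compares byte counts. The test sequence and variable names are assumptions for illustration only, and the figures cover the string payload, not Perl's per-scalar overhead.

```perl
use strict;
use warnings;

# Hypothetical test sequence: 1000 symbols, one byte each as a plain string.
my $sequence = 'ACGT' x 250;

# Translate A, C, G, T to the digits 0, 1, 2, 3.
( my $digits = $sequence ) =~ tr/ACGT/0123/;

# Store each digit in its own 2-bit field.
my $packed = '';
my $index  = 0;
vec( $packed, $index++, 2 ) = $_ for split //, $digits;

printf "%d bytes as text, %d bytes packed\n",
    length($sequence), length($packed);
```

Four 2-bit fields fit in each byte, which is where the factor of four comes from.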



The following code shows how it could work.

use strict;
use warnings;

my $input_string;
my $packed_string = '';
my $packing_index = 0;

# Read each sequence line and append its symbols to the packed string.
while ( $input_string = <DATA> ) {
    chomp($input_string);

    # Translate the symbols A, C, G, T to the digits 0, 1, 2, 3.
    $input_string =~ tr [ACGT] [0-3];

    # Store each digit in the next 2-bit field.
    foreach ( split //, $input_string ) {
        vec( $packed_string, $packing_index++, 2 ) = $_;
    }
}

# Fetch symbol #14 (indices start at 0).
my $symbol = vec( $packed_string, 14, 2 );
$symbol =~ tr [0-3] [ACGT];
print "Symbol 14 = $symbol\n";

__END__
CCGTA
GGTACCC
TAC

Good Luck,
Bill


marten
New User

May 12, 2010, 8:47 AM

Post #3 of 4 (376 views)
Re: [BillKSmith] string to less memory

Thanks for your reply, BillKSmith. A short question: how can I retrieve the original sequence from the bit vector?

You mention that I can get symbol 14, but how do I return the full sequence at once? Or should I loop through each symbol? If so, how do I know the size of the bit vector ($packed_string)?



Cheers,
Marten


BillKSmith
Veteran

May 12, 2010, 12:57 PM

Post #4 of 4 (371 views)
Re: [marten] string to less memory

It looks like you will have to loop through each symbol (much as I did in creating the packed string). The documentation for vec suggests using unpack. This does not seem to apply to your case because there is no 2-bit field specifier.
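
A minimal sketch of such a loop, assuming two hypothetical helpers, pack_dna and unpack_dna, that are not part of any module. On the size question: vec pads the packed string to a whole number of bytes, so length($packed) * 4 can overcount by up to three symbols. The sketch therefore returns the symbol count alongside the packed string.

```perl
use strict;
use warnings;

# Hypothetical helper: pack a string of A/C/G/T into 2-bit fields.
# Returns the packed string and the number of symbols it holds.
sub pack_dna {
    my ($sequence) = @_;
    ( my $digits = $sequence ) =~ tr/ACGT/0123/;
    my $packed = '';
    my $index  = 0;
    vec( $packed, $index++, 2 ) = $_ for split //, $digits;
    return ( $packed, length $sequence );
}

# Hypothetical helper: rebuild the text by reading every 2-bit field.
sub unpack_dna {
    my ( $packed, $count ) = @_;
    my $digits = join '', map { vec( $packed, $_, 2 ) } 0 .. $count - 1;
    ( my $sequence = $digits ) =~ tr/0123/ACGT/;
    return $sequence;
}

my ( $packed, $count ) = pack_dna('CCGTAGGTACCCTAC');
my $restored = unpack_dna( $packed, $count );
print "$count symbols in ", length($packed), " bytes: $restored\n";
```

Here 15 symbols occupy 4 bytes, which have room for 16 fields, so the stored count is what tells the loop where to stop.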

Implement this structure as a class. Design your interface to imitate Perl strings. Develop and test your application on small cases using plain strings. Add the packing module when compression becomes necessary.
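
One possible shape for such a class is sketched below. The package name PackedDNA and the method names append, len, slice and as_string are all hypothetical, chosen only to resemble string operations. The sketch rejects any symbol other than A, C, G or T, which is an assumption about the input data.

```perl
use strict;
use warnings;

package PackedDNA;    # hypothetical class name, not a CPAN module

sub new {
    my ( $class, $sequence ) = @_;
    my $self = bless { bits => '', count => 0 }, $class;
    $self->append($sequence) if defined $sequence;
    return $self;
}

# Append more symbols, two bits each, to the end of the packed data.
sub append {
    my ( $self, $sequence ) = @_;
    $sequence = uc $sequence;
    die "Only A, C, G and T can be packed\n" if $sequence =~ /[^ACGT]/;
    ( my $digits = $sequence ) =~ tr/ACGT/0123/;
    vec( $self->{bits}, $self->{count}++, 2 ) = $_ for split //, $digits;
    return $self;
}

# Number of symbols stored (tracked separately from the byte length).
sub len { return $_[0]{count} }

# Return $n symbols starting at $offset, clipped to the stored length.
sub slice {
    my ( $self, $offset, $n ) = @_;
    my $last = $offset + $n - 1;
    $last = $self->{count} - 1 if $last > $self->{count} - 1;
    my $digits = join '', map { vec( $self->{bits}, $_, 2 ) } $offset .. $last;
    ( my $sequence = $digits ) =~ tr/0123/ACGT/;
    return $sequence;
}

sub as_string { my ($self) = @_; return $self->slice( 0, $self->{count} ) }

package main;

my $dna = PackedDNA->new('CCGTA')->append('GGTACCC')->append('TAC');
print "Length:    ", $dna->len, "\n";
print "Symbol 14: ", $dna->slice( 14, 1 ), "\n";
print "Sequence:  ", $dna->as_string, "\n";
```

Because callers only see the methods, an application can be developed against a plain-string version of the same interface and switched to the packed version later.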



Again, I strongly recommend studying existing DNA modules before attempting your own. Sorry, but I cannot offer more specific help.
Good Luck,
Bill

 
 

