May 31, 2010, 7:52 AM
Storing large data structures on disk
For the last few hours I have been trying different ways of storing (serializing) a large data structure to disk. The data structure is a 2D array with a few million rows, each containing anywhere from 0 to 200 columns of integers.
I usually use store (or nstore) and retrieve from Storable, but had never tried them on a large data structure. Dumping it takes quite a long time (more than a few minutes) and, worse than that, consumes almost all the memory of the machine (>97% of >10GB!). BTW, simply printing the entire array to a file takes about half the time it takes to store it and consumes almost no extra memory, but then I would have to parse it back instead of just calling retrieve...
Anyway, it's quite frustrating. It might be worth mentioning that once on disk, the binary file can be compressed very well (~1:100), but I was not able to figure out whether I can take advantage of this nice property: compressing before dumping (via freezing) seems to work even worse.
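To make the "print it and parse it back" alternative concrete, here is a minimal sketch that writes one row per line through a gzip stream, so no second in-memory copy of the whole structure is built while writing. It is only an illustration under assumptions: the names write_rows and read_rows are made up for this example, the format (space-separated integers, one row per line) is just one possible choice, and nothing here has been timed against store/retrieve. Reading back still rebuilds the full array in memory.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw($GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: write one row per line, integers separated by
# spaces, compressing on the fly instead of serializing everything first.
sub write_rows {
    my ($rows, $file) = @_;
    my $out = IO::Compress::Gzip->new($file)
        or die "Cannot write $file: $GzipError";
    $out->print(join(' ', @$_) . "\n") for @$rows;
    $out->close;
}

# Hypothetical helper: read the rows back line by line.
# An empty line becomes a row with zero columns.
sub read_rows {
    my ($file) = @_;
    my $in = IO::Uncompress::Gunzip->new($file)
        or die "Cannot read $file: $GunzipError";
    my @rows;
    while (defined(my $line = $in->getline)) {
        chomp $line;
        push @rows, [ split ' ', $line ];
    }
    $in->close;
    return \@rows;
}
```

Usage would be along the lines of write_rows(\@data, 'data.txt.gz') followed later by my $data = read_rows('data.txt.gz'). Whether this beats Storable on speed for a few million rows would need to be measured.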
In short, I need a robust method (I have many such data structures) that allows storing large data structures on disk (optionally compressing them along the way), does not consume all the memory, and is hopefully also fast.
This seems like quite a common task, but I could find only a few tips.