How to create a table using hash or array of hashes

 



Tejas
User

Aug 1, 2014, 3:00 AM

Post #1 of 23 (1946 views)
How to create a table using hash or array of hashes


I am not sure how to frame this question, but the code below might give you an idea of what I am trying to do.

INPUT

Quote
1966651010,520,3.44,1
1966651010,14,1.03,1
1966651010,22,1.04,1
1966651010,11,-2.38,1
1966651010,10,-1.03,1

1966651231,1,3.44,1
1966651231,14,1.03,1
1966651231,11,1.04,1
1966651231,2,-2.38,1

1966652212,1,3.44,1
1966652212,14,1.03,1
1966652212,11,1.04,1
1966652212,2,-2.38,1

1967454850,520,10.99,3
1967454850,41,-2.99,3
1967454850,22,.94,3


DESIRED OUTPUT

Quote
ID APPENDED_TYPE POS_AMT NEG_AMT UNIT
1966651010 520-14-22-11-10 5.51 -3.41 1
1966651231 1-14-11-2 5.51 -2.38 1
1966652212 1-14-11-2 5.51 -2.38 1
1967454850 520-41-22 11.93 -2.99 3



Code
I am really confused about how to maintain the hash, but I have tried to maintain a hash of arrays for the ID and the appended type:


use strict;
use warnings;

my %data;
my $pwd = `pwd`;
chomp($pwd);
my $input_txns = "$pwd/Test.dat";

open (MY_INPUT, "< $input_txns") or die "could not open $input_txns $!";

while (my $line = <MY_INPUT>) {
    chomp ($line);
    my ($key, $value) = (split /,/, $line)[0,2];
    push @{ $data{$key} }, $value;
}

for my $key (sort keys %data) {
    printf "%s => %s\n", $key, join '-', @{ $data{$key} };
}

OUTPUT OF MY CODE

Quote
1966651010 520-14-22-11-10
1966652212 1-14-11-2
1966651231 1-14-11-2
1967454850 520-41-22



I could append the types and print them this way, but I am not able to separate the positive and negative amounts, or to work out how to store them.

Thanks in advance

Regards
Tejas


BillKSmith
Veteran

Aug 1, 2014, 6:07 AM

Post #2 of 23 (1938 views)
Re: [Tejas] How to create a table using hash or array of hashes

I am confused. I have no idea where we are supposed to get the numbers that are in your desired output but not in the output you get.

When I run the script that you posted (after adding code to ignore the blank lines), I get output quite different from what you say you get.

You are not saving all the input data in your hash. The solution depends on what you want to do with it. I suspect that you need a hash of arrays of arrays.

If you can explain your problem better, I will try to help.
Good Luck,
Bill


Tejas
User

Aug 1, 2014, 6:20 AM

Post #3 of 23 (1934 views)
Re: [BillKSmith] How to create a table using hash or array of hashes

Hi Bill,

In my code I have appended the IDs that were needed.
I actually need to add up all the positive amounts per appended ID.

Code
Input
SourceID,ENTRYID,AMOUNT,ORG
1966651010,520,3.44,1
1966651010,14,1.03,1
1966651010,22,1.04,1
1966651010,11,-2.38,1
1966651010,10,-1.03,1

1966651231,1,3.44,1
1966651231,14,1.03,1
1966651231,11,1.04,1
1966651231,2,-2.38,1

1966652212,1,3.44,1
1966652212,14,1.03,1
1966652212,11,1.04,1
1966652212,2,-2.38,1

1967454850,520,10.99,3
1967454850,41,-2.99,3
1967454850,22,.94,3

Here we have to take the matching source IDs, append their ENTRYIDs, and separately calculate the sum of the positive amounts and the sum of the negative amounts.


Quote
output

SOURCEID,APPENDED_TYPE,POS_AMT,NEG_AMT,ORG
1966651010,520-14-22-11-10,5.51,-3.41,1
1966651231,1-14-11-2,5.51,-2.38,1
1966652212,1-14-11-2,5.51,-2.38,1
1967454850,520-41-22,11.93,-2.99,3



In my code I am just taking the ID as the key, building a hash of arrays of the different ENTRYIDs, and joining them with "-".

But my intention is to get:
ID, APPENDED_ENTRYID, SUM OF ALL POSITIVE AMOUNTS PER APPENDED ID FOR THIS KEY

I am trying to make a single row for each ID by appending its entry IDs and adding up the amounts.

Am I clear now?

As you said, I think the data structure has to be nested more deeply, and that is where I got stuck.


Tejas
User

Aug 1, 2014, 6:26 AM

Post #4 of 23 (1932 views)
Re: [BillKSmith] How to create a table using hash or array of hashes

Sorry, below is the code.

Code
use strict; 
use warnings;

my %data;
my $pwd = `pwd`;
chomp($pwd);
my $rcsl_txns= "$pwd/Test.dat";

open (RCSL_OUTPUT, "< $rcsl_txns") or die "could not open $rcsl_txns $!";

while (my $line = <RCSL_OUTPUT>) {
    chomp ($line);
    next if $line =~ /^\s*$/;                 # skip the blank lines between blocks
    my ($k, $v) = (split /,/, $line)[0,1];    # source ID and entry ID
    push @{ $data{$k} }, $v;
}

for my $k (sort keys %data) {
    printf "%s => %s\n", $k, join '-', @{ $data{$k} };
}


output

Quote
1966651010 => 520-14-22-11-10
1966651231 => 1-14-11-2
1966652212 => 1-14-11-2
1967454850 => 520-41-22

And I need the positive and negative amounts for each ID.

I am not saving all the data in the hash, as I am confused about how to access it and get the result.

Thanks


(This post was edited by Tejas on Aug 1, 2014, 6:30 AM)


FishMonger
Veteran / Moderator

Aug 1, 2014, 7:26 AM

Post #5 of 23 (1922 views)
Re: [Tejas] How to create a table using hash or array of hashes


Code
#!/usr/bin/perl 

use strict;
use warnings;
use Data::Dumper;

my %data;
while (my $line = <DATA>) {
    chomp $line;
    next if $line =~ /^\s*$/;
    my ($src, $entry, $amt) = (split /,/, $line)[0..2];
    push @{ $data{$src}{entry} }, $entry;
    my $num = $amt > 0 ? 'pos' : 'neg';
    $data{$src}{$num} += $amt;
}
print Dumper \%data;

__DATA__
1966651010,520,3.44,1
1966651010,14,1.03,1
1966651010,22,1.04,1
1966651010,11,-2.38,1
1966651010,10,-1.03,1

1966651231,1,3.44,1
1966651231,14,1.03,1
1966651231,11,1.04,1
1966651231,2,-2.38,1

1966652212,1,3.44,1
1966652212,14,1.03,1
1966652212,11,1.04,1
1966652212,2,-2.38,1

1967454850,520,10.99,3
1967454850,41,-2.99,3
1967454850,22,.94,3


output:

Code
$VAR1 = {
  '1966651231' => {
    'pos' => '5.51',
    'neg' => '-2.38',
    'entry' => [
      '1',
      '14',
      '11',
      '2'
    ]
  },
  '1966651010' => {
    'neg' => '-3.41',
    'entry' => [
      '520',
      '14',
      '22',
      '11',
      '10'
    ],
    'pos' => '5.51'
  },
  '1966652212' => {
    'entry' => [
      '1',
      '14',
      '11',
      '2'
    ],
    'neg' => '-2.38',
    'pos' => '5.51'
  },
  '1967454850' => {
    'entry' => [
      '520',
      '41',
      '22'
    ],
    'neg' => '-2.99',
    'pos' => '11.93'
  }
};



BillKSmith
Veteran

Aug 1, 2014, 8:21 AM

Post #6 of 23 (1915 views)
Re: [Tejas] How to create a table using hash or array of hashes

I see that FishMonger beat me with a similar solution. The main difference is that he reduced the amount of data he had to save by processing the amounts in the data collection loop. I do not think either is truly better.


Code
use strict; 
use warnings;
use List::Util qw(sum0);
use Data::Dumper;

my %data;

while ( my $line = <DATA> ) {
    next if $line =~/^\s*$/;
    chomp($line);
    my ( $source_id, $entry_id, $amount, $org ) = split /,/, $line;
    push @{$data{$source_id}{entry}}, $entry_id;
    push @{$data{$source_id}{amt}}, $amount;
    $data{$source_id}{org} = $org;
}
print Dumper \%data;

foreach my $source_id ( sort keys %data ) {
    my $type = join'-', @{$data{$source_id}{entry}};
    my ( $pos_amt, $neg_amt )
        = ( ( sum0 grep {$_ > 0} @{$data{$source_id}{amt}} ),
            ( sum0 grep {$_ < 0} @{$data{$source_id}{amt}} ),
          );
    my $org = $data{$source_id}{org};
    printf "%s => %-16s,%6.2f,%6.2f,%2d\n",
        $source_id, $type, $pos_amt, $neg_amt, $org;
}
__DATA__
1966651010,520,3.44,1
1966651010,14,1.03,1
1966651010,22,1.04,1
1966651010,11,-2.38,1
1966651010,10,-1.03,1

1966651231,1,3.44,1
1966651231,14,1.03,1
1966651231,11,1.04,1
1966651231,2,-2.38,1

1966652212,1,3.44,1
1966652212,14,1.03,1
1966652212,11,1.04,1
1966652212,2,-2.38,1

1967454850,520,10.99,3
1967454850,41,-2.99,3
1967454850,22,.94,3


output:

Code
$VAR1 = {
  '1967454850' => {
    'entry' => [
      '520',
      '41',
      '22'
    ],
    'org' => '3',
    'amt' => [
      '10.99',
      '-2.99',
      '.94'
    ]
  },
  '1966651010' => {
    'entry' => [
      '520',
      '14',
      '22',
      '11',
      '10'
    ],
    'org' => '1 ',
    'amt' => [
      '3.44',
      '1.03',
      '1.04',
      '-2.38',
      '-1.03'
    ]
  },
  '1966651231' => {
    'entry' => [
      '1',
      '14',
      '11',
      '2'
    ],
    'org' => '1 ',
    'amt' => [
      '3.44',
      '1.03',
      '1.04',
      '-2.38'
    ]
  },
  '1966652212' => {
    'entry' => [
      '1',
      '14',
      '11',
      '2'
    ],
    'org' => '1 ',
    'amt' => [
      '3.44',
      '1.03',
      '1.04',
      '-2.38'
    ]
  }
};
1966651010 => 520-14-22-11-10 ,  5.51, -3.41, 1
1966651231 => 1-14-11-2       ,  5.51, -2.38, 1
1966652212 => 1-14-11-2       ,  5.51, -2.38, 1
1967454850 => 520-41-22       , 11.93, -2.99, 3

Good Luck,
Bill


Tejas
User

Aug 1, 2014, 9:39 AM

Post #7 of 23 (1910 views)
Re: [FishMonger] How to create a table using hash or array of hashes

Hi,
I didn't quite understand this:

Code
push @{ $data{$src}{entry} }, $entry;   # What are we doing here?
my $num = $amt > 0 ? 'pos' : 'neg';
$data{$src}{$num} += $amt;              # Are we trying to change the key to a different column here?



Code
push @{ $data{$src}{entry} }, $entry

Here we are creating an array for the key $data{$src}{entry} and storing the remaining keys in that array. Am I correct?
But why did we then change the key to $data{$src}{$num}, when it was $data{$src}{entry} initially?

I am actually confused.

Please explain
Thanks for the help


Laurent_R
Veteran / Moderator

Aug 1, 2014, 10:01 AM

Post #8 of 23 (1904 views)
Re: [Tejas] How to create a table using hash or array of hashes


In Reply To
Hi,
I didn't quite understand this:

Code
push @{ $data{$src}{entry} }, $entry;   # What are we doing here?
my $num = $amt > 0 ? 'pos' : 'neg';
$data{$src}{$num} += $amt;              # Are we trying to change the key to a different column here?



Code
push @{ $data{$src}{entry} }, $entry

Here we are creating an array for the key $data{$src}{entry} and storing the remaining keys in that array. Am I correct?
But why did we then change the key to $data{$src}{$num}, when it was $data{$src}{entry} initially?

I am actually confused.

Please explain.
Thanks for the help.


Look at the data produced by Fishmonger's script. A small part of it:

Code
'1966651231' => {
  'pos' => '5.51',
  'neg' => '-2.38',
  'entry' => [
    '1',
    '14',
    '11',
    '2'
  ]
},


As you can see, for $src '1966651231', you have a sub-hash with three keys: 'entry' (whose value is an array ref), and 'pos' and 'neg', which hold the sums of the positive and negative amounts. The code line with the push adds elements to the entry array ref; the "$data{$src}{$num} += $amt;" line accumulates the positive and negative totals.
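
To make the key handling concrete, here is a minimal sketch that feeds three of the sample lines through the same two statements and then reads the inner keys back. The sub name add_line is only an illustrative helper and is not part of FishMonger's script.

```perl
use strict;
use warnings;

my %data;

# Hypothetical helper: applies the two statements under discussion to one line.
sub add_line {
    my ($line) = @_;
    my ($src, $entry, $amt) = (split /,/, $line)[0..2];

    # Autovivification: $data{$src}{entry} becomes an array ref on the first push.
    push @{ $data{$src}{entry} }, $entry;

    # $num is just a string used as the inner key, so this updates either
    # $data{$src}{pos} or $data{$src}{neg}; the {entry} array is not touched.
    my $num = $amt > 0 ? 'pos' : 'neg';
    $data{$src}{$num} += $amt;
}

add_line($_) for '1967454850,520,10.99,3',
                 '1967454850,41,-2.99,3',
                 '1967454850,22,.94,3';

my $rec = $data{'1967454850'};
printf "entries: %s\n", join '-', @{ $rec->{entry} };
printf "pos: %.2f  neg: %.2f\n", $rec->{pos}, $rec->{neg};
```

So the outer key never changes; only the inner key differs between the two statements, which is why one source ID ends up with an 'entry' list and two running totals side by side.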


FishMonger
Veteran / Moderator

Aug 1, 2014, 10:12 AM

Post #9 of 23 (1903 views)
Re: [Tejas] How to create a table using hash or array of hashes

In my example (and Bill's) %data is a HoH (Hash of Hashes), which has 2 levels of keys. The first (outer) key is your "SourceID", and the second (inner) level has 3 keys: 1) entry, which holds an array ref of your "ENTRYID" values, 2) pos, which is the sum total of the positive amount values, and 3) neg, which is the sum total of your negative amount values.


Code
push @{ $data{$src}{entry} }, $entry;

That adds/pushes the "entryid" onto the array.


Code
my $num = $amt > 0 ? 'pos' : 'neg';

If the amount is > 0, then it's a positive value and we set the hash key ($num) to 'pos'. Otherwise (zero or negative), we set the hash key to 'neg'.


Code
$data{$src}{$num} += $amt;

This adds the $amt value to either the positive or negative total depending on the value of $num, which was calculated in the line above.

Add this to the script.

Code
foreach my $src_id (sort keys %data) {
    printf("%-10s => %-20s %6.2f %6.2f\n",
        $src_id,
        join('-', @{ $data{$src_id}{entry} }),
        $data{$src_id}{pos},
        $data{$src_id}{neg}
    );
}

That loop will output:

Code
1966651010 => 520-14-22-11-10        5.51  -3.41
1966651231 => 1-14-11-2              5.51  -2.38
1966652212 => 1-14-11-2              5.51  -2.38
1967454850 => 520-41-22             11.93  -2.99



(This post was edited by FishMonger on Aug 1, 2014, 10:18 AM)


Tejas
User

Aug 2, 2014, 11:40 AM

Post #10 of 23 (1857 views)
Re: [FishMonger] How to create a table using hash or array of hashes

Thanks Monger and Laurent,
I understood it.
I would really like to know how you got to a solution for this problem so quickly.

I want to be able to understand a problem the way you guys did
(even though I didn't phrase the question properly, as I was confused) and come up with a solution right away.

Is it the thought process that I am lacking?
Simply put, how should I think like you guys?
Please suggest.


Zhris
Enthusiast

Aug 2, 2014, 12:58 PM

Post #11 of 23 (1846 views)
Re: [Tejas] How to create a table using hash or array of hashes

Hey,

I haven't been involved in this thread, but you provided exactly what was needed: input data AND desired output data. Quite often no explanation of how you get from input to output is necessary; it can be logically determined. It is then a matter of experience in using the right Perl functionality to process the input data in order to produce the desired output data.

I was once in your exact shoes and asked myself the same questions. It was by spending time following this forum that I gained the experience necessary to achieve most tasks. I like to look at a problem, try to solve it myself, then learn from the answers provided. Books can only teach you so much; forums give you exposure to real-world issues. They have no idea, but I'll be forever grateful to certain members of this forum; they have massively helped me progress in my hobby and career.

The best of luck,

Chris


Tejas
User

Aug 7, 2014, 1:56 AM

Post #12 of 23 (1656 views)
Re: [FishMonger] How to create a table using hash or array of hashes

Hi,
I actually have a file with 1 crore (10 million) lines,
and this code is eating up all the memory until the system hangs.
It also takes almost 99% of the server's CPU time.

I tried to sort the file first, but the sort itself takes a lot of time.

Can you suggest some other approach?


Thanks
Tejas


(This post was edited by Tejas on Aug 7, 2014, 2:22 AM)


FishMonger
Veteran / Moderator

Aug 7, 2014, 6:42 AM

Post #13 of 23 (1649 views)
Re: [Tejas] How to create a table using hash or array of hashes

I would need to test the script with a larger set of data to be sure, but I don't see anything in the code that would cause high CPU usage. If the data file is very large, then it could easily cause high memory usage.

What changes did you make to the code? Can you post the complete script you tested?

One way to reduce the memory and CPU footprint would be to read the data in blocks (i.e., set the input record separator to paragraph mode) and output each row of data as it's processed. Your sample data indicates that the blocks are already sorted by ID. If that's the case, then processing one block at a time would still produce the final output in that same sorted order.
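
A minimal sketch of the paragraph-mode idea, assuming the blocks really are separated by blank lines and that every line in a block shares one ID. The sub name summarize_block and the in-memory sample are illustrative; a real script would open the data file instead.

```perl
use strict;
use warnings;

# Hypothetical helper: turns one blank-line-separated block into one output row.
sub summarize_block {
    my ($block) = @_;
    my ($id, @entries);
    my ($pos, $neg) = (0, 0);
    for my $line (grep { /\S/ } split /\n/, $block) {
        my ($src, $entry, $amt) = (split /,/, $line)[0..2];
        $id = $src unless defined $id;    # assumes one ID per block
        push @entries, $entry;
        if ($amt > 0) { $pos += $amt } else { $neg += $amt }
    }
    return unless defined $id;            # nothing but whitespace in this block
    return sprintf "%s %s %.2f %.2f", $id, join('-', @entries), $pos, $neg;
}

my $sample = "1966651231,1,3.44,1\n1966651231,2,-2.38,1\n\n"
           . "1967454850,520,10.99,3\n1967454850,41,-2.99,3\n";
open my $in, '<', \$sample or die "cannot open in-memory sample: $!";

{
    local $/ = "";    # paragraph mode: each read returns one whole block
    while (my $block = <$in>) {
        my $row = summarize_block($block);
        print "$row\n" if defined $row;   # only one block is held at a time
    }
}
close $in;
```

Because each row is printed as soon as its block has been read, memory use depends on the size of the largest block rather than on the size of the file.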


Laurent_R
Veteran / Moderator

Aug 7, 2014, 10:01 AM

Post #14 of 23 (1618 views)
Re: [Tejas] How to create a table using hash or array of hashes

Reading by blocks and using a simple hash on each block, as suggested by FishMonger, is one possible way to go.

You could even do without a hash and read your file line by line. Each time you read a line, if it starts a new block of data, reset the positive and negative variables to the value from that line and reset the array to contain that line's other value. If not, add the amount to the positive or negative variable and push the other value onto the array. At each sequence break, print the result for the sequence just finished.


Tejas
User

Aug 7, 2014, 10:10 AM

Post #15 of 23 (1612 views)
Re: [Laurent_R] How to create a table using hash or array of hashes

Hello Laurent,

Sorry, but I am confused.

In fact, I didn't understand what reading data by blocks means here.

I also didn't understand these terms: new block, data reset, resetting the array, sequence break, etc.

Can you post a code snippet for these if possible?

I have around 1 crore (10 million) lines, and I didn't change the code at all.

Thanks
Tejas


Laurent_R
Veteran / Moderator

Aug 7, 2014, 10:14 AM

Post #16 of 23 (1611 views)
Re: [Tejas] How to create a table using hash or array of hashes

OK, I'll try to post something.


Laurent_R
Veteran / Moderator

Aug 7, 2014, 10:43 AM

Post #17 of 23 (1605 views)
Re: [Tejas] How to create a table using hash or array of hashes

OK, this is a very quick attempt, not much time now. It could probably be a bit cleaner, but it seems to work fine and you'll get the idea.


Code
use strict; 
use warnings;

my $previous_key = "";
my (@v_array, $pos, $neg);

while (my $line = <DATA>) {
    chomp $line;
    next if $line =~ /^\s*$/;    # skip blank lines, including completely empty ones
    my ($key, $v, $amt) = (split /,/, $line)[0..2];
    if ($key ne $previous_key) {
        # Sequence break: print the block just finished, then start a new one.
        print "$previous_key ", (join '-', @v_array), " $pos $neg \n" unless $previous_key eq "";
        @v_array = ();
        ($pos, $neg) = (0, 0);
        $previous_key = $key;
    }
    push @v_array, $v;
    if ($amt > 0) {
        $pos += $amt;
    } else {
        $neg += $amt;
    }
}
# Print the last block, which is not followed by a sequence break.
print "$previous_key ", (join '-', @v_array), " $pos $neg \n" unless $previous_key eq "";

__DATA__
1966651010,520,3.44,1
1966651010,14,1.03,1
1966651010,22,1.04,1
1966651010,11,-2.38,1
1966651010,10,-1.03,1

1966651231,1,3.44,1
1966651231,14,1.03,1
1966651231,11,1.04,1
1966651231,2,-2.38,1

1966652212,1,3.44,1
1966652212,14,1.03,1
1966652212,11,1.04,1
1966652212,2,-2.38,1

1967454850,520,10.99,3
1967454850,41,-2.99,3
1967454850,22,.94,3


This is the output:


Code
$ perl  tejas.pl 
1966651010 520-14-22-11-10 5.51 -3.41
1966651231 1-14-11-2 5.51 -2.38
1966652212 1-14-11-2 5.51 -2.38
1967454850 520-41-22 11.93 -2.99


I did not notice earlier that you also wanted an additional field, but I'll leave it to you to add that if you want to use this approach.

The point is that, doing that, you never have more than one line of input in memory (plus a bit for @v_array, which is emptied each time there is a sequence break, i.e. a new $key), so that it will work with very low memory usage, even with a gigantic input file.
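
For the additional field, here is a hedged sketch of one way the same line-by-line approach could be extended. It wraps the loop in an illustrative sub, summarize_sorted, that takes a filehandle and a callback (both names are assumptions, not part of the code above), and it assumes the fourth field is constant within a block, as it is in the sample data.

```perl
use strict;
use warnings;

# Illustrative helper: streams grouped lines from $fh and calls $emit->($row)
# once per key. Lines with the same key must be adjacent.
sub summarize_sorted {
    my ($fh, $emit) = @_;
    my ($previous_key, $org, @entries);
    my ($pos, $neg) = (0, 0);

    # Emits the row for the block collected so far, if there is one.
    my $flush = sub {
        return unless defined $previous_key;
        $emit->(sprintf "%s %s %.2f %.2f %s",
            $previous_key, join('-', @entries), $pos, $neg, $org);
    };

    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;
        my ($key, $entry, $amt, $this_org) = split /,/, $line;
        if (!defined $previous_key or $key ne $previous_key) {
            $flush->();                   # sequence break: finish the old block
            @entries = ();
            ($pos, $neg) = (0, 0);
            ($previous_key, $org) = ($key, $this_org);
        }
        push @entries, $entry;
        if ($amt > 0) { $pos += $amt } else { $neg += $amt }
    }
    $flush->();                           # last block
}

my $sample = "1966651231,1,3.44,1\n1966651231,2,-2.38,1\n\n1967454850,520,10.99,3\n";
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
summarize_sorted($fh, sub { print "$_[0]\n" });
close $fh;
```

Passing the output step in as a callback is only a convenience for trying the sub on small samples; printing directly inside the loop works just as well.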


(This post was edited by Laurent_R on Aug 7, 2014, 10:49 AM)


Laurent_R
Veteran / Moderator

Aug 7, 2014, 10:55 AM

Post #18 of 23 (1598 views)
Re: [Tejas] How to create a table using hash or array of hashes

What I call a block is a series of lines:

Code
1966651231,1,3.44,1  
1966651231,14,1.03,1
1966651231,11,1.04,1
1966651231,2,-2.38,1

in which the key (first field) is the same.

Notice that in my solution above, I don't even use a hash (although it could also be done), but just an array and two scalar variables.

As you can see, the code I posted works and uses almost no memory. Try it out!

EDIT Aug. 7, 06:56 p.m. - It seems that you removed the post to which this post was answering. Why?


(This post was edited by Laurent_R on Aug 7, 2014, 1:01 PM)


Tejas
User

Aug 7, 2014, 10:58 AM

Post #19 of 23 (1595 views)
Re: [Laurent_R] How to create a table using hash or array of hashes


Code
if ($key ne $previous_key) {
    print "$previous_key ", (join '-', @v_array), " $pos $neg \n" unless $previous_key eq "";   # prints the values accumulated for the previous key
    @v_array = ();
    ($pos, $neg) = (0, 0);
    $previous_key = $key;
}

Whenever we have a new key, we are resetting the array, pos and neg.

Otherwise, if the key is the same, we are pushing values onto the array and adding up the positives and negatives.


Code
push @v_array, $v;
if ($amt > 0) {
    $pos += $amt;
} else {
    $neg += $amt;
}


This part of the code, after the while loop, just prints the values from the last block in the file, doesn't it?

Code
print "$previous_key ", (join '-', @v_array), " $pos $neg \n" unless $previous_key eq "";

Can you please tell me if I have understood it correctly?


Tejas
User

Aug 7, 2014, 11:00 AM

Post #20 of 23 (1593 views)
Re: [Laurent_R] How to create a table using hash or array of hashes

Yeah, Laurent.

Well, when I posted that, I hadn't seen your code, as you hadn't posted it yet.
After that I realised that you had already posted the code.

Actually, that post was a continuation of my earlier question, not a reply to the code you posted.

And will this code work fine for unsorted data?
I think I would have to maintain a hash holding the array and the amounts, so that they stay in sync if a key is read again later.
If the data is unsorted, that will eat CPU time...


(This post was edited by Tejas on Aug 7, 2014, 11:19 AM)


Laurent_R
Veteran / Moderator

Aug 7, 2014, 11:49 AM

Post #21 of 23 (1576 views)
Re: [Tejas] How to create a table using hash or array of hashes


In Reply To

Code
if ($key ne $previous_key) {
    print "$previous_key ", (join '-', @v_array), " $pos $neg \n" unless $previous_key eq "";   # prints the values accumulated for the previous key
    @v_array = ();
    ($pos, $neg) = (0, 0);
    $previous_key = $key;
}

Whenever we have a new key, we are resetting the array, pos and neg.

Otherwise, if the key is the same, we are pushing values onto the array and adding up the positives and negatives.


Code
push @v_array, $v;
if ($amt > 0) {
    $pos += $amt;
} else {
    $neg += $amt;
}


This part of the code, after the while loop, just prints the values from the last block in the file, doesn't it?

Code
print "$previous_key ", (join '-', @v_array), " $pos $neg \n" unless $previous_key eq "";

Can you please tell me if I have understood it correctly?


Yes, you've understood the idea.
Quite simple, isn't it?


Tejas
User

Aug 7, 2014, 11:55 AM

Post #22 of 23 (1573 views)
Re: [Laurent_R] How to create a table using hash or array of hashes

Thanks, Laurent.

My question is whether this will work for unsorted data.
More importantly, my keys are alphanumeric; the data set I pasted as input is just a part of it.

Keys might be numbers or alphanumeric strings,
and the data is unsorted too.

Some of my keys are below :
172233111
XYZ12514251
XIHSBCDHSB1
145526526
776653209998
assn:ueiye:6553:settle


Laurent_R
Veteran / Moderator

Aug 7, 2014, 12:06 PM

Post #23 of 23 (1571 views)
Re: [Tejas] How to create a table using hash or array of hashes


In Reply To
Well, when I posted that, I hadn't seen your code, as you hadn't posted it yet.
After that I realised that you had already posted the code.

Actually, that post was a continuation of my earlier question, not a reply to the code you posted.



No problem, but it is probably better to add an edit comment the way I did, rather than remove things altogether.



In Reply To
And will this code work fine for unsorted data?


No. This is a very important point: this code only works for sorted data and will completely fall apart on unsorted data. To be more precise, what really matters is that the lines with the same key are together; whether the blocks of such lines are themselves in sorted order is irrelevant.

But if your input files are really huge and would exhaust memory if you tried to load them into a hash, then the best solution is very often to start by sorting the lines (using, for example, the Unix sort utility) and then to process them.

I do that regularly at my job with files larger than about 5 to 10 GB, because they simply will not fit in a hash; I would run out of memory. When you are processing such a huge file, it is not so crazy to spend 15 minutes sorting it before the actual processing, even if the subsequent Perl processing takes only 10 minutes. Overall, 25 minutes for processing a 10 GB file is not such bad performance.
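
One way to wire the two steps together, sketched under assumptions: the file is comma-separated with the key in the first field, a Unix sort that accepts -t and -k is on the PATH, and the input file name is passed on the command line. The sub name open_sorted is illustrative.

```perl
use strict;
use warnings;

# Illustrative helper: returns a filehandle that yields the lines of $file
# grouped by the first comma-separated field, using the external sort utility.
sub open_sorted {
    my ($file) = @_;
    # Byte-wise comparison, so that only identical keys compare as equal.
    local $ENV{LC_ALL} = 'C';
    # List-form pipe open avoids the shell, so odd characters in $file are safe.
    open my $fh, '-|', 'sort', '-t', ',', '-k', '1,1', $file
        or die "cannot run sort on $file: $!";
    return $fh;
}

my $file = shift @ARGV;
if (defined $file) {
    my $in = open_sorted($file);
    my $previous_key = '';
    while (my $line = <$in>) {
        next if $line =~ /^\s*$/;         # blank lines sort to the top
        my ($key) = split /,/, $line;
        # A real script would accumulate and print the per-key totals here,
        # in the same way as the line-by-line code earlier in the thread.
        print "new block: $key\n" if $key ne $previous_key;
        $previous_key = $key;
    }
    close $in or warn "sort exited with status $?";
}
```

Since the keys are compared as plain strings, numeric and alphanumeric keys are handled alike; all that the later processing needs is for equal keys to end up on adjacent lines.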


(This post was edited by Laurent_R on Aug 7, 2014, 12:07 PM)

 
 

