CGI/Perl Guide | Learning Center | Forums | Advertise | Login
Site Search: in

  Main Index MAIN
INDEX
Search Posts SEARCH
POSTS
Who's Online WHO'S
ONLINE
Log in LOG
IN

Home: Perl Programming Help: Beginner:
Fast matching in 2 arrays

 



alanj
New User

Feb 6, 2013, 4:25 PM

Post #1 of 6 (825 views)
Fast matching in 2 arrays Can't Post

Hi,
Can anyone recommend a faster way to find matches of values held in 2 arrays? I have something that works but it's really slow.

I have 2 2D arrays. Array1[X][Y] and Array2[X][Y].
For each value in Array1[X][0] ( ie the first column ) I want to see if there is a matching value in Array2[X][0]. If there is then I want to populate a 3rd array.
The length of the arrays can vary but both are in the order of 100k rows.

Here's my code that works, slowly.

Code
for $i ( 0 .. $#Array1) { 
print "Checking row $i \r"; #only here to prove something is happening
for $j ( 0 .. $#Array2) {
if ( $Array1[$i][0] eq $Array2[$i][0] )
{
# Found a match so populate new array.
$Array3[$i][0] = $Array1[$i][0];
$Array3[$i][1] = $Array1[$i][1] . ",". $Array2[$i][1] . "," . $j;
last;
}
} #next j
} # next i


Thanks
Alan


BillKSmith
Veteran

Feb 6, 2013, 8:44 PM

Post #2 of 6 (819 views)
Re: [alanj] Fast matching in 2 arrays [In reply to] Can't Post

You do not need the for $j loop at all. If there is a match, you create an Array3 entry with $j set to zero and exit the $j loop. If there is not a match, you repeat the test about 100k times , always getting the same result.
Good Luck,
Bill


alanj
New User

Feb 7, 2013, 1:43 AM

Post #3 of 6 (818 views)
Re: [BillKSmith] Fast matching in 2 arrays [In reply to] Can't Post

OOops, As I'm sure you spotted but were too polite to say ... I made a couple of typos in copying the code. The inner loop should of course be using the $j in Array2.

Corrected here.


Code
for $i ( 0 .. $#Array1) {  
print "Checking row $i \r"; #only here to prove something is happening
for $j ( 0 .. $#Array2) {
if ( $Array1[$i][0] eq $Array2[$j][0] )
{
# Found a match so populate new array.
$Array3[$i][0] = $Array1[$i][0];
$Array3[$i][1] = $Array1[$i][1] . ",". $Array2[$j][1] . "," . $j;
last;
}
} #next j
} # next i



BillKSmith
Veteran

Feb 7, 2013, 7:32 AM

Post #4 of 6 (812 views)
Re: [alanj] Fast matching in 2 arrays [In reply to] Can't Post

Your posted code was consistent with the X's in your text. I assumed that it was pasted into your post correctly.

Whenever you want to compare arrays think of a hash. The following code requires memory for a large hash, but should be much faster because it eliminates the inner loop.


Code
use strict; 
use warnings;
use Data::Dumper qw(Dumper);
my @Array1 = (
[1,'a'],
[2,'b'],
[3,'c'],
);
my @Array2 = (
[7,'x'],
[2,'y'],
[9,'z'],
);
my %hash2;
for my $j ( 0 .. $#Array2 ) {
$hash2{ $Array2[$j][0] } = $j;
}

my @Array3;
for my $i ( 0 .. $#Array1 ) {
next if !exists $hash2{ $Array1[$i][0] };
# Found a match so populate new array.
my $j = $hash2{ $Array1[$i][0] };
$Array3[$i][0] = $Array1[$i][0];
$Array3[$i][1] = $Array1[$i][1] . "," . $Array2[$j][1] . "," . $j;
}
print Dumper( \@Array3 );


OutPut

Code
$VAR1 = [ 
undef,
[
2,
'b,y,1'
]
];

Good Luck,
Bill


alanj
New User

Feb 8, 2013, 12:04 PM

Post #5 of 6 (802 views)
Re: [BillKSmith] Fast matching in 2 arrays [In reply to] Can't Post

Wow thank you so much. That has dropped my matching time from 31 minutes to 26 seconds. I'll need to read up more on hashes to understand why there is such a difference, as I always thought of a hash as just the keyvalue, basically. So much to learn.

Thanks again.


Laurent_R
Veteran / Moderator

Feb 9, 2013, 10:00 AM

Post #6 of 6 (795 views)
Re: [alanj] Fast matching in 2 arrays [In reply to] Can't Post

You could thing of a hash as a special type of array where the indexes are words (strings of characters) rather than numbers. Technically, it is something somewhat more complicated thanthat, but from a user standpoint that's basically what it boils down to.

In this case the fact that you can access directly to the value using the key means that you no longer need the two nested loops that you used, but just one loop. Assuming you have two arrays with 1,000 entries each, the nested loops approach leads to up to 1,000,000 queries (for each elements in array A, look all elements of array B to find a match), wheras the hash approach leads basically to 2,000 queries. This is bound to be much faster. And the more data you have, the larger the advantage of the hash approach will be.

 
 


Search for (options) Powered by Gossamer Forum v.1.2.0

Web Applications & Managed Hosting Powered by Gossamer Threads
Visit our Mailing List Archives