CGI/Perl Guide Forums: Perl Programming Help: Beginner
Replacement for a slow array loop...

 



ax390
Novice

Nov 9, 2018, 7:10 AM

Post #1 of 12
Replacement for a slow array loop...

I am new to Perl, and I am using the code below to get the values of some tags stored in a string.


Code
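use strict;
use warnings;

my $data = "AXD:35,DN:98,JKHH:38,MJSH:100";
my %code;

# Split the string into TAG:VALUE pairs, then split each pair
my @data = split(/,/, $data);

foreach my $line (@data) {
    my ($tag, $value) = split(/:/, $line);
    $code{$tag} = $value;
}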


The problem is that this routine runs inside a loop over another array (hundreds of times), and the $data string contains thousands of tags. The result is a major slowdown in execution time for the entire script. I also tried using a hash instead of an array, but there's no major improvement.

Is there a faster way to get this data stored in the '$code{$tag}' variable? Possibly without using a loop at all?

Thank you,

Alex


FishMonger
Veteran / Moderator

Nov 9, 2018, 8:14 AM

Post #2 of 12
Re: [ax390] Replacement for a slow array loop...


Quote
Is there a faster way to get this data stored in the '$code{$tag}' variable? Possibly without using a loop at all?

Parsing a string like yours into a hash requires using some type of a loop.

What is this other loop you're referring to and how are the 2 loops connected?

You need to post a short but complete test script that demonstrates the problem. Also provide a realistic sample (between 10 and 20 lines) of the data being processed.

Based on the limited info given so far, I'd say the problem is the use of the nested loops. Sometimes nested loops can't be avoided, but if they aren't handled correctly they can and will slow down the processing.


(This post was edited by FishMonger on Nov 9, 2018, 8:15 AM)


Zhris
Enthusiast

Nov 9, 2018, 11:34 AM

Post #3 of 12
Re: [ax390] Replacement for a slow array loop...

FishMonger has covered your issue in detail, and further clarification is required. I'm just providing an alternative to your code above: you can assign a list directly to a hash, and each consecutive pair becomes a key and its value.


Code
my %code = split /[:,]/, 'AXD:35,DN:98,JKHH:38,MJSH:100';


Chris
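To check that the one-line split builds the same hash as the original loop, a small comparison script can help. This is a sketch assuming tags never contain ':' or ',' and every pair is well formed (an odd-length list triggers an "Odd number of elements in hash assignment" warning). Note that split still loops internally; it just does so inside perl's C code instead of a Perl-level loop, which is usually, but not always measurably, faster.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = 'AXD:35,DN:98,JKHH:38,MJSH:100';

# Original approach: split into TAG:VALUE pairs, then split each pair
my %code_loop;
for my $pair (split /,/, $data) {
    my ($tag, $value) = split /:/, $pair;
    $code_loop{$tag} = $value;
}

# One split on either delimiter; the flat list (tag, value, tag, value, ...)
# is assigned straight into the hash
my %code_split = split /[:,]/, $data;

# Print in sorted order so the output is stable
for my $tag (sort keys %code_split) {
    print "$tag => $code_split{$tag}\n";
}
```

Both hashes end up with the same four keys: AXD, DN, JKHH and MJSH.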


ax390
Novice

Nov 10, 2018, 12:37 AM

Post #4 of 12
Re: [ax390] Replacement for a slow array loop...

FishMonger, Chris - thank you for your replies!

I will post the code I was referring to here. I have simplified it as much as I could while keeping all of its functionality. This is a small subroutine in a much larger script that I have been working on for some time. It works fine, except that it's slow: it takes a few seconds to fully execute.

The tags I was talking about are stock market symbols. Each symbol carries a certain value, from 0 to 100.

What I need is to filter all the symbols that have a value of 70+, and then...

1) save those symbols in an array (@result);

2) count the number of occurrences of each symbol across all processed dates in the data set ($occurences{$symbol});

3) save the value associated with each symbol, per each date ($value{$symbol}{$date}).

This data is all I require from this particular subroutine.

Indeed, the problem is the nested loops, but this is the only solution I could come up with. I am sure you will be able to spot my beginner level after looking over my code. ;-)

The code below replicates the exact functionality I need, it's fully working, and I will let it do the rest of the talking...

Alex


Code
#!c:/perl/bin/perl
use strict;
use warnings;

my $start = 2;
my $end = 8;

### IMPORTANT NOTE:
### the @data array carries data for about 5 years, each line grouping about 7000 symbols;

my @data = ("11/08/2018|TNDM:49.9,PACB:19.8,PTI:39.8,NIHD:1.7,AAPL:3.1,GHDX:49.7",
"11/07/2018|PACB:39.8,AAPL:68.1,CDNA:39.6,ENDP:49.6,LFVN:9.5,ECYT:59.5",
"11/06/2018|TCMD:75.0,OMCL:55.0,AAPL:89.6,MNRO:15.0,BSTC:34.9,DHT:14.9",
"11/03/2018|AWI:50.6,AGX:80.6,AAPL:38.4,AKCA:8.5,BHBK:80.5,GTLS:8.4",
"11/02/2018|TNDM:19.9,VKTX:89.8,PTI:9.8,ENDP:49.7,CDNA:9.7,AAPL:18.2",
"11/01/2018|TNDM:33.9,VCYT:39.8,LFVN:1.6,CDNA:39.6,REGI:19.5,AAPL:48.3",
"10/31/2018|QUAD:28.3,CRAI:88.2,NPO:98.2,TILE:55.1,AAPL:98.9,MCRB:78.1",
"10/30/2018|GGB:32.7,APO:42.6,BG:62.4,ITOCY:12.3,Y:92.2,BAP:82.0,AAPL:88.6",
"10/27/2018|NBGIF:3.6,TTM:8.4,OZK:20.3,AAPL:19.1,LFUGY:50.1,VIPS:90.0,BAP:82.0",
"10/26/2018|XRX:66.1,VMW:15.9,GOOGL:95.8,GOOG:85.6,MDLZ:55.5,ED:65.3,AAPL:3.1");

my (@result, %already_pushed, %occurences, %value);

### FIRST LOOP: one line (one date) at a time
for (my $n = $start; $n <= $end; $n++) {
    # $data[$n] (a single element), not the slice @data[$n]
    my ($date, $data) = split(/\|/, $data[$n]);
    my @symbols = split(/,/, $data);

    ### SECOND LOOP: one SYMBOL:VALUE pair at a time
    foreach my $line (@symbols) {
        my ($symbol, $value) = split(/:/, $line);

        if ($value >= 70) {
            # remember each qualifying symbol only once
            if (!$already_pushed{$symbol}) {
                push(@result, $symbol);
                $already_pushed{$symbol} = "yes";
            }
            $occurences{$symbol}++;
            $value{$symbol}{$date} = $value;
        }
    }
}

print "Value >= 70: @result\n\n";

print "11/06/2018, Apple Inc. (AAPL): $occurences{'AAPL'}, $value{'AAPL'}{'11/06/2018'}\n";

sleep(10);



FishMonger
Veteran / Moderator

Nov 10, 2018, 12:40 PM

Post #5 of 12
Re: [ax390] Replacement for a slow array loop...

Why are you skipping the first 2 and the last element of that @data array?


(This post was edited by FishMonger on Nov 10, 2018, 12:42 PM)


Chris Charley
User

Nov 10, 2018, 12:57 PM

Post #6 of 12
Re: [ax390] Replacement for a slow array loop...

A few questions.

1. Can the same symbol occur more than once for a particular date? And do you want only the first one seen on that date with a value >= 70?

2. Or, do you only want one occurrence of the symbol across all the dates with a value >= 70?

3. Should the @result array contain a symbol one time for all the dates?


BillKSmith
Veteran

Nov 10, 2018, 1:45 PM

Post #7 of 12
Re: [ax390] Replacement for a slow array loop...

When given a problem in execution speed, our first reaction is usually to fix everything that can be fixed easily and hope for the best. That approach seldom works. A better approach is to use a profiling tool to identify the bottleneck and fix just that. When I profiled your code, I found that there is no obvious "bottleneck". No small change is likely to make much difference. I suspect that if you profile your original code, you will find that the bottleneck that needs fixing is not in the code you posted. Assuming that I am wrong, a major redesign of this section might do the trick. (The questions that Chris and FishMonger ask suggest that this is what they are considering.) After you have new code, you can measure the improvement with a benchmark tool.
Good Luck,
Bill
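A sketch of the "measure it" step using the core Benchmark module's cmpthese, comparing the nested split from the first post against a single split into a hash. The 7000-symbol test line and the symbol names S1..S7000 are made up for illustration; real numbers will depend on your data and perl build. For finding where the time goes in the full script, the CPAN profiler Devel::NYTProf can be run as `perl -d:NYTProf script.pl` followed by `nytprofhtml` to get an HTML report.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test line: one date plus 7000 made-up symbols S1..S7000
my $line = '11/08/2018|' . join ',', map { "S$_:" . ($_ % 101) } 1 .. 7000;

# Variant 1: split off the date, split into pairs, split each pair
sub nested_split {
    my ($date, $data) = split /\|/, $line;
    my %h;
    for my $pair (split /,/, $data) {
        my ($symbol, $value) = split /:/, $pair;
        $h{$symbol} = $value;
    }
    return \%h;
}

# Variant 2: one split on all three delimiters, straight into a hash
sub single_split {
    my ($date, %h) = split /[,|:]/, $line;
    return \%h;
}

# Make sure both variants agree before timing them
my ($n, $s) = (nested_split(), single_split());
die "variants disagree\n"
    unless scalar(keys %$n) == scalar(keys %$s)
        && !grep { $n->{$_} ne $s->{$_} } keys %$n;

# Run each variant for at least 1 CPU second and print a comparison table
cmpthese(-1, {
    nested_split => \&nested_split,
    single_split => \&single_split,
});
```

The table printed by cmpthese shows runs per second and the relative difference between the two variants.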


ax390
Novice

Nov 10, 2018, 2:22 PM

Post #8 of 12
Re: [Chris Charley] Replacement for a slow array loop...

I will answer FishMonger's and Chris' questions below.


Quote
Why are you skipping the first 2 and the last element of that @data array?


There are $start/$end variables because at times I may need to search only parts of the database, instead of all of it. For example, I will need to be able to run a search through last year's data, or for smaller/bigger timeframes.
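If the goal is "last year's data" rather than a range of array positions, one option is to select lines by their leading date instead of by index. This is a sketch assuming dates are always zero-padded MM/DD/YYYY, as in the sample; date_key is a hypothetical helper that reorders a date into YYYYMMDD so plain string comparison sorts chronologically.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: MM/DD/YYYY -> YYYYMMDD, so dates compare as strings
sub date_key {
    my ($mm, $dd, $yyyy) = split m{/}, shift;
    return "$yyyy$mm$dd";
}

# Shortened lines from the sample data in this thread
my @data = (
    "11/08/2018|TNDM:49.9,PACB:19.8",
    "11/06/2018|TCMD:75.0,AAPL:89.6",
    "10/31/2018|NPO:98.2,AAPL:98.9",
    "10/26/2018|GOOGL:95.8,AAPL:3.1",
);

# Assumed search window, inclusive at both ends
my ($from, $to) = (date_key('10/30/2018'), date_key('11/07/2018'));

# Keep only the lines whose leading date falls inside the window
my @selected = grep {
    my $key = date_key((split /\|/, $_, 2)[0]);
    $key ge $from && $key le $to;
} @data;

print "$_\n" for @selected;
```

Here only the 11/06/2018 and 10/31/2018 lines are kept. The selected lines can then go through the same parsing loop in place of @data[$start .. $end].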


Quote
Can the same symbol occur more than once for a particular date?


No, it cannot occur more than once, as each symbol comes with only one value (for a particular date).


Quote
And do you want only the first one seen on that date with a value >= 70? Or, do you only want one occurrence of the symbol across all the dates with a value >= 70?


It is as simple as this: if there is one occurrence found (for ANY date in the database), then the symbol is pushed to the @result array, just one time. That's enough for me to know that the symbol hit the 70 point mark sometime in that period (the exact date is not important at all).

Then, later in the script, another subroutine will process all the symbols found in the @result array, together with their associated $occurences{$symbol} variable, and will rank them on a single scale based on the number of times they hit that 70 threshold. The more hits, the higher they will be positioned on the scale. To me that is a signal those stocks outperformed all the others (in terms of price gains) over the period of time I queried in the database. That is precisely what I am looking for at this stage.
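The ranking step described above can be sketched as a single sort on the occurrence counts. This is only an illustration: the counts below are taken from the sample data in this thread (dates 10/27/2018 to 11/06/2018), and the alphabetical tie-break is an assumption added so that symbols with equal counts come out in the same order on every run (Perl's hash order is not stable between runs).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A few counts from the sample data (same spelling as the original %occurences)
my %occurences = (AAPL => 3, BAP => 2, VIPS => 1, NPO => 1, AGX => 1);
my @result = keys %occurences;

# Most hits first; ties broken alphabetically so the order is repeatable
my @ranked = sort {
    $occurences{$b} <=> $occurences{$a} or $a cmp $b
} @result;

printf "%-5s %d\n", $_, $occurences{$_} for @ranked;
```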


Quote
Should the @result array contain a symbol one time for all the dates?


Yes, just one time is enough to put it on my radar. It is the $occurences{$symbol} variable that will provide more in-depth information on each stock, as I said above.

Alex


Chris Charley
User

Nov 10, 2018, 3:08 PM

Post #9 of 12
Re: [ax390] Replacement for a slow array loop...

I think this is what you want.

Code
#!/usr/bin/perl 
use strict;
use warnings;

my $start = 2;
my $end = 8;

my @data = ("11/08/2018|TNDM:49.9,PACB:19.8,PTI:39.8,NIHD:1.7,AAPL:3.1,GHDX:49.7",
"11/07/2018|PACB:39.8,AAPL:68.1,CDNA:39.6,ENDP:49.6,LFVN:9.5,ECYT:59.5",
"11/06/2018|TCMD:75.0,OMCL:55.0,AAPL:89.6,MNRO:15.0,BSTC:34.9,DHT:14.9",
"11/03/2018|AWI:50.6,AGX:80.6,AAPL:38.4,AKCA:8.5,BHBK:80.5,GTLS:8.4",
"11/02/2018|TNDM:19.9,VKTX:89.8,PTI:9.8,ENDP:49.7,CDNA:9.7,AAPL:18.2",
"11/01/2018|TNDM:33.9,VCYT:39.8,LFVN:1.6,CDNA:39.6,REGI:19.5,AAPL:48.3",
"10/31/2018|QUAD:28.3,CRAI:88.2,NPO:98.2,TILE:55.1,AAPL:98.9,MCRB:78.1",
"10/30/2018|GGB:32.7,APO:42.6,BG:62.4,ITOCY:12.3,Y:92.2,BAP:82.0,AAPL:88.6",
"10/27/2018|NBGIF:3.6,TTM:8.4,OZK:20.3,AAPL:19.1,LFUGY:50.1,VIPS:90.0,BAP:82.0",
"10/26/2018|XRX:66.1,VMW:15.9,GOOGL:95.8,GOOG:85.6,MDLZ:55.5,ED:65.3,AAPL:3.1");


my %seen;
my @result;
my %values;

for (@data[$start .. $end]) {
my ($date, %symbols) = split /[,|:]/;

# %seen will hold the count of occurrences over 70 for each stock
# %seen will only let a stock into @temp one time
# @temp will hold no duplicates
my @temp = grep {$symbols{$_} >= 70 and not $seen{$_}++} keys %symbols;

# @result will not have any repeating stocks
# @temp gets stocks for this date that are over 70 and are unique
push @result, @temp;

for my $symbol (@temp) {
$values{$symbol}{$date} = $symbols{$symbol};
}
}

for my $stock_over_70 (sort{$seen{$b} <=> $seen{$a}} keys %seen) {
print "$stock_over_70 occurred $seen{$stock_over_70} times\n";
}


Output for this sample was:

Code
AAPL occurred 3 times
BAP occurred 2 times
VIPS occurred 1 times
NPO occurred 1 times
AGX occurred 1 times
TCMD occurred 1 times
MCRB occurred 1 times
Y occurred 1 times
VKTX occurred 1 times
BHBK occurred 1 times
CRAI occurred 1 times



ax390
Novice

Nov 11, 2018, 12:02 AM

Post #10 of 12
Re: [Chris Charley] Replacement for a slow array loop...

Chris, thank you for taking the time to help me out!

Your script runs in 3 seconds, which is substantially faster than the 8 seconds my version took.

However, the $values{$symbol}{$date} variable was supposed to be available for all the symbols and all the dates in the database. I was going to use it whenever I needed the value of a certain stock on a precise date. The good news is that this is not that important at this stage, since it is not needed for the actual ranking process. I am afraid that pushing all the data to, say, @temp2 and then looping through that array (about 7000 times) to fill $values{$symbol}{$date} would take us right back to where we started (8 seconds execution time). If you have a handy fix for this issue, one that does not add to the execution time, please let me know. If not, I will skip it, as I said.

Lastly, I ran a benchmark on your code, and most of the 3s it needs to run is taken by the split function:


Code
my ($date, %symbols) = split /[,|:]/;


Might there be a faster replacement for this line? Could you at least suggest something I should try? I would have no problem changing the whole database format if that would make importing the data faster. Getting the whole script to execute in under a second would be really awesome! If not, that is all right too; I thought it would not hurt to ask. I have already searched Google for an answer, but without any success.
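One idea worth benchmarking (not a guaranteed speedup) is to let a regular expression do the filtering, so that only the pairs with a value of 70 or more are ever copied into Perl variables, instead of building a hash of all ~7000 symbols per line. This sketch assumes values are between 0 and 100, written without leading zeros and with an optional decimal part, as in the sample. The trade-off is that it only yields the 70+ values, so it cannot also fill $values{$symbol}{$date} for every symbol.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "10/31/2018|QUAD:28.3,CRAI:88.2,NPO:98.2,TILE:55.1,AAPL:98.9,MCRB:78.1";

# Peel off the date once; $rest holds only SYMBOL:VALUE pairs
my ($date, $rest) = split /\|/, $line, 2;

# Capture SYMBOL and VALUE only when VALUE is 70-99(.x) or 100(.x);
# the lookahead makes sure the whole value was matched.
# In list context, //g returns (symbol, value, symbol, value, ...)
my %over_70 = $rest =~ /([^,:]+):((?:[7-9]\d|100)(?:\.\d+)?)(?=,|$)/g;

print "$date: $_ $over_70{$_}\n" for sort keys %over_70;
```

For this line, only CRAI, NPO, AAPL and MCRB are captured. Whether this beats split on the real 7000-symbol lines is something only a benchmark on the real data can tell.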

Thank you so much once again! Your code will allow me to spend more time outside, rather than in front of the computer screen. ;-)

Alex


Chris Charley
User

Nov 11, 2018, 9:22 AM

Post #11 of 12
Re: [ax390] Replacement for a slow array loop...

To get all the symbols into the %values hash, just change:

for my $symbol (@temp)

to

for my $symbol (keys %symbols)

I don't think a faster method for my ($date, %symbols) = split /[,|:]/; exists. This is a basic operation.
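If the only goal is to look up any symbol's value on any date later, another option that avoids the extra per-symbol loop entirely is to keep a reference to each parsed %symbols hash, keyed by date. This works because %symbols is a fresh lexical hash on every pass, so each stored reference points at a different hash. The assumptions: lookups become $by_date{$date}{$symbol} (date first) instead of $values{$symbol}{$date}, and every symbol of every processed line stays in memory, which for about 5 years of 7000-symbol lines may be significant. The data below is a shortened sample.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $start = 0;
my $end   = 1;
my @data = (
    "11/06/2018|TCMD:75.0,OMCL:55.0,AAPL:89.6",
    "10/31/2018|CRAI:88.2,NPO:98.2,AAPL:98.9,TILE:55.1",
);

my (%seen, @result, %by_date);

for (@data[$start .. $end]) {
    # %symbols is a new hash on every iteration
    my ($date, %symbols) = split /[,|:]/;

    # Count 70+ hits and collect each qualifying symbol only once
    push @result, grep { $symbols{$_} >= 70 and not $seen{$_}++ } keys %symbols;

    # Keep the whole parsed line: no second loop over the symbols
    $by_date{$date} = \%symbols;
}

# Lookup order is date first, then symbol
print "AAPL on 10/31/2018: $by_date{'10/31/2018'}{AAPL}\n";
print "OMCL on 11/06/2018: $by_date{'11/06/2018'}{OMCL}\n";
```

OMCL never reaches 70, yet its value is still available, which is what the %values change above achieves with an extra loop.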


ax390
Novice

Nov 11, 2018, 10:39 AM

Post #12 of 12
Re: [Chris Charley] Replacement for a slow array loop...

Awesome, thanks!

Alex

 
 

