bulrush (User)
Aug 30, 2016, 3:49 AM
Views: 11040
How to speed up search on array?

I have a program that has to search a huge array of 3,000,000 items and return every matching item in @prkeys. Each time the program runs, this search routine is called 3000+ times, and the whole program can take 90+ minutes. Here's the code that is slowing me down.

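Code
# Regex will be: ^MODEL.+OLDPRICE$
# \Q...\E escapes regex metacharacters, e.g. the "." in a price like 19.99.
my $k = qr/^\Q$basemodel\E.+\Q$oldprice\E$/;
# Find all models that start with the model number and end with the price.
my @k = grep { /$k/ } @prkeys;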

The intention of this code is a fuzzy match: find strings that begin with a model number and end with a price. This is my last resort for finding a match on a model, and it must be a fuzzy match of some type. I could split the search into two parts, but it seems both parts would still be regexes, and splitting it that way would slow me down even more.
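Splitting the search into two parts does not have to mean two regexes. A minimal sketch, assuming the goal is exactly "starts with the model, ends with the price, with at least one character in between" (what `.+` requires) and that the keys have been chomped: both halves become plain `substr` comparisons, which also avoids having to escape metacharacters like the `.` in a price. `fuzzy_matches` and the sample keys are hypothetical, and this still visits every key, so whether it beats the regex on the real data is something to measure rather than assume.

```perl
use strict;
use warnings;

# Hypothetical helper: return every key that starts with $model, ends
# with $price, and has at least one character in between. For chomped
# keys this is the same condition as /^\Q$model\E.+\Q$price\E$/, but it
# uses plain string comparisons, so "." in a price is taken literally.
sub fuzzy_matches {
    my ($model, $price, $keys) = @_;
    my $mlen = length $model;
    my $plen = length $price;
    my $min  = $mlen + 1 + $plen;    # ".+" needs at least one character
    return grep {
        length($_) >= $min
            && substr($_, 0, $mlen) eq $model
            && substr($_, length($_) - $plen) eq $price
    } @$keys;
}

my @prkeys = qw(ABC123-RED-19.99 ABC123-19x99 XYZ9-19.99 ABC12319.99);
my @k = fuzzy_matches('ABC123', '19.99', \@prkeys);
print "$_\n" for @k;    # prints ABC123-RED-19.99
```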

  1. @prkeys is an array that contains 3,000,000+ items.
  2. @prkeys is all in memory.
  3. Each string in @prkeys is 5-30 characters long.
  4. I must return every item that matches in @prkeys.
  5. Each time I search for $k, $k starts with a model number and ends with a price, so the regex looks like /^MODEL.+PRICE/.
  6. Because of the bad data the customer gives us, I do have to use this method of searching all 3,000,000 strings.
  7. This OS runs in a virtual machine, and there are other VMs on the same physical server. I suspect those other VMs are also slowing me down. I cannot move the VM to another physical server, so I must address speed in the code itself.
  8. I have a test program to measure read speed, but I have no other ideas on how to speed this up. Speed is normally not an issue for me.

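Because every pattern is anchored at `^` with a literal model number, all candidates for one model sit next to each other once @prkeys is sorted. A minimal sketch, assuming the keys can be sorted once up front and reused for all 3000+ searches: a binary search finds the first key that is string-greater-or-equal to the model, and only the run of keys sharing that prefix is tested against the price. Per search that is at most about 22 comparisons (log2 of 3,000,000 is about 21.5) plus the matching run, instead of 3,000,000 regex tests. `first_ge`, `find_matches` and the sample keys are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: index of the first element of the sorted array
# @$sorted that is string-greater-or-equal to $target (binary search).
sub first_ge {
    my ($sorted, $target) = @_;
    my ($lo, $hi) = (0, scalar @$sorted);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted->[$mid] lt $target) { $lo = $mid + 1 }
        else                            { $hi = $mid }
    }
    return $lo;
}

# Hypothetical helper: same result as
# grep { /^\Q$model\E.+\Q$price\E$/ } @$sorted,
# but only the contiguous run of keys starting with $model is examined.
sub find_matches {
    my ($sorted, $model, $price) = @_;
    my $re = qr/^\Q$model\E.+\Q$price\E$/;
    my @hits;
    for (my $i = first_ge($sorted, $model); $i < @$sorted; $i++) {
        my $key = $sorted->[$i];
        last if substr($key, 0, length $model) ne $model;    # left the prefix run
        push @hits, $key if $key =~ $re;
    }
    return @hits;
}

my @raw    = qw(XYZ9-19.99 ABC123-RED-19.99 ABC123-19x99 ABC124-19.99 AB-19.99);
my @prkeys = sort @raw;    # sort once, then reuse for every search
my @k      = find_matches(\@prkeys, 'ABC123', '19.99');
print "$_\n" for @k;       # prints ABC123-RED-19.99
```

The sort costs one pass of O(N log N) up front, which is paid once per program run rather than once per search.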
Questions

  1. How can I speed this up? Each time this one line runs, it takes about 2 seconds. At 3000 runs, that's 6000 seconds (100 minutes) for this one line alone, not counting the overhead and processing in the rest of the program.
  2. Will I have to use another data structure to search all this data?
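On the data-structure question, one option is a hash built once that groups the keys by their first few characters, so each search scans a single bucket instead of all 3,000,000 strings. A minimal sketch, assuming every model number is at least `$PREFIX_LEN` characters long; `$PREFIX_LEN = 4`, `build_index`, `search_index` and the sample keys are hypothetical, a model shorter than the prefix falls back to a full scan, and the best prefix length depends on how the real keys are distributed.

```perl
use strict;
use warnings;

my $PREFIX_LEN = 4;    # hypothetical; tune to the real model numbers

# Build once: hash of arrays keyed by the first $PREFIX_LEN characters.
sub build_index {
    my ($keys) = @_;
    my %index;
    push @{ $index{ substr($_, 0, $PREFIX_LEN) } }, $_ for @$keys;
    return \%index;
}

# Search: only the bucket sharing the model's prefix is scanned.
sub search_index {
    my ($index, $keys, $model, $price) = @_;
    my $re = qr/^\Q$model\E.+\Q$price\E$/;
    my $bucket = length($model) >= $PREFIX_LEN
        ? ($index->{ substr($model, 0, $PREFIX_LEN) } || [])
        : $keys;    # model too short to pick a bucket: full scan
    return grep { /$re/ } @$bucket;
}

my @prkeys = qw(ABC123-RED-19.99 ABC123-19x99 ABC9-19.99 XYZ9-19.99);
my $index  = build_index(\@prkeys);
my @k      = search_index($index, \@prkeys, 'ABC123', '19.99');
print "$_\n" for @k;    # prints ABC123-RED-19.99
```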

Thank you for your help. I don't normally have to search a dataset this large.

Here's the link to the data file and a test program. It's about 800 MB: https://www.dropbox.com/s/x4qrsmy06zdcc57/bigdata.zip?dl=0
-----


(This post was edited by bulrush on Aug 30, 2016, 10:07 AM)


Edit Log:
Post edited by bulrush (User) on Aug 30, 2016, 4:06 AM
Post edited by bulrush (User) on Aug 30, 2016, 4:08 AM
Post edited by bulrush (User) on Aug 30, 2016, 4:33 AM
Post edited by bulrush (User) on Aug 30, 2016, 4:35 AM
Post edited by bulrush (User) on Aug 30, 2016, 5:27 AM
Post edited by bulrush (User) on Aug 30, 2016, 10:07 AM
