
scooper
Deleted
Oct 1, 2000, 8:27 AM
Post #4 of 8
Re: expensive scripts -- making them cheaper
dws -- yep, again you are right. It's not recursion -- it's iteration. The sub doesn't call itself at any point. My error. The rewrite to the code in the first paragraph makes some sense. I wasn't aware that it could be done that way, since all I've ever seen is the other way :-)

This is a neat thing:

code:

    $searchre = "(?:" . join("|", @search_string) . ")";

It joins terms like "foo bar" with a pipe: (?:foo|bar). That matches either what precedes or what follows the pipe. Instant on-the-fly extended regex! This means 'foo' OR 'bar', but here I need 'foo' AND 'bar' -- 'foo' is a property of 'bar' (think HOT water as opposed to COLD water, or this is 'bar' and it's 'foo'). Could that then be revised as:

code:

    $searchre = "(?:" . join(" ", @search_string) . ")";

With a space as the separator the pattern is the literal phrase 'foo bar', so both words have to be there, side by side and in that order.

So then, why use 'join'? Because it walks the list @search_string and glues the terms into one pattern, so there is no need for a separate loop over the terms !!! So then why the (?: ) grouping? It only matters for the alternation. The code matches limited keywords that are NOT user defined (OK, you didn't know this...). So in this case, matching 'food' to 'foo' couldn't happen because 'food' would never exist. In this sense the search terms are acting like 'keys' in the database: each word refers to its respective owner. So I strip the grouping out and use this:

code:

    $searchre = join(" ", @search_string);

Then there's the other part:

code:

    ($update_record =~ m/$searchre/o);   # match the search string in the update record, and only compile the pattern once.

Well, that breaks the sub, because $searchre changes on every pass through the outer foreach. If we compile the pattern only once, every later keyword still gets matched against the FIRST pattern. Remove that. If / is the delimiter, then the initial m is OPTIONAL. Remove that too? OK.
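For anyone following along, here is a minimal sketch of the three points above: the pipe join (OR), the space join (phrase), and what /o does when the pattern variable changes. The sub names and the sample strings are made up for illustration; they are not from the real script or database.

```perl
use strict;
use warnings;

# OR: true if any one of @terms appears somewhere in $record.
sub matches_either {
    my ($record, @terms) = @_;
    my $re = '(?:' . join('|', @terms) . ')';
    return $record =~ /$re/ ? 1 : 0;
}

# Phrase: true only if the terms appear side by side, in this order,
# separated by a single space.
sub matches_phrase {
    my ($record, @terms) = @_;
    my $re = join(' ', @terms);
    return $record =~ /$re/ ? 1 : 0;
}

# With /o the pattern is built from the first value of $p that this
# match sees, and is never rebuilt, even when $p changes later.
sub hits_with_o {
    my @patterns = @_;
    my $hits = 0;
    for my $p (@patterns) {
        $hits++ if 'bar' =~ /$p/o;
    }
    return $hits;
}

# Without /o the pattern is rebuilt whenever $p changes.
sub hits_without_o {
    my @patterns = @_;
    my $hits = 0;
    for my $p (@patterns) {
        $hits++ if 'bar' =~ /$p/;
    }
    return $hits;
}

print "either: ", matches_either('cold water', 'hot', 'water'), "\n";
print "phrase: ", matches_phrase('cold water', 'hot', 'water'), "\n";
print "with /o: ",    hits_with_o('foo', 'bar'),    "\n";
print "without /o: ", hits_without_o('foo', 'bar'), "\n";
```

The /o version never finds 'bar', because it is still matching against 'foo' on the second pass. That is the same thing that would happen to the second and later entries in @favorite_keywords.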
That leaves us with what we had:

code:

    ($update_record =~ /$searchre/);

and we have removed foreach $term (@search_string). That leaves us with:

code:

    sub search_database {
        foreach $favorite_keywords (@favorite_keywords) {
            $input_string  = $favorite_keywords;
            @search_string = split(/\s+/, $input_string);
            $searchre      = join(" ", @search_string);

            open (UPDATESFILE, $updatesfile) or die "Error opening updatefile. $updatesfile: $!\n";
            flock(UPDATESFILE, 1);    # this is a SHARED LOCK
            while (<UPDATESFILE>) {
                chomp($line = $_);
                @updatefields  = split(/\t/, $line);
                $update_record = "$updatefields[11]";
                if (!($update_record =~ /$searchre/)) {
                    # keep an earlier 'true' from another keyword
                    $include{$updatefields[0]} = 'false' unless exists $include{$updatefields[0]};
                }
                else {
                    $include{$updatefields[0]} = 'true';
                }
            }
            close(UPDATESFILE);
        }
    }

It's tighter_code++ ... check this out: I did a sort of benchmark using Time::HiRes. My code returns a search for '5_spades' in 5.311 seconds. Then I pasted our revised sub OVER my own. The same search (same database, same search string, etc.) returns in believe_it_or_not 0.2118 seconds! It still pushes processor utilization up into the 0.50 range, but for far less actual time. I hope that makes you feel GOOD!

There won't be any regex characters in the search string for reasons mentioned above :-) -- but thanks for the warning. Yes, I've been concerned with my math for quite some time now (it's pretty much always wrong), but those 11's are NOT related -- I am ignoring fields _0_thru_10_ in this search. Element 11 of the array ($updatefields[11]) is subdivided into 9 parts -- and each part contains a variable number of keywords.
It looks like this:

code:

    0\t1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11a|11b|11c|11d|11e|11f|11g|11h|11i|\t12\t

Element 11 is delimited by pipes, and what is between the pipes is CSV text. Thanks again for your help !! Now if I can only figure out how to get the processor utilization back into a realistic range.

[This message has been edited by scooper (edited 10-01-2000).]
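In case the layout is hard to picture, here is a rough sketch of pulling element 11 apart: split the record on tabs, split element 11 on pipes, then split each part on commas. The sub name parse_field_11 and the sample keywords are hypothetical, and the comma split assumes the keywords never contain quoted or embedded commas (real CSV with quoting would need a proper CSV parser).

```perl
use strict;
use warnings;

# Returns a list of array refs, one per pipe-delimited part of
# element 11, each holding that part's comma-separated keywords.
sub parse_field_11 {
    my ($line) = @_;
    chomp $line;

    # A limit of -1 keeps trailing empty fields instead of dropping them.
    my @fields  = split /\t/, $line, -1;
    my $field11 = defined $fields[11] ? $fields[11] : '';

    my @parts = split /\|/, $field11, -1;
    # The trailing pipe in the layout leaves one empty part at the end.
    pop @parts if @parts && $parts[-1] eq '';

    return map { [ split /\s*,\s*/, $_ ] } @parts;
}

# Made-up record: fields 0..10, a three-part element 11, then field 12.
my $sample = join("\t", 0 .. 10, 'hot,water|5_spades|red|', 12) . "\t\n";

my @parts = parse_field_11($sample);
for my $i (0 .. $#parts) {
    print "part $i: @{ $parts[$i] }\n";
}
```

The script itself never needs this breakdown for the search, since the regex runs against element 11 as one string. It only matters if a keyword should be tied to one particular part.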