Advice with Function Call From Loop

 



JADobson
New User

Nov 19, 2010, 7:55 AM

Post #1 of 2 (305 views)

Hey,

Firstly, a bit of info on what I'm trying to do.

I'm attempting to write a script which finds words from several lists (one with over 20,000 entries) in a string. I'll be honest from the start: this is for a university project, so I'm not looking for answers, just pointers and advice. My current approach (probably not very efficient, but it works) is to check whether each list entry occurs as a substring of the original string.

So, say I have the list (Purely an example):

black car
white car
green car
red car
...

And the string:

"my friend drives a bright red car"

It would attempt to find the substring "black car", then the substring "white car", until it gets to "red car" which is a match.

(If anybody has any suggestions for a different, more efficient approach to this lookup, please let me know.)

Anyway, that all works fine. However, I'm required to strip all punctuation and format both the original string and the list entries in a certain way before attempting to find an entry in the original string.

Originally I had something like:


Code
while (my $crLine = <CRFILE>)
{
    $crLine = &formatString($crLine);
    if ($OriginalString =~ m/$crLine/i) { ...


After reading up on function calls in Perl, it turns out that calling a function from within a loop 20,000+ times is a terrible idea where optimization is concerned.

I'm struggling to find an alternative approach that doesn't involve duplicating the code within the formatString subroutine 4 or so times. Maybe I'm just being braindead.

I thought about writing a subroutine which took a filehandle (or a reference to one) as a parameter, then doing the formatting and lookup within that function, but the lookup for each list is different (some are looking for matches, some replacing text and others simply removing text).

Any ideas?

The formatString subroutine basically performs a bunch of regex operations on the passed string:


Code
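$cwLine =~ tr/\-/ /;                      # hyphens become spaces
$cwLine =~ tr/A-Z/a-z/;                   # lowercase everything
$cwLine =~ s/[^a-zA-Z\s]|\s+$|^\s+//g;    # drop non-letters, trim both ends
$cwLine =~ s/$/ /;                        # pad with one trailing space
$cwLine =~ s/^/ /;                        # pad with one leading space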


Optimization is my biggest requirement here. If you see anything inefficient with what I've said, I'd greatly appreciate some pointers.

Execution-wise, the formatString subroutine approach performs the lookup on a string against 3 lists (with a total of roughly 30,000 entries) in approximately 4.5 seconds.

Removing the subroutine and duplicating the contained code takes that time down to around 1.2 seconds.

Am I going to have to deal with duplicated code for the sake of optimization or is there an alternative approach I could consider?

Thanks for your time,

James


FishMonger
Veteran / Moderator

Nov 19, 2010, 6:32 PM

Post #2 of 2 (293 views)
Re: [JADobson] Advice with Function Call From Loop

Don't use & when calling a sub unless you know what it does and what its side effects are.
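
To illustrate one of those side effects, here is a minimal sketch (the sub names are made up for the example). Calling a sub as &name; with no parentheses hands it the caller's current @_ instead of an empty argument list. The & form also bypasses prototypes.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it received.
sub count_args { return scalar @_ }

# &count_args; (no parentheses) silently reuses the caller's @_.
sub call_with_amp { return &count_args; }

# count_args() passes an explicitly empty argument list.
sub call_with_parens { return count_args(); }

print call_with_amp(1, 2, 3), "\n";      # prints 3
print call_with_parens(1, 2, 3), "\n";   # prints 0
```

In the posted loop the call is &formatString($crLine), which does pass its own argument list, so the @_ leak does not bite there, but formatString($crLine) does the same job without the surprises.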

Don't use bareword filehandles. Use a lexical var for the filehandle.
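
A rough sketch of what that could look like. The variable names are placeholders, and the in-memory string only stands in for the real list file so the snippet runs on its own.

```perl
use strict;
use warnings;

# Stand-in for the contents of the real list file (assumption for the demo).
my $list = "black car\nwhite car\nred car\n";

# Three-argument open with a lexical filehandle; it is scoped like any
# other "my" variable and is closed automatically when it goes out of scope.
open my $cr_fh, '<', \$list or die "Cannot open list: $!";

my @entries;
while (my $line = <$cr_fh>) {
    chomp $line;
    push @entries, $line;
}
close $cr_fh;

print scalar(@entries), " entries read\n";
```

For a file on disk, replace \$list with the path to the file. A lexical handle can also be passed to a sub like any other scalar, which a bareword such as CRFILE cannot do cleanly.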

Pass a reference to the sub. The lines of it that you've posted could use some rework.

There are several CPAN modules that are very helpful in profiling and benchmarking Perl code.
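
One possible reading of that advice, as a sketch only: the sub takes a reference to the string and edits it in place, so nothing is copied in or out. The name format_string and the sample input are assumptions, and the regexes are a reworked version of the posted lines that should give the same result for plain ASCII text.

```perl
use strict;
use warnings;

# Hypothetical in-place formatter: takes a scalar reference.
sub format_string {
    my ($ref) = @_;
    $$ref =~ tr/-/ /;            # hyphens become spaces
    $$ref =~ tr/A-Z/a-z/;        # lowercase everything
    $$ref =~ s/[^a-z\s]//g;      # drop anything that is not a letter or whitespace
    $$ref =~ s/^\s+|\s+$//g;     # trim both ends
    $$ref = " $$ref ";           # pad with one space on each side
    return;
}

my $s = "  Bright-Red CAR!\n";
format_string(\$s);
print "[$s]\n";                  # prints [ bright red car ]
```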

http://search.cpan.org/~jesse/perl-5.12.2/ext/Devel-DProf/DProf.pm

http://search.cpan.org/~jesse/perl-5.12.2/lib/Benchmark.pm
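
As a hedged example of how Benchmark might be used here, the sketch below compares formatting through a sub call against the same code written inline. The sample data and sub names are invented for the demo and the formatting is a simplified version of the posted lines, so the numbers will not match the timings quoted in the question.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data standing in for the real list entries.
my @entries = map { "Entry-Number $_!" } 1 .. 1000;

sub format_string {
    my ($line) = @_;
    $line =~ tr/-/ /;
    $line =~ tr/A-Z/a-z/;
    $line =~ s/[^a-z\s]//g;
    return $line;
}

# Variant 1: one sub call per entry.
sub with_sub {
    my @out = map { format_string($_) } @entries;
    return \@out;
}

# Variant 2: the same operations duplicated inline.
sub inlined {
    my @out;
    for my $entry (@entries) {
        my $line = $entry;
        $line =~ tr/-/ /;
        $line =~ tr/A-Z/a-z/;
        $line =~ s/[^a-z\s]//g;
        push @out, $line;
    }
    return \@out;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    with_sub => \&with_sub,
    inlined  => \&inlined,
});
```

Measuring like this shows how much of the time is really spent on the sub call itself, as opposed to the file reading or the regex matching around it.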

Your formatString sub could use some work.

 
 

