Jun 17, 2001, 2:31 PM
Post #5 of 6
Re: How to match words in a string/sentence??
If you'd like to call a 28% increase in CPU usage overkill for such a tiny thing, I agree. I would not have expected the regex to take that much longer. To the best of my knowledge, this regex does nearly the same thing internally as the index() function does, apart from the compilation step. Anyway, I've benchmarked our two code snippets:
Benchmark: timing 2000000 iterations of cure, mhx...
cure: 21 wallclock secs (21.04 usr + 0.00 sys = 21.04 CPU) @ 95057.03/s (n=2000000)
mhx: 27 wallclock secs (26.97 usr + 0.00 sys = 26.97 CPU) @ 74156.47/s (n=2000000)
My regexes run about 74,000 times per second, while your index() version manages about 95,000, so your solution is definitely faster. But if you run the code with use re 'debug', you will notice that each compiled regex simply searches for an exact three-character match, 'foo' or 'bar', and nothing else:
Compiling REx `foo'
size 3 first at 1
1: EXACT <foo>(3)
anchored `foo' at 0 (checking anchored isall) minlen 3
Compiling REx `bar'
size 3 first at 1
1: EXACT <bar>(3)
anchored `bar' at 0 (checking anchored isall) minlen 3
There is, of course, some overhead because this matching is done inside the regex engine, but Perl's regex engine is quite fast.
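If you want to reproduce a dump like this yourself, a minimal sketch along these lines should do it. The input string and variable names are made up for illustration, and the exact debug format depends on your Perl version (newer perls print considerably more detail than the output above, including a trace of each match attempt).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sentence = 'the foo and the bar';   # hypothetical input string
my ($has_foo, $has_bar);

{
    # On perl 5.9.5 and later, 'debug' is lexically scoped: only the
    # regexes inside this block dump their compiled program (and a match
    # trace) to STDERR.
    use re 'debug';
    $has_foo = $sentence =~ /foo/ ? 1 : 0;
    $has_bar = $sentence =~ /bar/ ? 1 : 0;
}

print "foo: $has_foo, bar: $has_bar\n";
```

Look for the EXACT node in the STDERR output: for a literal pattern like /foo/ there is nothing else in the compiled program.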
If the code were used in a loop running several thousand times, or embedded in a big project, I would absolutely favour your solution. But in a script as tiny as the one we're discussing here, I think the regex solution is more readable and more intuitive (at least to me).
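If you want to check whether the difference matters for your own data, here is a minimal benchmark sketch. Our actual snippets aren't repeated in this post, so this assumes both simply test whether a sentence contains the words 'foo' and 'bar'; the sub names and the test string are hypothetical. It uses the core Benchmark module's timethese and cmpthese.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

my $sentence = 'the foo and the bar';   # hypothetical input

# index() returns the position of the substring, or -1 if it is absent.
sub with_index {
    my ($s) = @_;
    return index($s, 'foo') >= 0 && index($s, 'bar') >= 0;
}

# Two literal regexes; each compiles to a single EXACT node.
sub with_regex {
    my ($s) = @_;
    return $s =~ /foo/ && $s =~ /bar/;
}

# A negative count means "run each sub for at least that many CPU seconds".
my $results = timethese(-1, {
    index => sub { with_index($sentence) },
    regex => sub { with_regex($sentence) },
});

cmpthese($results);   # prints a table of rates and relative speed
```

Like the substring-style regexes discussed here, both subs do plain substring tests, so "foobar" counts as containing both words; real word matching would need something like /\bfoo\b/.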
I don't mean to criticize here; your solution is correct and faster than mine (and I'm sure I'll remember it and use it where appropriate). I just wanted to point out that using a regex here is not as bad as you said. Using .* is overkill in most of the places it appears, far more so than a simple regex like this one...