
Kanji
Sep 2, 2000, 10:25 PM
Re: Help/advice with a piece of code
By default, regex metacharacters in a variable you interpolate into a search pattern are honoured as metacharacters, not matched literally, so if you did ...

```perl
$phrase = "an example of metacharacters";
$match = "example of (.*)?char";
if ( $phrase =~ /$match/ ) { print $1; }
```

... the print statement outputs "meta", because the (.*) in the pattern explicitly captured it into $1. So if your search text contains only one half of a pair of parens, perl barfs because it can't find the other half (hence "unmatched ()"). You can disable this behaviour by placing the search text inside a \Q...\E escape (i.e., /\Q$searchtext[$i]\E/), which makes perl treat the interpolated text between \Q and \E as literal characters. See perldoc perlre for more on \Q and \E.

As for an example of what dws suggested ...

```perl
001 | %word = map { $_, 1 } split /\s+/, $text;
002 | open FILE, ...;
003 | while ( <FILE> ) {
004 |     chomp;
005 |     my ( $matched, $type ) = split;
006 |     $score{ $type }++ if $word{ $matched };
007 | }
008 | close FILE;
```

Line 1 builds a hash whose keys are all the words in $text. map is very underused by people new to Perl, but that line does essentially the same as ...

```perl
@text = split /\s+/, $text;
foreach $t ( @text ) { $word{ $t } = 1; }
```

We then open the word file and iterate over each of its lines (2, 3). For each line, we remove the trailing newline (4) and split the line into the word we want to match and the category it belongs to if it does match (5); called with no arguments, split splits $_ on whitespace. Finally, we check whether that word was one of the words in $text by looking it up in the hash (6), and if it is, we increment the counter for its category. That has the added benefit of neatly sidestepping the metacharacter problem, since you're comparing two strings for equality rather than searching for one string inside another. :-)

You can then see the counts as $score{'i'}, $score{'f'}, and $score{'o'}, with the hash itself tying in nicely to what I showed you in your other thread.

[This message has been edited by Kanji (edited 09-03-2000).]
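A minimal runnable sketch of the \Q...\E fix, assuming a hypothetical helper `literal_matches` and made-up sample text (neither is from the thread). The unquoted match is wrapped in eval so the script survives the pattern-compile error; the exact wording of that error varies by perl version (recent perls report something like "Unmatched ( in regex").

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the entries of
# @terms that occur in $text. \Q...\E escapes the regex metacharacters in
# each interpolated term, so "(" or "?" match themselves.
sub literal_matches {
    my ( $text, @terms ) = @_;
    return grep { $text =~ /\Q$_\E/ } @terms;
}

my $text = 'the total is 3 (approx) for f(x)';
my $lone = '3 (';    # an unbalanced "(" like the one in the question

# Interpolated without \Q...\E, the "(" opens a group that never closes,
# so perl dies while compiling the pattern; eval traps that error here.
my $status = ( eval { $text =~ /$lone/; 1 } ) ? 'compiled' : 'died';
print "unquoted: $status\n";

# With \Q...\E the same term is treated as plain text.
print 'quoted: ', join( ', ', literal_matches( $text, $lone, 'f(x)', 'g(x)' ) ), "\n";
```

The same escaping is available as a function, quotemeta, if you'd rather build the escaped string once outside the loop.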
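The word-file approach can be tried end to end with the sketch below. It assumes a hypothetical `score_text` sub and made-up word-list data; because the file path in the post is elided, it reads the "word category" lines from an in-memory filehandle (open on a scalar reference, Perl 5.8 and later) instead of a real file. It also swaps the bareword FILE for a lexical filehandle and three-argument open, which behave the same here. The i/f/o categories are the ones named in the post.

```perl
use strict;
use warnings;

# Hypothetical data: the real word file's path was elided in the post,
# so this sketch reads "word category" lines from an in-memory string.
my $wordlist = <<'END';
dog i
cat i
f(x) f
spoon o
END

# Hypothetical helper: returns a category => count hash, adding one point
# for each word-file entry that also appears among the words of $text.
sub score_text {
    my ( $text, $word_file_contents ) = @_;

    # As in line 1 of the post: every word in $text becomes a hash key.
    my %word = map { $_ => 1 } split /\s+/, $text;

    my %score;
    open my $fh, '<', \$word_file_contents or die "cannot open word list: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $matched, $type ) = split ' ', $line;    # same as a bare split on $_
        next unless defined $type;                   # skip blank or malformed lines
        # Exact hash lookup: the parens in "f(x)" are plain characters here.
        $score{$type}++ if $word{$matched};
    }
    close $fh;
    return %score;
}

my %score = score_text( 'the dog saw f(x) and the cat', $wordlist );
printf "i=%d f=%d o=%d\n", map { $score{$_} || 0 } qw(i f o);    # prints "i=2 f=1 o=0"
```

Note that the "f(x)" entry is counted without any escaping at all, which is exactly the metacharacter-free behaviour the hash lookup buys you.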