
solukas
New User
Nov 11, 2011, 3:58 AM
Post #1 of 6
Regular expression of Unicode to distinguish kanji and hiragana
Dear all,

I have a task to distinguish kanji from hiragana when a file contains both. This is the content of the file. The first line is all hiragana and the second line is all kanji, so I do nothing with them. The last line contains both, so I just print it out.

    ひゅうが
    通行
    乗じて

This is the code:

    open (A_FILE, "<", "kata.txt");
    my(@a_lines) = <A_FILE>; # read file into list
    open(my $out, ">", "modified_kata.txt") or die "Can't open modified_kata.txt: $!";
    foreach $a_line (@a_lines) {
        $sentence = $a_line;
        if (($sentence =~ /\p{InHiragana}/) && ($sentence =~ /\p{InCJKUnifiedIdeographs}/)){
            print $out $sentence . "\n";
        }
    }

It seems that Perl does not recognise the \p{} properties in the regular expressions. The result is still wrong even if I put use utf8; at the top. Do you have any suggestions? I am a newbie at handling Unicode.

Thanks very much for your help!

Kind regards,
Luke
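One likely cause of the symptom described above is that the file is read as raw bytes. use utf8; only tells Perl that the script's own source text is UTF-8; it does not decode data read from a filehandle, so \p{InHiragana} is tested against individual bytes and never matches. The following is a minimal sketch of one possible fix, not a definitive one. It assumes kata.txt is UTF-8 encoded, and the names has_hiragana_and_kanji and filter_mixed_lines are hypothetical helpers introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the (already decoded) string contains
# at least one hiragana character and at least one CJK unified ideograph,
# otherwise 0.
sub has_hiragana_and_kanji {
    my ($text) = @_;
    return ( $text =~ /\p{InHiragana}/
          && $text =~ /\p{InCJKUnifiedIdeographs}/ ) ? 1 : 0;
}

# Hypothetical helper: copies only the mixed hiragana/kanji lines from
# $in_path to $out_path.
# Assumption: both files are UTF-8. The :encoding(UTF-8) layer decodes
# bytes into characters on input and encodes them again on output.
sub filter_mixed_lines {
    my ($in_path, $out_path) = @_;

    open(my $in, "<:encoding(UTF-8)", $in_path)
        or die "Can't open $in_path: $!";
    open(my $out, ">:encoding(UTF-8)", $out_path)
        or die "Can't open $out_path: $!";

    while (my $line = <$in>) {
        chomp $line;    # drop the newline so it is not written twice
        print {$out} "$line\n" if has_hiragana_and_kanji($line);
    }

    close $in;
    close $out or die "Can't close $out_path: $!";
}

# Runs only when exactly two file names are given on the command line.
filter_mixed_lines(@ARGV) if @ARGV == 2;
```

If this were saved as, say, filter.pl (a hypothetical name), it could be run as perl filter.pl kata.txt modified_kata.txt. The script properties \p{Hiragana} and \p{Han} are an alternative to the block properties used here, should matching by script rather than by block be preferred.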