Aug 6, 2002, 8:27 PM
Post #1 of 5
i haven't written Perl in months, and i'm just starting to pick it up again, so if i make blatant mistakes, it's probably because i have quite a few holes in my knowledge.
i'm making a little program that counts the words, sentences, and paragraphs in a document. so far i have the words and paragraphs working (as well as the most commonly used words), but i can't get the sentence regex to work. what i have so far is this (with an explanation afterwards):
$count++ if (/.*?(?:\.|[!?]+)$/);
$count++ if (/.*?(?:\.|[!?]+)\s/g);
i wasn't sure if i could merge the endings of those two patterns ($ and \s) into a single (\s|$). my original doesn't work anyway, so i couldn't test whether the merged version would. anyway, the .*? is supposed to match everything up to the first period, exclamation point, or question mark, and the (?:\.|[!?]+) is there to catch the ending period, or the ! and ?. sometimes there are multiple periods in a sentence that don't necessarily denote the end of it, which is why i had to separate the punctuation marks (to allow for multiple ?s and !s but not for multiple .s). and finally, the space is there to ensure that it doesn't catch a number with a decimal point, or an acronym, by mistake.
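for illustration, here is a minimal sketch of the intent described above, not a definitive solution. it assumes the whole text is in one string, and count_sentences is a hypothetical helper name that isn't part of my program. it counts the terminators themselves instead of matching the text before them, and uses a lookahead for "whitespace or end of string", which is one way of merging \s and $.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): counts sentence
# endings in a string. A sentence ending is taken to be a single period,
# or a run of ! and ?, followed by whitespace or the end of the string.
sub count_sentences {
    my ($text) = @_;

    # (?=\s|$) is a lookahead: it requires whitespace or end of string
    # after the punctuation without consuming it, so "3.14" is skipped
    # because the period is followed by a digit.
    # The "= () =" idiom forces list context and yields the match count.
    my $count = () = $text =~ /(?:\.|[!?]+)(?=\s|$)/g;

    return $count;
}

print count_sentences("Hi there! Pi is 3.14, right?! Yes."), "\n";   # prints 3
```

this is still only a rough count: an ellipsis followed by a space, or an acronym like "U.S.A." followed by a space, is counted as a sentence ending.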
i know this is the most boring regex ever created, but it's certainly racked my brain.
any help is greatly appreciated.
(This post was edited by NuclearClam on Aug 6, 2002, 10:46 PM)