CGI/Perl Guide: Perl Programming Help: Beginner
Multiples of 10

 



IsabelleFr
Novice

Feb 16, 2013, 8:56 AM

Post #1 of 6 (620 views)
Multiples of 10

Hi :) I'm trying to write a script that 'joins' compound number names, such as 'one hundred thousand two hundred fifty dogs', into a single figure: 100250 dogs. I already have a script that converts number words to digits, but it turns something like 'two hundred' into '2 100'. You see the problem... After converting the words to digits, I would like to be able to join them into one number. Here is what I tried:

Code
use strict;
use warnings;

my %norm = (
    'un' => 1,
    'une' => 1,
    'deux' => 2,
    'trois' => 3,
    'quatre' => 4,
    'cinq' => 5,
    'six' => 6,
    'sept' => 7,
    'huit' => 8,
    'neuf' => 9,
    'dix' => 10,
    'onze' => 11,
    'douze' => 12,
    'treize' => 13,
    'quatorze' => 14,
    'quinze' => 15,
    'seize' => 16,
    'dix-sept' => 17,
    'dix-huit' => 18,
    'dix-neuf' => 19,
    'vingt' => 20,
    'trente' => 30,
    'quarante' => 40,
    'cinquante' => 50,
    'soixante' => 60,
    'soixante-dix' => 70,
    'quatre-vingt' => 80,
    'quatre-vingts' => 80,
    'quatre-vingt-dix' => 90,
    'cent' => 100,
    'cents' => 100,
    'mille' => 1000,
    'million' => 1000000,
    'millions' => 1000000,
    'milliard' => 1000000000,
    'milliards' => 1000000000,
);

while (my $ligne = <STDIN>) {
    chomp $ligne;

    # Percentages: print the fraction, then skip to the next line so that
    # the line is not printed a second time below
    if ($ligne =~ /^(\d+?)\s*%.*/) {
        my $multi = $1 / 100;
        print $multi, "\n";
        next;
    }

    # Replace every number word by its value. Longer forms (dix-sept,
    # quatre-vingt-dix) must come before their prefixes (dix, quatre), and
    # \b stops "sept" from matching inside "septembre".
    $ligne =~ s/\b(quatre-vingt-dix|quatre-vingts?|soixante-dix|dix-sept|dix-huit|dix-neuf|une?|deux|trois|quatre|cinq|six|sept|huit|neuf|dix|onze|douze|treize|quatorze|quinze|seize|vingt|trente|quarante|cinquante|soixante|cents?|mille|millions?|milliards?)\b/$norm{lc $1}/egi;

    # Every other line is printed exactly once, whether or not it changed
    print $ligne, "\n";
}

(You'll notice it's in French because I'm French, lol ;P)
It is not perfect: so as not to skip lines that already contain numbers, I added a print $ligne, but of course it then prints a line twice when it is a percentage (X%). I need help =/
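A minimal sketch of the 'join' step, assuming the substitution has already produced digit runs such as "2 100 50" and that each run describes a single number. join_tokens is a hypothetical helper, not part of the original script: reading left to right, units and tens are added to the current group, 100 multiplies the group, and 1000 or more closes the group into the running total.

```perl
use strict;
use warnings;

# Hypothetical helper: combine digit tokens such as (2, 100, 50) into 250.
# Assumes the tokens describe one single number, read left to right.
sub join_tokens {
    my @values = @_;
    my ($total, $group) = (0, 0);
    for my $n (@values) {
        if ($n >= 1000) {                 # mille, million, milliard
            $total += ($group || 1) * $n; # "mille" alone means 1000
            $group = 0;
        }
        elsif ($n == 100) {               # cent, cents
            $group = ($group || 1) * 100; # "cent" alone means 100
        }
        else {                            # units and tens are simply added
            $group += $n;
        }
    }
    return $total + $group;
}

# Join each run of digits left behind by the substitution
my $ligne = "100 1000 2 100 50 chiens";
$ligne =~ s/(\d+(?:\s+\d+)*)/join_tokens(split ' ', $1)/ge;
print "$ligne\n";    # 100250 chiens
```

Because "cent" multiplies whatever count precedes it, the same loop also accepts "dix-neuf cent" (19, 100) as 1900.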


Laurent_R
Enthusiast / Moderator

Feb 16, 2013, 11:03 AM

Post #2 of 6 (609 views)
Re: [IsabelleFr] Multiples of 10

Hi Isabelle,

To start with, you should use the /x modifier in your regular expression. It makes the regex engine ignore whitespace and newlines inside the pattern (and lets you add comments), so you could rewrite your very long line as something much easier to read, like:


Code
if ($ligne =~ s/(une? | 
deux |
trois |
quatre |
# ... etc. (you can even insert comments)
milliards?)
/ $norm{lc $1}
/ egix) {
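Going one step further with /x, a sketch (assuming a reduced, hypothetical copy of the %norm table) that builds the alternation from the hash keys instead of typing them twice. Sorting the keys longest first means "dix-sept" is tried before "dix", and \b keeps the words from matching inside longer words. words_to_digits is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical, reduced copy of the %norm table from the first post
my %norm = (
    'dix'      => 10,
    'sept'     => 7,
    'dix-sept' => 17,
    'vingt'    => 20,
    'cent'     => 100,
    'cents'    => 100,
);

# Build the alternation from the hash keys, longest first, so the word
# list lives in one place and "dix-sept" is tried before "dix"
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %norm;

my $number_word = qr/
    \b               # start of a whole word
    ($alternation)   # one of the number words
    \b               # end of a whole word, so sept is not found in septembre
/xi;

sub words_to_digits {
    my ($text) = @_;
    $text =~ s/$number_word/$norm{lc $1}/ge;
    return $text;
}

print words_to_digits('Dix-sept chats en septembre'), "\n";   # 17 chats en septembre
```

With the alternatives in the original order, "dix" would match first and "dix-sept" would come out as "10-7".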


Second, regexes are not the right tool for something like this. You'll probably need to use a real general-purpose parser and write a grammar describing how number names are constructed, so that you can translate them into numbers.

As a starting point, you could have a look at Damian Conway's generic parser, Parse::RecDescent, on CPAN.

If you insist on doing it yourself with regexes, you'll need to add a lot of logic yourself about how the various pieces of a number interact in French. The rules for numbers are not very complex, so it can probably be done, but I think it would be the wrong approach, unless this is homework for school.


BillKSmith
Veteran

Feb 16, 2013, 12:16 PM

Post #3 of 6 (604 views)
Re: [IsabelleFr] Multiples of 10

I agree with Laurent that this is probably a more difficult task than you think it is. If this is a real-world task, you will probably have to handle idiomatic names for some numbers. (For example, in English, a person born in the previous century says that he was born in the year "nineteen eighty six" rather than the more formal "one thousand nine hundred eighty six".) I do not know whether this is an issue at all in French, but it does seem likely.

If you must do your own parsing, you will find it much easier to work from right to left. Do the units first, then the tens, then the hundreds, etc. Just leave a zero in any position that is not mentioned.
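A minimal sketch of one way to read the right-to-left idea, assuming the number words have already been turned into digit values (for instance "deux cent cinquante" into 2, 100, 50). right_to_left is a hypothetical name. Rather than filling digit positions literally, it tracks the current position (units, thousands, millions) and whether we are inside the hundreds, so positions that are not mentioned simply contribute zero. It does not validate its input.

```perl
use strict;
use warnings;

# Hypothetical helper: right-to-left pass over digit tokens.
# One possible reading of the suggestion above, not a complete parser.
sub right_to_left {
    my @values  = @_;
    my $total   = 0;
    my $scale   = 1;   # current position: 1, 1000, 1000000, ...
    my $mult    = 1;   # worth of one unit here, e.g. 100 * $scale after "cent"
    my $pending = 0;   # a multiplier was seen but no count has been given yet

    for my $n (reverse @values) {
        if ($n >= 1000) {                  # mille, million, milliard: new position
            $total += $mult if $pending;   # a multiplier with no count counts once
            ($scale, $mult, $pending) = ($n, $n, 1);
        }
        elsif ($n == 100) {                # cent: hundreds within this position
            $mult    = 100 * $scale;
            $pending = 1;
        }
        else {                             # units and tens fill the current slot
            $total  += $n * $mult;
            $pending = 0;
        }
    }
    $total += $mult if $pending;           # leading "mille" or "cent" with no count
    return $total;
}

print right_to_left(1000, 9, 100, 50, 5), "\n";   # 1955
print right_to_left(19, 100, 50, 5), "\n";        # 1955
```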
Good Luck,
Bill


Laurent_R
Enthusiast / Moderator

Feb 16, 2013, 2:03 PM

Post #4 of 6 (602 views)
Re: [BillKSmith] Multiples of 10


In Reply To
(e.g. In English, a person born in the previous century says that he was born in the year "nineteen eighty six" rather than the more formal "one thousand nine hundred eighty six".) I do not know if this is an issue at all in French, but it does seem likely.


This is exactly one typical problem in French as well. For example, I was born in 1955, which can be written "mille neuf cent cinquante-cinq" (literally "one thousand nine hundred...") or "dix-neuf-cent cinquante-cinq" ("nineteen hundred...").
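Both spellings reduce to the same arithmetic once you notice that "cent" multiplies the count written just before it (9 in the first case, 19 in the second), while the remaining pieces are added:

```latex
\begin{aligned}
\text{mille neuf cent cinquante-cinq} &= 1000 + 9 \times 100 + 50 + 5 = 1955 \\
\text{dix-neuf-cent cinquante-cinq} &= (10 + 9) \times 100 + 50 + 5 = 1955
\end{aligned}
```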

Another problem is that, depending on various factors, hyphens or spaces are placed between the words. Although there are rules for this, too many people don't follow them exactly, so you'll need to be able to recognize numbers written with or without hyphens.
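One way to cope with inconsistent hyphenation, sketched under the assumption that compounds such as dix-sept and soixante-dix can be split into parts that simply add up (10 + 7, 60 + 10). The only exception handled here is quatre-vingt, which is 4 times 20, so it is glued into one word before splitting; "et" (as in "vingt et un") is dropped. french_tokens and %base are hypothetical names, and the resulting values still need a combining step that adds units and tens and multiplies by cent or mille.

```perl
use strict;
use warnings;

# Hypothetical base vocabulary: compounds such as dix-sept or soixante-dix
# are left out on purpose, because they are split on hyphens and add up
my %base = (
    un => 1, une => 1, deux => 2, trois => 3, quatre => 4, cinq => 5,
    six => 6, sept => 7, huit => 8, neuf => 9, dix => 10, onze => 11,
    douze => 12, treize => 13, quatorze => 14, quinze => 15, seize => 16,
    vingt => 20, trente => 30, quarante => 40, cinquante => 50,
    soixante => 60, quatrevingt => 80,
    cent => 100, cents => 100, mille => 1000,
    million => 1_000_000, millions => 1_000_000,
    milliard => 1_000_000_000, milliards => 1_000_000_000,
);

# Turn a French number phrase into a list of values, whatever mix of
# hyphens and spaces the writer used
sub french_tokens {
    my ($phrase) = @_;
    $phrase = lc $phrase;
    # quatre-vingt(s) means 4 x 20, not 4 + 20, so glue it into one word first
    $phrase =~ s/\bquatre[\s-]+vingts?\b/quatrevingt/g;
    return map  { $base{$_} }
           grep { exists $base{$_} }    # drops "et" and non-number words
           split /[\s-]+/, $phrase;
}

print join(' ', french_tokens('Quatre-vingt-dix-sept')), "\n";         # 80 10 7
print join(' ', french_tokens('dix-neuf-cent cinquante-cinq')), "\n";  # 10 9 100 50 5
```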

And so on and so forth.

Regarding Bill's suggestion, I haven't really tried it, but working from right to left seems like a very good idea.

BTW, Isabelle, in case it is not obvious by now, French is my mother tongue too (or native language; I am not sure how best to say it). Don't hesitate to ask me if you need help with French-specific issues; I am happy to help if I can.


IsabelleFr
Novice

Feb 17, 2013, 10:45 AM

Post #5 of 6 (585 views)
Re: [Laurent_R] Multiples of 10

Thank you, Laurent! That /x modifier is a good idea :) You're right about doing this with regexes, but since I'm a beginner, I'm still learning how to use Perl. I'm trying new things with regexes; I know it's not going to be perfect, but I expect some not-so-bad results. And thanks again ;P


rovf
Veteran

Feb 22, 2013, 7:15 AM

Post #6 of 6 (541 views)
Re: [IsabelleFr] Multiples of 10

Just a note on the side: In your regexp


Code
/^(\d+?)\s*%.*/


the final .* is redundant, so you can always write it as


Code
/^(\d+?)\s*%/


 
 

