Aug 8, 2015, 6:22 PM
Help in formatting a Perl script for creating a concordance of Perso-Arabic
I have written a Perl script to identify and handle syllables in Sindhi written in the Perso-Arabic script. I eventually need this to train a converter from Sindhi in Perso-Arabic script to Sindhi in Devanagari script.
The script reads two files:
1. Syllables: A list of all the syllables.
2. Corpus: A list of words in Arabic script followed by their Devanagari equivalent, delimited by =
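A minimal sketch of how the Perso-Arabic side of each corpus line could be isolated, assuming one "arabic=devanagari" pair per line. The sample lines are hypothetical ASCII placeholders standing in for the real strings, and the variable names are illustrative only, not taken from the attached script.

```perl
use strict;
use warnings;

# Hypothetical corpus lines: the text left of "=" stands in for the
# Perso-Arabic word, the text right of "=" for its Devanagari equivalent.
my @corpus_lines = ("kitab=DEVA1\n", "bakita=DEVA2\n", "\n", "ta=DEVA3\n");

my @arabic_words;
for my $line (@corpus_lines) {
    chomp $line;
    next unless $line =~ /\S/;          # skip blank lines
    # A limit of 2 splits only at the first "="; keep the left-hand field.
    my ($arabic) = split /=/, $line, 2;
    $arabic =~ s/^\s+|\s+$//g;          # trim stray whitespace
    push @arabic_words, $arabic;
}

print join(",", @arabic_words), "\n";
```

With the real files, the lines would come from a filehandle opened with an encoding layer, for example `open my $fh, '<:encoding(UTF-8)', $path`, so that Perl sees characters rather than raw bytes. Everything after this step would then work on `@arabic_words` only, and the Devanagari side never reaches the matching code.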
For each syllable, the output should give:
a. The syllable in question and whether it is initial, medial, or final.
b. At least 6 to 10 examples (at present only one is output).
c. As an optional extra, a frequency count of all the words [not present in my script: I don't know how to maintain two sets of counts].
In other words, the output should look like this:
Initial: 6 EXAMPLES
Medial: 6 EXAMPLES
Final: 6 EXAMPLES
Standalone: 6 EXAMPLES
If there are fewer than six examples, or none at all, the output should say so.
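The desired report could be sketched roughly as follows. This is only an illustration under stated assumptions: a syllable is treated as a plain substring of the word (no real syllable-boundary detection), the cap of six examples is a configurable guess, and the data are hypothetical ASCII placeholders standing in for Perso-Arabic strings. The helper name `positions` is made up for this sketch.

```perl
use strict;
use warnings;

my $MAX_EXAMPLES = 6;   # assumed cap; raise to 10 if more are wanted

# Hypothetical data: ASCII placeholders stand in for Perso-Arabic strings.
my @syllables = ("ta", "zz");
my @words     = ("tab", "kitab", "bata", "ta", "tata");

# Return every position label that applies to $syl inside $word.
sub positions {
    my ($word, $syl) = @_;
    return ("Standalone") if $word eq $syl;
    my @found;
    push @found, "Initial" if $word =~ /^\Q$syl\E./;   # at start, more follows
    push @found, "Medial"  if $word =~ /.\Q$syl\E./;   # text on both sides
    push @found, "Final"   if $word =~ /.\Q$syl\E\z/;  # at end, text before
    return @found;
}

my @report;
for my $syl (@syllables) {
    my %examples;   # position label => list of matching words
    for my $word (@words) {
        push @{ $examples{$_} }, $word for positions($word, $syl);
    }
    for my $pos (qw(Initial Medial Final Standalone)) {
        my @all   = @{ $examples{$pos} || [] };
        my @shown = @all > $MAX_EXAMPLES ? @all[0 .. $MAX_EXAMPLES - 1] : @all;
        my $text  = @all ? join(" ", @shown) : "none";
        push @report, "$syl $pos (" . scalar(@all) . " found): $text";
    }
}
print "$_\n" for @report;
```

Collecting all matches first and trimming the list only at print time is what allows the report to state how many were found even when fewer than six exist. Note that on decoded Perso-Arabic text each combining mark counts as its own character, so the `.` in the patterns may need adjusting for the real data.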
It does work to a certain extent, but it has the following major problems:
1. The script should analyze only the Perso-Arabic side of each line, using the = delimiter, and ignore the Devanagari side. It does not do so, and as a result not all final occurrences are shown. I don't know how to instruct the program to restrict the analysis to the Arabic side of the corpus and ignore the rest.
2. I need at least 6 to 10 example tokens from the corpus file. At present only one is given.
3. If possible, the frequency should be provided [I don't know how to maintain two sets of counts].
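One possible way to keep two sets of counts side by side is to use two separate hashes. The sketch below assumes the two counts wanted are (a) how often each word occurs in the corpus and (b) how many corpus tokens show the syllable in each position; that reading of the requirement is a guess. The tokens are hypothetical ASCII placeholders, `position_of` is a made-up helper, and it assigns a single label per word (first match wins), which is a simplification.

```perl
use strict;
use warnings;

# Hypothetical corpus tokens (ASCII placeholders for Perso-Arabic words);
# repeated entries simulate a word that occurs more than once.
my @tokens = ("tab", "kitab", "tab", "bata", "tab");
my $syl    = "ta";

# Simplified classifier: one label per word, or undef if no match.
sub position_of {
    my ($word, $s) = @_;
    return "Standalone" if $word eq $s;
    return "Initial"    if $word =~ /^\Q$s\E/;
    return "Final"      if $word =~ /\Q$s\E\z/;
    return "Medial"     if $word =~ /\Q$s\E/;
    return;   # syllable not present in this word
}

# First count: occurrences of each word in the corpus.
my %word_freq;
$word_freq{$_}++ for @tokens;

# Second count: tokens per position, weighted by word frequency.
my %pos_freq;
for my $word (keys %word_freq) {
    my $pos = position_of($word, $syl);
    next unless defined $pos;
    $pos_freq{$pos} += $word_freq{$word};
}

printf "%-6s %d\n", $_, $word_freq{$_} for sort keys %word_freq;
printf "%-10s %d\n", $_, $pos_freq{$_} || 0
    for qw(Initial Medial Final Standalone);
```

Counting the words once and then deriving the per-position totals from that hash keeps the two tallies consistent with each other. To handle the whole syllable list, `%pos_freq` could become a hash of hashes keyed first by syllable and then by position.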
I have racked my brains over this and all attempts to get this type of output have failed.
I am attaching the script and the data files. Could you please help me out?
Many thanks for your help
P.S. The preview does not display the text data, but the attachment contains the sample data.