This paper addresses the problem of predicting the pronunciation of Japanese words, especially those that are newly created and therefore not in the dictionary. This is an important task for many applications, including text-to-speech and text input methods, and it is also challenging because Japanese kanji (ideographic) characters typically have multiple possible pronunciations. We approach this problem by treating it as a simplified machine translation/transliteration task, and propose a solution that takes advantage of recent technologies developed for machine translation and transliteration research. More specifically, we divide the problem into two subtasks: (1) discovering the pronunciation of new or hard-to-pronounce words by mining unannotated text, much like creating a bilingual dictionary from the web; and (2) building a decoder for pronunciation prediction, for which we apply the state-of-the-art discriminative substring-based approach.

This project is aimed at improving the phase I system. AnglaMT is a pattern-directed, rule-based system with a context-free-grammar-like structure for English (the source language). It generates a 'pseudo-target' (pseudo-interlingua) applicable to a group of Indian languages (the target languages), such as the Indo-Aryan family (Hindi, Bangla, Asamiya, Punjabi, Marathi, Oriya, Gujrati, etc.), the Dravidian family (Tamil, Telugu, Kannada and Malayalam) and others. A set of rules obtained through corpus analysis is used to identify plausible constituents, with respect to which movement rules for the 'pseudo-target' are constructed. Within each group the languages exhibit a high degree of structural homogeneity, and we exploit this similarity to a great extent in our system. A language-specific text generator converts the 'pseudo-target' code into target-language text. The Paninian framework, based on Sanskrit grammar and using Karak (similar to case) relationships, provides a uniform way of designing the Indian-language text generators. We also use an example base to identify noun and verb phrasals and resolve their semantics. An attempt is made to resolve most of the ambiguities using ontology, syntactic and semantic tags, and some pragmatic rules. The unresolved ambiguities are left for human post-editing. Some of the major design considerations in Anglabharti have been aimed at providing a practical aid for translation.
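To make the AnglaMT idea of a shared 'pseudo-target' followed by a language-specific text generator more concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration and not AnglaMT's actual design: the `Constituent` structure, the three-word `parse_svo` analysis, the single subject-object-verb movement rule, and the tiny romanized Hindi lexicon are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Constituent:
    role: str  # "SUBJ", "VERB" or "OBJ" (hypothetical tag set)
    head: str  # English head word


def parse_svo(sentence: str) -> List[Constituent]:
    """Toy stand-in for English analysis: accepts only 'Subject verb object.'"""
    subj, verb, obj = sentence.rstrip(".").split()
    return [
        Constituent("SUBJ", subj),
        Constituent("VERB", verb),
        Constituent("OBJ", obj),
    ]


def to_pseudo_target(constituents: List[Constituent]) -> List[Constituent]:
    """Assumed movement rule: reorder constituents to subject-object-verb,
    the verb-final order shared by the target languages."""
    order = {"SUBJ": 0, "OBJ": 1, "VERB": 2}
    return sorted(constituents, key=lambda c: order[c.role])


# Hypothetical three-entry lexicon (romanized Hindi), for illustration only.
HINDI_LEXICON = {"Ram": "Ram", "mango": "aam", "eats": "khata hai"}


def generate_hindi(pseudo_target: List[Constituent]) -> str:
    """Language-specific generator: word-by-word lexical substitution."""
    return " ".join(HINDI_LEXICON[c.head] for c in pseudo_target)


pseudo = to_pseudo_target(parse_svo("Ram eats mango."))
print([c.role for c in pseudo])  # ['SUBJ', 'OBJ', 'VERB']
print(generate_hindi(pseudo))    # Ram aam khata hai
```

The point of the split is that the reordering step runs once per language group, while each additional target language would only need its own generator function and lexicon.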