Extending LOGS ontologizer to multilingual environments

Abstract

Ontologies greatly enhance our ability to machine-process digital documents and aid in knowledge engineering. They also augment search algorithms. LOGS (Lightweight universal Ontology Generation and exploitation architectureS) and its sample application, the Eagle, took a step toward automatic ontology generation through a fast, lightweight approach. Sanskritology and Indology are interesting domains in which most modern research is carried out in the West, often by transliterating Sanskrit texts into Roman script, with many scholars relying only on definitive translations of these texts for their study. However, because most of the scriptural texts were preserved by oral tradition through the patrilineal system, a number of recensions exist for various texts, some complete and some not, and only a few of these were ever translated. In addition, a number of these books repeat portions of each other. Texts may be presented in native or exploded forms. The study itself is highly interpretive. Therefore, the validation, ontologizing, and rendition through intelligent interfaces of the various recensions of a particular book present a very rich problem for methods such as LOGS. We describe the use of LOGS in an ontological server that generates and maintains ontologies of Sanskrit texts, and we demonstrate its use on versions of the Sama Veda, one of the complete Hindu scriptures. This requires natural language processing (NLP) in Sanskrit. To show how we accomplish this NLP, we describe the heuristic algorithms of the Vyakarana API we have developed, which is the first known Sanskrit grammar and transliteration tool for this area. Because Sanskrit grammar evolved along with the language over centuries, Vyakarana is adaptive by design. Vyakarana is used in the ontology generation step of LOGS and also in intelligent domain queries such as generating concordances, dictionary lookup of compound words, and dynamic transliteration.
Finally, we comment on how such an ontologized book could be easily integrated into, and cross-referenced in, a searchable digital repository, and on the implications for other domains of a successful application of LOGS in Sanskritology.


Paper (under revision)