Winston N Anderson
Also published as: Winston Anderson
2010
Base Concepts in the African Languages Compared to Upper Ontologies and the WordNet Top Ontology
Winston Anderson
|
Laurette Pretorius
|
Albert Kotzé
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Ontologies, and in particular upper ontologies, are foundational to the establishment of the Semantic Web. Upper ontologies are used as equivalence formalisms between domain specific ontologies. Multilingualism brings one of the key challenges to the development of these ontologies. Fundamental to the challenges of defining upper ontologies is the assumption that concepts are universally shared. The approach to developing linguistic ontologies aligned to upper ontologies, particularly in the non-Indo-European language families, has highlighted these challenges. Previously two approaches to developing new linguistic ontologies and the influence of these approaches on the upper ontologies have been well documented. These approaches are examined in a unique new context: the African, and in particular, the Bantu languages. In particular, we address the following two questions: Which approach is better for the alignment of the African languages to upper ontologies? Can the concepts that are linguistically shared amongst the African languages be aligned easily with upper ontology concepts claimed to be universally shared?
2006
Finite state tokenisation of an orthographical disjunctive agglutinative language: The verbal segment of Northern Sotho
Winston N Anderson
|
Petronella M Kotzé
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Tokenisation is an important first pre-processing step required to adequately test finite-state morphological analysers. In agglutinative languages each morpheme is concatinatively added on to form a complete morphological structure. Disjunctive agglutinative languages like Northern Sotho write these morphemes, for certain morphological categories only, as separate words separated by spaces or line breaks. These breaks are, by their nature, different from breaks that separate words that are written conjunctively. A tokeniser is required to isolate categories, like a verb, from raw text before they can be correctly morphologically analysed. The authors have successfully produced a finite state tokeniser for Northern Sotho, where verb segments are written disjunctively but nominal segments conjunctively. The authors show that since reduplication in the Northern Sotho language does not affect the pre-processing tokeniser, the disjunctive standard verbal segment as a construct in Northern Sotho is deterministic, finite-state and a regular Type 0 language in the Chomsky hierarchy and that the copulative verbal segment, due to its semi-disjunctivism, is ambiguously non-deterministic.