Sisay Fissaha Adafre

Also published as: Sisay Fissaha, Sisay Fissaha Adafre


2006

Finding Similar Sentences across Multiple Languages in Wikipedia
Sisay Fissaha Adafre | Maarten de Rijke
Proceedings of the Workshop on NEW TEXT Wikis and blogs and other dynamic text sources

2005

Feature Engineering and Post-Processing for Temporal Expression Recognition Using Conditional Random Fields
Sisay Fissaha Adafre | Maarten de Rijke
Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing

Part of Speech Tagging for Amharic using Conditional Random Fields
Sisay Fissaha Adafre
Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages

2004

Formal analysis of some aspects of Amharic noun phrases
Sisay Fissaha Adafre
Proceedings of the 9th EAMT Workshop: Broadening horizons of machine translation and its applications

The University of Amsterdam at Senseval-3: Semantic roles and Logic forms
David Ahn | Sisay Fissaha | Valentin Jijkoun | Maarten de Rijke
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

2003

Phrase-based Evaluation of Word-to-Word Alignments
Michael Carl | Sisay Fissaha
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

Application of corpus-based techniques to Amharic texts
Sisay Fissaha | Johann Haller
Workshop on Machine Translation for Semitic languages: issues and approaches

A number of corpus-based techniques have been used in the development of natural language processing applications. One area in which these techniques have been applied extensively is lexical development. The current work is being undertaken in the context of a machine translation project in which lexical development activities constitute a significant portion of the overall task. In the first part, we applied corpus-based techniques to the extraction of collocations from an Amharic text corpus. Analysis of the output reveals important collocations that can usefully be incorporated in the lexicon. This is especially true for the extraction of idiomatic expressions. The patterns of idiom formation observed in a small, manually collected data set enabled the extraction of a large set of idioms that might otherwise be difficult or impossible to recognize. Furthermore, we present preliminary results for two other corpus-based techniques currently under investigation, clustering and classification. The results show that clustering performed no better than the frequency baseline, whereas classification showed a clear performance improvement over it. This in turn suggests the need to carry out further experiments using large data sets and more contextual information.