Johann Haller


Sentiment Analysis for Issues Monitoring Using Linguistic Resources
Ecaterina Rascu | Kai Schirmer | Johann Haller
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Sentiment analysis, which deals with the identification and evaluation of opinions towards a topic, a company, or a product, is an essential task within media analysis. It is used to study trends, determine the level of customer satisfaction, or warn immediately when unfavourable trends risk damaging the image of a company. In this paper we present an issues monitoring system which, besides text categorization, also performs an extensive sentiment analysis of online news and newsgroup postings. Input texts undergo a morpho-syntactic analysis, are indexed using a thesaurus, and are categorized into user-specific classes. During sentiment analysis, sentiment expressions are identified and subsequently associated with the established topics. After presenting the various components of the system and the linguistic resources used, we describe in detail SentA, its sentiment analysis component, and evaluate its performance.


Using Weighted Abduction to Align Term Variant Translations in Bilingual Texts
Michael Carl | Ecaterina Rascu | Johann Haller
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Controlling Gender Equality with Shallow NLP Techniques
Michael Carl | Sandrine Garnier | Johann Haller | Anne Altmayer | Bärbel Miemietz
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics


Application of corpus-based techniques to Amharic texts
Sisay Fissaha | Johann Haller
Workshop on Machine Translation for Semitic languages: issues and approaches

A number of corpus-based techniques have been used in the development of natural language processing applications. One area in which these techniques have been applied extensively is lexical development. The current work is being undertaken in the context of a machine translation project in which lexical development activities constitute a significant portion of the overall task. In the first part, we applied corpus-based techniques to the extraction of collocations from an Amharic text corpus. Analysis of the output reveals important collocations that can usefully be incorporated into the lexicon. This is especially true for the extraction of idiomatic expressions. The patterns of idiom formation observed in a small, manually collected data set enabled the extraction of a large set of idioms which might otherwise be difficult or impossible to recognize. Furthermore, we present preliminary results for other corpus-based techniques currently under investigation, namely clustering and classification. The results show that clustering performed no better than the frequency baseline, whereas classification showed a clear performance improvement over the frequency baseline. This in turn suggests the need to carry out further experiments using large sets of data and more contextual information.


Multilint - a Technical Documentation System with Multilingual Intelligence
Johann Haller
Proceedings of Translating and the Computer 18


Machine translation, ten years on: Discourse has yet to make a breakthrough
Ruslan Mitkov | Johann Haller
Proceedings of the Second International Conference on Machine Translation: Ten years on

Progress in Machine Translation (MT) during the last ten years has been observed at different levels, but discourse has yet to make a breakthrough. MT research and development has so far concentrated mostly on sentence translation (discourse analysis being a very complicated task), and the successful operation of most working MT systems does not usually go beyond the sentence level. To start with, the paper will refer to the MT research and development of the last ten years at the IAI in Saarbrücken. Next, MT discourse issues will be discussed both from the point of view of source language analysis and target text generation, and on the basis of the preliminary results of an ongoing "discourse-oriented MT" project. Probably the most important aspect in successfully analysing multisentential source texts is the capacity to establish the anaphoric references to preceding discourse entities. The paper will discuss the problem of anaphora resolution from the perspective of MT. A new integrated model for anaphora resolution, developed for the needs of MT, will also be outlined. As already mentioned, most machine translation systems perform translation sentence by sentence. But even in the case of paragraph translation, the discourse structure of the target text tends to be identical to that of the source text. However, sublanguage discourse structures may differ across languages, and thus a translated text which assumes the same discourse structure as the source text may sound unnatural and perhaps disguise the true intent of the writer. Finally, the paper will outline a new approach for generating discourse structures appropriate to the target sublanguage, and will discuss some of the complicated problems encountered.