Large-scale news entity sentiment analysis
Ralf Steinberger
Stefanie Hegele
Hristo Tanev
Leonida Della Rocca
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
We work on detecting positive or negative sentiment towards named entities in very large volumes of news articles. The aim is to monitor changes over time and to work towards media bias detection by comparing differences across news sources and countries. With a view to applying the same method to dozens of languages, we use linguistically light-weight methods: searching for positive and negative terms in bags of words around entity mentions (also considering negation). Evaluation results are good and better than those of a third-party baseline system, but precision is not sufficiently high to display the results publicly in our multilingual news analysis system Europe Media Monitor (EMM). In this paper, we focus on describing our effort to improve the English-language results by avoiding the biggest sources of errors. We also present new work on using a syntactic parser to identify safe opinion recognition rules, such as predicative structures in which sentiment words directly refer to an entity. The precision of this method is good, but recall is very low.
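The bag-of-words idea in this abstract can be illustrated with a minimal sketch. The code looks at a fixed window of tokens around each entity mention, scores +1 or -1 for lexicon hits, and flips the polarity when the word immediately before a hit is a negator. The lexicons, the window size, the one-token negation rule and the function name `entity_sentiment` are all hypothetical choices made for illustration. They are not the EMM implementation or the paper's term lists.

```python
import re

# Hypothetical toy lexicons (assumption): the paper's real term lists are not reproduced here.
POSITIVE = {"praised", "success", "good", "welcomed"}
NEGATIVE = {"criticised", "failure", "bad", "scandal"}
NEGATORS = {"not", "no", "never", "without"}


def tokenize(text):
    """Lower-case word tokens; a deliberately light-weight tokenizer."""
    return re.findall(r"\w+", text.lower())


def entity_sentiment(text, entity, window=6):
    """Sum +1/-1 for sentiment terms within `window` tokens of each mention of
    `entity`, flipping polarity when the preceding token is a negator.
    Overlapping windows of repeated mentions are counted once per mention."""
    tokens = tokenize(text)
    target = tokenize(entity)
    n = len(target)
    score = 0
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != target:
            continue  # not an entity mention at position i
        lo = max(0, i - window)
        hi = min(len(tokens), i + n + window)
        for j in range(lo, hi):
            if i <= j < i + n:
                continue  # skip the tokens of the mention itself
            tok = tokens[j]
            polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
            # Simple negation handling: a negator directly before the term flips it.
            if polarity and j > 0 and tokens[j - 1] in NEGATORS:
                polarity = -polarity
            score += polarity
    return score
```

For example, `entity_sentiment("Critics said the minister was not good.", "minister")` returns -1, because "good" is negated by "not". The approach needs no parser or tagger, which is why it ports easily across languages. The same property also explains the precision problem the abstract describes: a sentiment word inside the window need not refer to the entity at all.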
Media monitoring and information extraction for the highly inflected agglutinative language Hungarian
Júlia Pajzs
Ralf Steinberger
Maud Ehrmann
Mohamed Ebrahim
Leonida Della Rocca
Stefano Bucci
Eszter Simon
Tamás Váradi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
The Europe Media Monitor (EMM) is a fully automatic system that analyses written online news by gathering articles in over 70 languages and applying text analysis software for currently 21 languages, without using linguistic tools such as parsers, part-of-speech taggers or morphological analysers. In this paper, we describe the effort of adding Hungarian text mining tools to EMM for news gathering; document categorisation; named entity recognition and classification for persons, organisations and locations; name lemmatisation; quotation recognition; and cross-lingual linking of related news clusters. The major challenge in dealing with Hungarian is its high degree of inflection and agglutination. We present several experiments in which we apply linguistically light-weight methods to deal with inflection, and we propose a method to overcome these challenges. We also present detailed frequency lists of Hungarian person and location name suffixes as found in real-life news texts. This empirical data can be used to draw further conclusions and to improve existing named entity recognition software. Within EMM, the solutions described here will also be applied to other morphologically complex languages, such as those of the Slavic language family. The media monitoring and analysis system EMM is freely accessible online via the web page http://emm.newsbrief.eu/overview.html.
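One light-weight way to handle inflected name forms without a morphological analyser is to strip candidate suffixes and check whether the remaining stem is a known name. The sketch below illustrates that idea and assumes a small hand-picked suffix list and a set of known base names. The suffix list, the function name `lemmatise_name` and the longest-suffix-first strategy are hypothetical. They are not the method or the frequency lists reported in the paper.

```python
# Hypothetical, illustrative suffix list (assumption): a few Hungarian case endings,
# including vowel-harmony pairs and the assimilated instrumental "-nal/-nel" after "n".
SUFFIXES = ["nak", "nek", "ban", "ben", "ról", "ről", "val", "vel",
            "nal", "nel", "on", "en", "ön", "t"]


def lemmatise_name(token, known_names):
    """Return the known base form of a (possibly inflected) name token, or None.

    Only a stem that is already a known name is accepted, so stripping
    stays conservative and cannot invent new names."""
    if token in known_names:
        return token
    # Try longer suffixes first so that e.g. "-nak" is preferred over a shorter match.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix):
            stem = token[: -len(suffix)]
            if stem in known_names:
                return stem
    return None
```

With `known = {"Orbán", "Budapest"}`, the forms "Orbánnak", "Orbánt" and "Orbánnal" all map to "Orbán", and "Budapesten" maps to "Budapest". Irregular stems and doubled-consonant assimilation (for example "-val" after a consonant other than "n") are not handled. Gaps of this kind are why empirical suffix frequencies from real news text are useful.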
Acronym recognition and processing in 22 languages
Maud Ehrmann
Leonida Della Rocca
Ralf Steinberger
Hristo Tanev
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2013
Highly Multilingual Coreference Resolution Exploiting a Mature Entity Repository
Josef Steinberger
Jenya Belyaeva
Jonathan Crawley
Leonida Della Rocca
Mohamed Ebrahim
Maud Ehrmann
Mijail Kabadjov
Ralf Steinberger
Erik van der Goot
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2011