Leonida Della Rocca

Also published as: Leonida Della Rocca, Leonida Della-Rocca


2017

Large-scale news entity sentiment analysis
Ralf Steinberger | Stefanie Hegele | Hristo Tanev | Leonida Della Rocca
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

We work on detecting positive or negative sentiment towards named entities in very large volumes of news articles. The aim is to monitor changes over time, as well as to work towards media bias detection by comparing differences across news sources and countries. With a view to applying the same method to dozens of languages, we use linguistically light-weight methods: searching for positive and negative terms in bags of words around entity mentions (also considering negation). Evaluation results are good and better than a third-party baseline system, but precision is not sufficiently high to display the results publicly in our multilingual news analysis system Europe Media Monitor (EMM). In this paper, we focus on describing our effort to improve the English language results by avoiding the biggest sources of errors. We also present new work on using a syntactic parser to identify safe opinion recognition rules, such as predicative structures in which sentiment words directly refer to an entity. The precision of this method is good, but recall is very low.

2014

Media monitoring and information extraction for the highly inflected agglutinative language Hungarian
Júlia Pajzs | Ralf Steinberger | Maud Ehrmann | Mohamed Ebrahim | Leonida Della Rocca | Stefano Bucci | Eszter Simon | Tamás Váradi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Europe Media Monitor (EMM) is a fully-automatic system that analyses written online news by gathering articles in over 70 languages and by applying text analysis software for currently 21 languages, without using linguistic tools such as parsers, part-of-speech taggers or morphological analysers. In this paper, we describe the effort of adding to EMM Hungarian text mining tools for news gathering; document categorisation; named entity recognition and classification for persons, organisations and locations; name lemmatisation; quotation recognition; and cross-lingual linking of related news clusters. The major challenge of dealing with the Hungarian language is its high degree of inflection and agglutination. We present several experiments where we apply linguistically light-weight methods to deal with inflection and we propose a method to overcome the challenges. We also present detailed frequency lists of Hungarian person and location name suffixes, as found in real-life news texts. This empirical data can be used to draw further conclusions and to improve existing Named Entity Recognition software. Within EMM, the solutions described here will also be applied to other morphologically complex languages such as those of the Slavic language family. The media monitoring and analysis system EMM is freely accessible online via the web page http://emm.newsbrief.eu/overview.html.

2013

Acronym recognition and processing in 22 languages
Maud Ehrmann | Leonida Della Rocca | Ralf Steinberger | Hristo Tanev
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2011

Highly Multilingual Coreference Resolution Exploiting a Mature Entity Repository
Josef Steinberger | Jenya Belyaeva | Jonathan Crawley | Leonida Della-Rocca | Mohamed Ebrahim | Maud Ehrmann | Mijail Kabadjov | Ralf Steinberger | Erik van der Goot
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011