2018
Anaphora Resolution with the ARRAU Corpus
Massimo Poesio | Yulia Grishina | Varada Kolhatkar | Nafise Moosavi | Ina Roesiger | Adam Roussel | Fabian Simonjetz | Alexandra Uma | Olga Uryupina | Juntao Yu | Heike Zinsmeister
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference
The ARRAU corpus is an anaphorically annotated corpus of English providing rich linguistic information about anaphora resolution. The most distinctive feature of the corpus is the annotation of a wide range of anaphoric relations, including bridging references and discourse deixis in addition to identity (coreference). Other distinctive features include treating all NPs as markables, including non-referring NPs, and the annotation of a variety of morphosyntactic and semantic mention and entity attributes, including the genericity status of the entities referred to by markables. However, the corpus has not been extensively used for anaphora resolution research so far. In this paper, we discuss three datasets extracted from the ARRAU corpus to support the three subtasks of the CRAC 2018 Shared Task: identity anaphora resolution over ARRAU-style markables, bridging reference resolution, and discourse deixis. We also discuss the evaluation scripts assessing system performance on those datasets, and preliminary results on these three tasks that may serve as baselines for subsequent research on these phenomena.
Supervised Clustering of Questions into Intents for Dialog System Applications
Iryna Haponchyk | Antonio Uva | Seunghak Yu | Olga Uryupina | Alessandro Moschitti
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Modern automated dialog systems require complex dialog managers able to deal with user intent triggered by high-level semantic questions. In this paper, we propose a model for automatically clustering questions into user intents to help the design tasks. Since questions are short texts, uncovering their semantics to group them together can be very challenging. We approach the problem by using powerful semantic classifiers from question duplicate/matching research along with a novel idea of supervised clustering methods based on structured output. We test our approach on two intent clustering corpora, showing an impressive improvement over previous methods for two languages/domains.
2017
Collaborative Partitioning for Coreference Resolution
Olga Uryupina | Alessandro Moschitti
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)
This paper presents a collaborative partitioning algorithm, a novel ensemble-based approach to coreference resolution. Starting from the all-singleton partition, we search for a solution close to the ensemble’s outputs in terms of a task-specific similarity measure. Our approach assumes a loose integration of individual components of the ensemble and can therefore combine arbitrary coreference resolvers, regardless of their models. Our experiments on the CoNLL dataset show that collaborative partitioning yields results superior to those attained by the individual components, for ensembles of both strong and weak systems. Moreover, by applying the collaborative partitioning algorithm on top of three state-of-the-art resolvers, we obtain the best coreference performance reported so far in the literature (MELA v08 score of 64.47).
2016
ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions
Olga Uryupina | Ron Artstein | Antonella Bristot | Federica Cavicchio | Kepa Rodriguez | Massimo Poesio
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper presents the second release of the ARRAU dataset: a multi-domain corpus with thorough, linguistically motivated annotation of anaphora and related phenomena. Building upon the first release almost a decade ago, considerable effort has been invested in improving the data both quantitatively and qualitatively. We have doubled the corpus size, expanded the selection of covered phenomena to include referentiality and genericity, and designed and implemented a methodology for enforcing the consistency of the manual annotation. We believe that the new release of ARRAU provides valuable material for ongoing research on complex cases of coreference as well as for a variety of related tasks. The corpus is publicly available through LDC.
LiMoSINe Pipeline: Multilingual UIMA-based NLP Platform
Olga Uryupina | Barbara Plank | Gianni Barlacchi | Francisco J. Valverde Albacete | Manos Tsagkias | Antonio Uva | Alessandro Moschitti
Proceedings of ACL-2016 System Demonstrations
2015
A State-of-the-Art Mention-Pair Model for Coreference Resolution
Olga Uryupina | Alessandro Moschitti
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics
2014
SenTube: A Corpus for Sentiment Analysis on YouTube Social Media
Olga Uryupina | Barbara Plank | Aliaksei Severyn | Agata Rotondi | Alessandro Moschitti
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this paper we present SenTube, a dataset of user-generated comments on YouTube videos annotated for information content and sentiment polarity. Its annotations support the development of classifiers for several important NLP tasks: (i) sentiment analysis, (ii) text categorization (relatedness of a comment to the video and/or product), (iii) spam detection, and (iv) prediction of comment informativeness. The SenTube corpus supports research on indexing and searching YouTube videos using information derived from comments. The corpus will cover several languages: at the moment, we focus on English and Italian, with Spanish and Dutch parts scheduled for the later stages of the project. For all the languages, we collect videos for the same set of products, thus offering possibilities for multi- and cross-lingual experiments. The paper provides annotation guidelines, corpus statistics and annotator agreement details.
Opinion Mining on YouTube
Aliaksei Severyn | Alessandro Moschitti | Olga Uryupina | Barbara Plank | Katja Filippova
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2013
Towards Robust Linguistic Analysis using OntoNotes
Sameer Pradhan | Alessandro Moschitti | Nianwen Xue | Hwee Tou Ng | Anders Björkelund | Olga Uryupina | Yuchen Zhang | Zhi Zhong
Proceedings of the Seventeenth Conference on Computational Natural Language Learning
Multilingual Mention Detection for Coreference Resolution
Olga Uryupina | Alessandro Moschitti
Proceedings of the Sixth International Joint Conference on Natural Language Processing
Adapting a State-of-the-art Anaphora Resolution System for Resource-poor Language
Utpal Sikdar | Asif Ekbal | Sriparna Saha | Olga Uryupina | Massimo Poesio
Proceedings of the Sixth International Joint Conference on Natural Language Processing
2012
CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes
Sameer Pradhan | Alessandro Moschitti | Nianwen Xue | Olga Uryupina | Yuchen Zhang
Joint Conference on EMNLP and CoNLL - Shared Task
BART goes multilingual: The UniTN / Essex submission to the CoNLL-2012 Shared Task
Olga Uryupina | Alessandro Moschitti | Massimo Poesio
Joint Conference on EMNLP and CoNLL - Shared Task
Domain-specific vs. Uniform Modeling for Coreference Resolution
Olga Uryupina | Massimo Poesio
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Several corpora annotated for coreference have been made available in the past decade. These resources differ with respect to their size and their underlying structure: the number of domains and their similarity. Our study compares domain-specific models, learned from small heterogeneous subsets of the investigated corpora, against uniform models that utilize all the available data. We show that for knowledge-poor baseline systems, domain-specific and uniform modeling yield the same results. Systems relying on large amounts of linguistic knowledge, however, exhibit differences in performance: with all the designed features in use, domain-specific models suffer from over-fitting, whereas with pre-selected feature sets they tend to outperform uniform models.
2011
Multi-metric optimization for coreference: The UniTN / IITP / Essex submission to the 2011 CONLL Shared Task
Olga Uryupina | Sriparna Saha | Asif Ekbal | Massimo Poesio
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task
Single and multi-objective optimization for feature selection in anaphora resolution
Sriparna Saha | Asif Ekbal | Olga Uryupina | Massimo Poesio
Proceedings of 5th International Joint Conference on Natural Language Processing
2010
Corry: A System for Coreference Resolution
Olga Uryupina
Proceedings of the 5th International Workshop on Semantic Evaluation
BART: A Multilingual Anaphora Resolution System
Samuel Broscheit | Massimo Poesio | Simone Paolo Ponzetto | Kepa Joseba Rodriguez | Lorenza Romano | Olga Uryupina | Yannick Versley | Roberto Zanoli
Proceedings of the 5th International Workshop on Semantic Evaluation
Creating a Coreference Resolution System for Italian
Massimo Poesio | Olga Uryupina | Yannick Versley
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper summarizes our work on creating a full-scale coreference resolution (CR) system for Italian, using BART, an open-source modular CR toolkit initially developed for English corpora. We discuss our experiments on language-specific issues of the task. As our evaluation experiments show, a language-agnostic system (designed primarily for English) can achieve a performance level in the high forties (MUC F-score) when re-trained and tested on a new language, at least on gold mention boundaries. Compared to this level, we can improve our F-score by around 10% by introducing a small number of language-specific changes. This shows that, with a modular coreference resolution platform such as BART, one can straightforwardly develop a family of robust and reliable systems for various languages. We hope that our experiments will encourage researchers working on coreference in other languages to create their own full-scale coreference resolution systems; at the moment, such modules exist only for very few languages other than English.
2008
Error Analysis for Learning-based Coreference Resolution
Olga Uryupina
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
State-of-the-art coreference resolution engines show similar performance figures (low sixties on the MUC-7 data). Our system, with a rich linguistically motivated feature set, yields significantly better performance for a variety of machine learners, but still leaves substantial room for improvement. In this paper we address a relatively unexplored area of coreference resolution: we present a detailed error analysis in order to understand the issues raised by corpus-based approaches to coreference resolution.
2006
Coreference Resolution with and without Linguistic Knowledge
Olga Uryupina
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
State-of-the-art statistical approaches to the Coreference Resolution task rely on sophisticated modeling, but on very few (10-20) simple features. In this paper we propose to extend the standard feature set substantially, incorporating more linguistic knowledge. To investigate the usability of linguistically motivated features, we evaluate our system for a variety of machine learners on the standard dataset (MUC-7) with the traditional learning set-up.
2004
Discourse-New Detectors for Definite Description Resolution: A Survey and a Preliminary Proposal
Massimo Poesio | Olga Uryupina | Renata Vieira | Mijail Alexandrov-Kabadjov | Rodrigo Goulart
Proceedings of the Conference on Reference Resolution and Its Applications
Evaluating Name-Matching for Coreference Resolution
Olga Uryupina
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
2003
High-precision Identification of Discourse New and Unique Noun Phrases
Olga Uryupina
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics
Semi-supervised learning of geographical gazetteer from the internet
Olga Uryupina
Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References