Marta Recasens


2018

Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns
Kellie Webster | Marta Recasens | Vera Axelrod | Jason Baldridge
Transactions of the Association for Computational Linguistics, Volume 6

Coreference resolution is an important task for natural language understanding, and the resolution of ambiguous pronouns is a longstanding challenge. Nonetheless, existing corpora do not capture ambiguous pronouns in sufficient volume or diversity to accurately indicate the practical utility of models. Furthermore, we find gender bias in existing corpora and systems favoring masculine entities. To address this, we present and release GAP, a gender-balanced labeled corpus of 8,908 ambiguous pronoun–name pairs sampled to provide diverse coverage of challenges posed by real-world text. We explore a range of baselines that demonstrate the complexity of the challenge, the best achieving just 66.9% F1. We show that syntactic structure and continuous neural models provide promising, complementary cues for approaching the challenge.

2016

Sense Anaphoric Pronouns: Am I One?
Marta Recasens | Zhichao Hu | Olivia Rhinehart
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)

2015

Resolving Discourse-Deictic Pronouns: A Two-Stage Approach to Do It
Sujay Kumar Jauhar | Raul Guerra | Edgar Gonzàlez Pellicer | Marta Recasens
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

2014

An Extension of BLANC to System Mentions
Xiaoqiang Luo | Sameer Pradhan | Marta Recasens | Eduard Hovy
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation
Sameer Pradhan | Xiaoqiang Luo | Marta Recasens | Eduard Hovy | Vincent Ng | Michael Strube
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

The Life and Death of Discourse Entities: Identifying Singleton Mentions
Marta Recasens | Marie-Catherine de Marneffe | Christopher Potts
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Same Referent, Different Words: Unsupervised Mining of Opaque Coreferent Mentions
Marta Recasens | Matthew Can | Daniel Jurafsky
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Linguistic Models for Analyzing and Detecting Biased Language
Marta Recasens | Cristian Danescu-Niculescu-Mizil | Dan Jurafsky
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
Pierre Lison | Mattias Nilsson | Marta Recasens
Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics

Joint Entity and Event Coreference Resolution across Documents
Heeyoung Lee | Marta Recasens | Angel Chang | Mihai Surdeanu | Dan Jurafsky
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Annotating Near-Identity from Coreference Disagreements
Marta Recasens | M. Antònia Martí | Constantin Orasan
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present an extension of the coreference annotation in the English NP4E and the Catalan AnCora-CA corpora with near-identity relations, which are borderline cases of coreference. The annotated subcorpora have 50K tokens each. Near-identity relations, as presented by Recasens et al. (2010; 2011), build upon the idea that identity is a continuum rather than an either/or relation, thus introducing a middle ground category to explain currently problematic cases. The first annotation effort that we describe shows that it is not possible to annotate near-identity explicitly because subjects are not fully aware of it. Therefore, our second annotation effort used an indirect method, and arrived at near-identity annotations by inference from the disagreements between five annotators who had only a two-alternative choice between coreference and non-coreference. The results show that whereas as little as 2-6% of the relations were explicitly annotated as near-identity in the former effort, up to 12-16% of the relations turned out to be near-identical following the indirect method of the latter effort.
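The indirect method above turns binary coreference votes from five annotators into a three-way label. The following is a minimal Python sketch of that kind of inference, not the authors' procedure: the function names and the `min_minority` threshold (how many dissenting annotators count as a disagreement) are hypothetical assumptions, since the abstract does not state the exact rule, and the toy votes are invented.

```python
from typing import Sequence


def classify_relation(votes: Sequence[bool], min_minority: int = 1) -> str:
    """Label one mention pair from binary coreference votes.

    votes[i] is True if annotator i chose "coreferent".
    Hypothetical rule (assumption): if at least `min_minority`
    annotators dissent from the majority, the pair is treated
    as a near-identity relation.
    """
    yes = sum(votes)
    no = len(votes) - yes
    if min(yes, no) >= min_minority:
        return "near-identity"
    # Otherwise fall back to the majority choice.
    return "identity" if yes > no else "non-identity"


def near_identity_share(relations: Sequence[Sequence[bool]], min_minority: int = 1) -> float:
    """Fraction of relations inferred to be near-identity."""
    labels = [classify_relation(v, min_minority) for v in relations]
    return labels.count("near-identity") / len(labels)


# Toy votes from five annotators (invented for illustration only).
toy = [
    [True, True, True, True, True],       # unanimous: identity
    [True, True, True, False, True],      # one annotator dissents
    [False, False, False, False, False],  # unanimous: non-identity
]
print(near_identity_share(toy))                  # one of three relations
print(near_identity_share(toy, min_minority=2))  # a single dissent no longer counts
```

Raising `min_minority` makes the inference stricter, which is one way a range such as 12-16% could arise from different disagreement thresholds; whether the authors varied a threshold in this way is not stated in the abstract.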

2010

Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information
Marta Recasens | Eduard Hovy
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

A Typology of Near-Identity Relations for Coreference (NIDENT)
Marta Recasens | Eduard Hovy | M. Antònia Martí
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The task of coreference resolution requires people or systems to decide when two referring expressions refer to the 'same' entity or event. In real text, this is often a difficult decision because identity is never adequately defined, leading to contradictory treatment of cases in previous work. This paper introduces the concept of 'near-identity', a middle ground category between identity and non-identity, to handle such cases systematically. We present a typology of Near-Identity Relations (NIDENT) that includes fifteen types, grouped under four main families, that capture a wide range of ways in which (near-)coreference relations hold between discourse entities. We validate the theoretical model by annotating a small sample of real data and showing that inter-annotator agreement is high enough for stability (K=0.58, and up to K=0.65 and K=0.84 when leaving out one and two outliers, respectively). This work enables subsequent creation of the first internally consistent language resource of this type through larger annotation efforts.
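The abstract reports agreement as K without naming the coefficient. Assuming a Fleiss-style multi-rater kappa (an assumption, not stated in the source), the statistic can be sketched as below. The count-matrix layout is hypothetical: one row per annotated item, one column per category (for example, NIDENT types), each cell holding how many annotators chose that category.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N items, each rated by the same n >= 2 annotators.

    counts[i][j] = number of annotators who put item i in category j
    (hypothetical layout; every row must sum to n).
    """
    N = len(counts)
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    k = len(counts[0])
    # Observed agreement: share of agreeing annotator pairs per item, averaged.
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts) / N
    # Chance agreement from the overall category proportions.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_bar - p_e) / (1 - p_e)


# Toy data (invented): 4 items, 3 annotators, 3 categories.
toy_counts = [[3, 0, 0], [0, 3, 0], [2, 1, 0], [0, 1, 2]]
print(round(fleiss_kappa(toy_counts), 3))
```

Leaving out an outlier annotator, as in the abstract's K=0.65 and K=0.84 figures, would correspond to removing that annotator's vote from every row and recomputing with one fewer rater.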

SemEval-2010 Task 1: Coreference Resolution in Multiple Languages
Marta Recasens | Lluís Màrquez | Emili Sapena | M. Antònia Martí | Mariona Taulé | Véronique Hoste | Massimo Poesio | Yannick Versley
Proceedings of the 5th International Workshop on Semantic Evaluation

Squibs: On Paraphrase and Coreference
Marta Recasens | Marta Vila
Computational Linguistics, Volume 36, Issue 4 - December 2010

2009

SemEval-2010 Task 1: Coreference Resolution in Multiple Languages
Marta Recasens | Toni Martí | Mariona Taulé | Lluís Màrquez | Emili Sapena
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

A Chain-starting Classifier of Definite NPs in Spanish
Marta Recasens
Proceedings of the Student Research Workshop at EACL 2009

2008

AnCora: Multilevel Annotated Corpora for Catalan and Spanish
Mariona Taulé | M. Antònia Martí | Marta Recasens
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents AnCora, a multilingual corpus annotated at different linguistic levels consisting of 500,000 words in Catalan (AnCora-Ca) and in Spanish (AnCora-Es). At present, AnCora is the largest multilayer annotated corpus of these languages freely available from http://clic.ub.edu/ancora. The two corpora consist mainly of newspaper texts annotated at different levels of linguistic description: morphological (PoS and lemmas), syntactic (constituents and functions), and semantic (argument structures, thematic roles, semantic verb classes, named entities, and WordNet nominal senses). All resulting layers are independent of each other, which makes data management easier. The annotation was performed manually, semiautomatically, or fully automatically, depending on the encoded linguistic information. The development of these basic resources constituted a primary objective, since there was a lack of such resources for these languages. A second goal was the definition of a consistent methodology that can be followed in further annotations. The current versions of AnCora have been used in several international evaluation competitions.