Roser Morante

2024

The Kronieken Corpus: an Annotated Collection of Dutch/Flemish Chronicles from 1500-1850
Theo Dekker | Erika Kuijpers | Alie Lassche | Carolina Lenarduzzi | Roser Morante | Judith Pollmann
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)

In this paper we present the Kronieken Corpus, a new digital collection of 204 chronicles written in Dutch/Flemish between 1500 and 1850, which have been scanned, transcribed and annotated with named entities, dates and pages; a smaller part is also annotated with sources and attributions. The texts belong to 308 physical volumes and contain between 23 and 24 million words. 107 chronicles, or 178 chronicle volumes, collected from 39 different archives and libraries in The Netherlands and Belgium and transcribed by volunteers, had never been transcribed or published before. The result is a unique enriched historical text corpus of original hand-written, non-canonical and non-fiction text by lay people from the early modern period.

A Web Portal about the State of the Art of NLP Tasks in Spanish
Enrique Amigó | Jorge Carrillo-de-Albornoz | Andrés Fernández | Julio Gonzalo | Guillermo Marco | Roser Morante | Laura Plaza | Jacobo Pedrosa
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents a new web portal with information about the state of the art of natural language processing tasks in Spanish. It provides information about forums, competitions, tasks and datasets in Spanish that would otherwise be scattered across multiple articles and websites. The portal consists of overview pages, where information can be searched for and filtered by several criteria, and individual pages with detailed information and hyperlinks to facilitate navigation. Information has been manually curated from publications that describe competitions and NLP tasks from 2013 until 2023 and will be updated as new tasks appear. A total of 185 tasks and 128 datasets from 94 competitions have been introduced.

2022

Identifying Copied Fragments in a 18th Century Dutch Chronicle
Roser Morante | Eleanor L. T. Smith | Lianne Wilhelmus | Alie Lassche | Erika Kuijpers
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We apply computational stylometric techniques to an 18th century Dutch chronicle to determine which fragments of the manuscript represent the author’s own original work and which show signs of external source use through either direct copying or paraphrasing. Through stylometric methods the majority of text fragments in the chronicle can be correctly labelled as either the author’s own words, direct copies from sources or paraphrasing. Our results show that clustering text fragments based on stylometric measures is an effective methodology for authorship verification of this document; however, this approach is less effective when personal writing style is masked by author independent styles or when applied to paraphrased text.

Leveraging Social Media as a Source for Clinical Guidelines: A Demarcation of Experiential Knowledge
Jia-Zhen Michelle Chan | Florian Kunneman | Roser Morante | Lea Lösch | Teun Zuiderent-Jerak
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

In this paper we present a procedure to extract posts that contain experiential knowledge from Facebook discussions in Dutch, using automated filtering, manual annotations and machine learning. We define guidelines to annotate experiential knowledge and test them on a subset of the data. After several rounds of (re-)annotations, we come to an inter-annotator agreement of K=0.69, which reflects the difficulty of the task. We subsequently discuss inclusion and exclusion criteria to cope with the diversity of manifestations of experiential knowledge relevant to guideline development.

2021

The Early Modern Dutch Mediascape. Detecting Media Mentions in Chronicles Using Word Embeddings and CRF
Alie Lassche | Roser Morante
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

While the production of information in the European early modern period is a well-researched topic, the question of how people engaged with the information explosion that occurred in early modern Europe is still underexposed. This paper presents the annotations and experiments aimed at exploring whether we can automatically extract media-related information (source, perception, and receiver) from a corpus of early modern Dutch chronicles in order to gain insight into the mediascape of early modern middle-class people from a historical perspective. In a number of classification experiments with Conditional Random Fields, three categories of features are tested: (i) raw and binary word embedding features, (ii) lexicon features, and (iii) character features. Overall, the classifier that uses raw embeddings performs slightly better. However, given that the best F-scores are around 0.60, we conclude that the machine learning approach needs to be combined with a close reading approach for the results to be useful to answer history research questions.

Is Stance Detection Topic-Independent and Cross-topic Generalizable? - A Reproduction Study
Myrthe Reuver | Suzan Verberne | Roser Morante | Antske Fokkens
Proceedings of the 8th Workshop on Argument Mining

Cross-topic stance detection is the task of automatically detecting stances (pro, against, or neutral) on unseen topics. We successfully reproduce state-of-the-art cross-topic stance detection work (Reimers et al., 2019), and systematically analyze its reproducibility. Our attention then turns to the cross-topic aspect of this work, and the specificity of topics in terms of vocabulary and socio-cultural context. We ask: To what extent is stance detection topic-independent and generalizable across topics? We compare the model’s performance on various unseen topics, and find topic (e.g. abortion, cloning), class (e.g. pro, con), and their interaction affecting the model’s performance. We conclude that investigating performance on different topics, and addressing topic-specific vocabulary and context, is a future avenue for cross-topic stance detection. References: Nils Reimers, Benjamin Schiller, Tilman Beck, Johannes Daxenberger, Christian Stab, and Iryna Gurevych. 2019. Classification and Clustering of Arguments with Contextualized Word Embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 567–578, Florence, Italy. Association for Computational Linguistics.

2020

Annotating Perspectives on Vaccination
Roser Morante | Chantal van Son | Isa Maks | Piek Vossen
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper we present the Vaccination Corpus, a corpus of texts related to the online vaccination debate that has been annotated with three layers of information about perspectives: attribution, claims and opinions. Additionally, events related to the vaccination debate are also annotated. The corpus contains 294 documents from the Internet which reflect different views on vaccinations. It has been compiled to study the language of online debates, with the final goal of experimenting with methodologies to extract and contrast perspectives in the framework of the vaccination debate.

Must Children be Vaccinated or not? Annotating Modal Verbs in the Vaccination Debate
Liza King | Roser Morante
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper we analyze the use of modal verbs in a corpus of texts related to the vaccination debate. Broadly speaking, the vaccination debate centers around whether vaccination is safe, and whether it is morally acceptable to enforce mandatory vaccination. In order to successfully intervene and curb the spread of preventable diseases due to low vaccination rates, health practitioners need to be adequately informed on public perception of the safety and necessity of vaccines. Public perception can relate to the strength of conviction that an individual may have towards a proposition (e.g. ‘one must vaccinate’ versus ‘one should vaccinate’), as well as qualify the type of proposition, be it related to morality (‘government should not interfere in my personal choice’) or related to possibility (‘too many vaccines at once could hurt my child’). Text mining and analysis of modal auxiliaries are economically viable means of gaining insights into these perspectives, particularly on a large scale due to the widespread use of social media and blogs as vehicles of communication.

Detecting Negation Cues and Scopes in Spanish
Salud María Jiménez-Zafra | Roser Morante | Eduardo Blanco | María Teresa Martín Valdivia | L. Alfonso Ureña López
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this work we address the processing of negation in Spanish. We first present a machine learning system that processes negation in Spanish. Specifically, we focus on two tasks: i) negation cue detection and ii) scope identification. The corpus used in the experimental framework is the SFU Corpus. The results for cue detection outperform state-of-the-art results, whereas for scope detection this is the first system that performs the task for Spanish. Moreover, we provide a qualitative error analysis aimed at understanding the limitations of the system and showing which negation cues and scopes are straightforward to predict automatically, and which ones are challenging.

Corpora Annotated with Negation: An Overview
Salud María Jiménez-Zafra | Roser Morante | María Teresa Martín-Valdivia | L. Alfonso Ureña-López
Computational Linguistics, Volume 46, Issue 1 - March 2020

Negation is a universal linguistic phenomenon with a great qualitative impact on natural language processing applications. The availability of corpora annotated with negation is essential to training negation processing systems. Currently, most corpora have been annotated for English, but the presence of languages other than English on the Internet, such as Chinese or Spanish, is greater every day. In this study, we present a review of the corpora annotated with negation information in several languages with the goal of evaluating what aspects of negation have been annotated and how compatible the corpora are. We conclude that it is very difficult to merge the existing corpora because we found differences in the annotation schemes used, and most importantly, in the annotation guidelines: the way in which each corpus was tokenized and the negation elements that have been annotated. Unlike other well-established tasks such as semantic role labeling or parsing, negation has no standard annotation scheme or guidelines, which hampers progress in its treatment.

Provenance for Linguistic Corpora through Nanopublications
Timo Lek | Anna de Groot | Tobias Kuhn | Roser Morante
Proceedings of the 14th Linguistic Annotation Workshop

Research in Computational Linguistics is dependent on text corpora for training and testing new tools and methodologies. While there exists a plethora of annotated linguistic information, these corpora are often not interoperable without significant manual work. Moreover, these annotations might have evolved into different versions, making it challenging for researchers to know the data’s provenance. This paper addresses this issue with a case study on event annotated corpora and by creating a new, more interoperable representation of this data in the form of nanopublications. We demonstrate how linguistic annotations from separate corpora can be reliably linked from the start, and thereby be accessed and queried as if they were a single dataset. We describe how such nanopublications can be created and demonstrate how SPARQL queries can be performed to extract interesting content from the new representations. The queries show that information of multiple corpora can be retrieved more easily and effectively because the information of different corpora is represented in a uniform data format.

2018

A review of Spanish corpora annotated with negation
Salud María Jiménez-Zafra | Roser Morante | Maite Martin | L. Alfonso Ureña-López
Proceedings of the 27th International Conference on Computational Linguistics

The availability of corpora annotated with negation information is essential to develop negation processing systems in any language. However, there is a lack of these corpora even for languages like English, and when there are corpora available they are small and the annotations are not always compatible across corpora. In this paper we review the existing corpora annotated with negation in Spanish with the purpose of, first, gathering the information to make it available for other researchers and, second, analyzing how compatible the corpora are and how the linguistic phenomenon has been addressed. Our final aim is to develop a supervised negation processing system for Spanish, for which we need training and test data. Our analysis shows that it will not be possible to merge the small corpora existing for Spanish due to lack of compatibility in the annotations.

Scoring and Classifying Implicit Positive Interpretations: A Challenge of Class Imbalance
Chantal van Son | Roser Morante | Lora Aroyo | Piek Vossen
Proceedings of the 27th International Conference on Computational Linguistics

This paper reports on a reimplementation of a system for detecting implicit positive meaning from negated statements. In the original regression experiment, different positive interpretations per negation are scored according to their likelihood. We convert the scores to classes and report our results on both the regression and classification tasks. We show that a baseline taking the mean score or most frequent class is hard to beat because of class imbalance in the dataset. Our error analysis indicates that an approach that takes the information structure into account (i.e. which information is new or contrastive) may be promising, which requires looking beyond the syntactic and semantic characteristics of negated statements.

Systems’ Agreements and Disagreements in Temporal Processing: An Extensive Error Analysis of the TempEval-3 Task
Tommaso Caselli | Roser Morante
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Resource Interoperability for Sustainable Benchmarking: The Case of Events
Chantal van Son | Oana Inel | Roser Morante | Lora Aroyo | Piek Vossen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Proceedings of the Workshop on Computational Semantics beyond Events and Roles
Eduardo Blanco | Roser Morante
Proceedings of the Workshop on Computational Semantics beyond Events and Roles

Annotating Claims in the Vaccination Debate
Benedetta Torsi | Roser Morante
Proceedings of the 5th Workshop on Argument Mining

In this paper we present annotation experiments with three different annotation schemes for the identification of argument components in texts related to the vaccination debate. Identifying claims about vaccinations made by participants in the debate is of great societal interest, as the decision to vaccinate or not has an impact on public health and safety. Since most corpora that have been annotated with argumentation information contain texts that belong to a specific genre and have a well-defined argumentation structure, we needed to adjust the annotation schemes to our corpus, which contains heterogeneous texts from the Web. We started with a complex annotation scheme that had to be simplified due to low inter-annotator agreement (IAA). In our final experiment, which focused on annotating claims, annotators reached 57.3% IAA.

2017

Proceedings of the Workshop Computational Semantics Beyond Events and Roles
Eduardo Blanco | Roser Morante | Roser Saurí
Proceedings of the Workshop Computational Semantics Beyond Events and Roles

Annotating Negation in Spanish Clinical Texts
Noa Cruz | Roser Morante | Manuel J. Maña López | Jacinto Mata Vázquez | Carlos L. Parra Calderón
Proceedings of the Workshop Computational Semantics Beyond Events and Roles

In this paper we present on-going work on annotating negation in Spanish clinical documents. A corpus of anamnesis and radiology reports has been annotated by two domain expert annotators with negation markers and negated events. The Dice coefficient for inter-annotator agreement is higher than 0.94 for negation markers and higher than 0.72 for negated events. The corpus will be publicly released when the annotation process is finished, constituting the first corpus annotated with negation for Spanish clinical reports available for the NLP community.

2016

VUACLTL at SemEval 2016 Task 12: A CRF Pipeline to Clinical TempEval
Tommaso Caselli | Roser Morante
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Pragmatic Factors in Image Description: The Case of Negations
Emiel van Miltenburg | Roser Morante | Desmond Elliott
Proceedings of the 5th Workshop on Vision and Language

Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics (ExProM)
Eduardo Blanco | Roser Morante | Roser Saurí
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics (ExProM)

Building a Dictionary of Affixal Negations
Chantal van Son | Emiel van Miltenburg | Roser Morante
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics (ExProM)

This paper discusses the need for a dictionary of affixal negations and regular antonyms to facilitate their automatic detection in text. Without such a dictionary, affixal negations are very difficult to detect. In addition, we show that the set of affixal negations is not homogeneous, and that different NLP tasks may require different subsets. A dictionary can store the subtypes of affixal negations, making it possible to select a certain subset or to make inferences on the basis of these subtypes. We take a first step towards creating a negation dictionary by annotating all direct antonym pairs in WordNet using an existing typology of affixal negations. By highlighting some of the issues that were encountered in this annotation experiment, we hope to provide some insights into the necessary steps of building a negation dictionary.

This paper presents a framework and methodology for the annotation of perspectives in text. In the last decade, different aspects of linguistic encoding of perspectives have been targeted as separated phenomena through different annotation initiatives. We propose an annotation scheme that integrates these different phenomena. We use a multilayered annotation approach, splitting the annotation of different aspects of perspectives into small subsequent subtasks in order to reduce the complexity of the task and to better monitor interactions between layers. Currently, we have included four layers of perspective annotation: events, attribution, factuality and opinion. The annotations are integrated in a formal model called GRaSP, which provides the means to represent instances (e.g. events, entities) and propositions in the (real or assumed) world in relation to their mentions in text. Then, the relation between the source and target of a perspective is characterized by means of perspective annotations. This enables us to place alternative perspectives on the same entity, event or proposition next to each other.

In this paper we present ConanDoyle-neg, a corpus of stories by Conan Doyle annotated with negation information. The negation cues and their scope, as well as the event or property that is negated have been annotated by two annotators. The inter-annotator agreement is measured in terms of F-scores at scope level. It is higher for cues (94.88 and 92.77), less high for scopes (85.04 and 77.31), and lower for the negated event (79.23 and 80.67). The corpus is publicly available.

Although in recent years numerous forms of Internet communication, such as e-mail, blogs, chat rooms and social network environments, have emerged, balanced corpora of Internet speech with trustworthy meta-information (e.g. age and gender) or linguistic annotations are still limited. In this paper we present a large corpus of Flemish Dutch chat posts that were collected from the Belgian online social network Netlog. For all of these posts we also acquired the users' profile information, making this corpus a unique resource for computational and sociolinguistic research. However, for analyzing such a corpus on a large scale, NLP tools are required for e.g. automatic POS tagging or lemmatization. Because many NLP tools fail to correctly analyze the surface forms of chat language usage, we propose to normalize this 'anomalous' input into a format suitable for existing NLP solutions for standard Dutch. Additionally, we have annotated a substantial part of the corpus (i.e. the Chatty subset) to provide a gold standard for the evaluation of future approaches to automatic (Flemish) chat language normalization.

*SEM 2012 Shared Task: Resolving the Scope and Focus of Negation
Roser Morante | Eduardo Blanco
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories
Mathias Verbeke | Vincent Van Asch | Roser Morante | Paolo Frasconi | Walter Daelemans | Luc De Raedt
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Corpus-based approaches to processing the scope of negation cues: an evaluation of the state of the art
Roser Morante | Sarah Schrauwen | Walter Daelemans
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

2010

SemEval-2010 Task 10: Linking Events and Their Participants in Discourse
Josef Ruppenhofer | Caroline Sporleder | Roser Morante | Collin Baker | Martha Palmer
Proceedings of the 5th International Workshop on Semantic Evaluation

Semantic Role Labeling of Gene Regulation Events: Preliminary Results
Roser Morante
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing

Memory-Based Resolution of In-Sentence Scopes of Hedge Cues
Roser Morante | Vincent Van Asch | Walter Daelemans
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

Proceedings of the Workshop on Negation and Speculation in Natural Language Processing
Roser Morante | Caroline Sporleder
Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

Descriptive Analysis of Negation Cues in Biomedical Texts
Roser Morante
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we present a description of negation cues and their scope in biomedical texts, based on the cues that occur in the BioScope corpus. We provide information about the morphological type of the cue, the characteristics of the scope in relation to the morpho-syntactic features of the cue and of the clause, and the ambiguity level of the cue by describing in which cases certain negation cues do not express negation. Additionally, we provide positive and negative examples per cue from the BioScope corpus. We show that the scope depends mostly on the part-of-speech of the cue and on the syntactic features of the clause. Although several studies have focused on processing negation in biomedical texts, we are not aware of publicly available resources that describe the scope of negation cues in detail. This paper aims at providing information for producing guidelines to annotate corpora with a negation layer, and for building resources that find the scope of negation cues automatically.

2009

Dependency Parsing and Semantic Role Labeling as a Single Task
Roser Morante | Vincent Van Asch | Antal van den Bosch
Proceedings of the International Conference RANLP-2009

A Metalearning Approach to Processing the Scope of Negation
Roser Morante | Walter Daelemans
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

Joint Memory-Based Learning of Syntactic and Semantic Dependencies in Multiple Languages
Roser Morante | Vincent Van Asch | Antal van den Bosch
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

Learning the Scope of Hedge Cues in Biomedical Texts
Roser Morante | Walter Daelemans
Proceedings of the BioNLP 2009 Workshop

A memory-based learning approach to event extraction in biomedical texts
Roser Morante | Vincent Van Asch | Walter Daelemans
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task

2008

CNTS: Memory-Based Learning of Generating Repeated References
Iris Hendrickx | Walter Daelemans | Kim Luyckx | Roser Morante | Vincent Van Asch
Proceedings of the Fifth International Natural Language Generation Conference

A Combined Memory-Based Semantic Role Labeler of English
Roser Morante | Walter Daelemans | Vincent Van Asch
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

Semantic Role Labeling Tools Trained on the Cast3LB-CoNNL-SemRol Corpus
Roser Morante
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we present the Cast3LB-CoNLL-SemRol corpus, currently the only corpus of Spanish annotated with dependency syntax and semantic roles, and the tools that have been trained on the corpus: an ensemble of parsers and two dependency-based semantic role labelers that are the only semantic role labelers based on dependency syntax available for Spanish at this moment. One of the systems uses information from gold standard syntax, whereas the other one uses information from predicted syntax. The results of the first system (86 F1) are comparable to current state of the art results for constituent-based semantic role labeling of Spanish. The results of the second are 11 points lower. This work has been carried out as part of the project Técnicas semiautomáticas para el etiquetado de roles semánticos en corpus del español.

Learning the Scope of Negation in Biomedical Texts
Roser Morante | Anthony Liekens | Walter Daelemans
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing