Iris Hendrickx

2022

pdf abs
Negation Detection in Dutch Spoken Human-Computer Conversations
Tom Sweers | Iris Hendrickx | Helmer Strik
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Proper recognition and interpretation of negation signals in text or communication is crucial for any form of full natural language understanding. It is also essential for computational approaches to natural language processing. In this study we focus on negation detection in Dutch spoken human-computer conversations. Since there exists no Dutch (dialogue) corpus annotated for negation we have annotated a Dutch corpus sample to evaluate our method for automatic negation detection. We use transfer learning and trained NegBERT (an existing BERT implementation used for negation detection) on English data with multilingual BERT to detect negation in Dutch dialogues. Our results show that adding in-domain training material improves the results. We show that we can detect both negation cues and scope in Dutch dialogues with high precision and recall. We provide a detailed error analysis and discuss the effects of cross-lingual and cross-domain transfer learning on automatic negation detection.

pdf abs
Creating a Data Set of Abstractive Summaries of Turn-labeled Spoken Human-Computer Conversations
Iris Hendrickx
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Digital recorded written and spoken dialogues are becoming increasingly available as an effect of the technological advances such as online messenger services and the use of chatbots. Summaries are a natural way of presenting the important information gathered from dialogues. We present a unique data set that consists of Dutch spoken human-computer conversations, an annotation layer of turn labels, and conversational abstractive summaries of user answers. The data set is publicly available for research purposes.

pdf abs
Doing not Being: Concrete Language as a Bridge from Language Technology to Ethnically Inclusive Job Ads
Jetske Adams | Kyrill Poelmans | Iris Hendrickx | Martha Larson
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

This paper makes the case for studying concreteness in language as a bridge that will allow language technology to support the understanding and improvement of ethnic inclusivity in job advertisements. We propose an annotation scheme that guides the assignment of sentences in job ads to classes that reflect concrete actions, i.e., what the employer needs people to do, and abstract dispositions, i.e., who the employer expects people to be. Using an annotated dataset of Dutch-language job ads, we demonstrate that machine learning technology is effectively able to distinguish these classes.

2020

An important objective in health-technology is the ability to gather information about people’s well-being. Structured interviews can be used to obtain this information, but are time-consuming and not scalable. Questionnaires provide an alternative way to extract such information, though typically lack depth. In this paper, we present our first prototype of the BLISS agent, an artificial intelligent agent which intends to automatically discover what makes people happy and healthy. The goal of Behaviour-based Language-Interactive Speaking Systems (BLISS) is to understand the motivations behind people’s happiness by conducting a personalized spoken dialogue based on a happiness model. We built our first prototype of the model to collect 55 spoken dialogues, in which the BLISS agent asked questions to users about their happiness and well-being. Apart from a description of the BLISS architecture, we also provide details about our dataset, which contains over 120 activities and 100 motivations and is made available for usage.

2018

pdf
A Multi- versus a Single-classifier Approach for the Identification of Modality in the Portuguese Language
João Sequeira | Teresa Gonçalves | Paulo Quaresma | Amália Mendes | Iris Hendrickx
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
Discovering the Language of Wine Reviews: A Text Mining Account
Els Lefever | Iris Hendrickx | Ilja Croijmans | Antal van den Bosch | Asifa Majid
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf abs
Annotating Speech, Attitude and Perception Reports
Corien Bary | Leopold Hess | Kees Thijs | Peter Berck | Iris Hendrickx
Proceedings of the 11th Linguistic Annotation Workshop

We present REPORTS, an annotation scheme for the annotation of speech, attitude and perception reports. Such a scheme makes it possible to annotate the various text elements involved in such reports (e.g. embedding entity, complement, complement head) and their relations in a uniform way, which in turn facilitates the automatic extraction of information on, for example, complementation and vocabulary distribution. We also present the Ancient Greek corpus RAG (Thucydides’ History of the Peloponnesian War), to which we have applied this scheme using the annotation tool BRAT. We discuss some of the issues, both theoretical and practical, that we encountered, show how the corpus helps in answering specific questions, and conclude that REPORTS fitted in well with our needs.

2016

The present work is an overview of the TraMOOC (Translation for Massive Open Online Courses) research and innovation project, a machine translation approach for online educational content. More specifically, videolectures, assignments, and MOOC forum text is automatically translated from English into eleven European and BRIC languages. Unlike previous approaches to machine translation, the output quality in TraMOOC relies on a multimodal evaluation schema that involves crowdsourcing, error type markup, an error taxonomy for translation model comparison, and implicit evaluation via text mining, i.e. entity recognition and its performance comparison between the source and the translated text, and sentiment analysis on the students’ forum posts. Finally, the evaluation output will result in more and better quality in-domain parallel data that will be fed back to the translation engine for higher quality output. The translation service will be incorporated into the Iversity MOOC platform and into the VideoLectures.net digital library portal.

pdf
Very quaffable and great fun: Applying NLP to wine reviews
Iris Hendrickx | Els Lefever | Ilja Croijmans | Asifa Majid | Antal van den Bosch
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf abs
Modality annotation for Portuguese: from manual annotation to automatic labeling
Amália Mendes | Iris Hendrickx | Liciana Ávila | Paulo Quaresma | Teresa Gonҫalves | João Sequeira
Linguistic Issues in Language Technology, Volume 14, 2016 - Modality: Logic, Semantics, Annotation, and Machine Learning

We investigate modality in Portuguese and we combine a linguistic perspective with an application-oriented perspective on modality. We design an annotation scheme reflecting theoretical linguistic concepts and apply this schema to a small corpus sample to show how the scheme deals with real world language usage. We present two schemas for Portuguese, one for spoken Brazilian Portuguese and one for written European Portuguese. Furthermore, we use the annotated data not only to study the linguistic phenomena of modality, but also to train a practical text mining tool to detect modality in text automatically. The modality tagger uses a machine learning classifier trained on automatically extracted features from a syntactic parser. As we only have a small annotated sample available, the tagger was evaluated on 11 modal verbs that are frequent in our corpus and that denote more than one modal meaning. Finally, we discuss several valuable insights into the complexity of the semantic concept of modality that derive from the process of manual annotation of the corpus and from the analysis of the results of the automatic labeling: ambiguity and the semantic and syntactic properties typically associated to one modal meaning in context, and also the interaction of modality with negation and focus. The knowledge gained from the manual annotation task leads us to propose a new unified scheme for modality that applies to the two Portuguese varieties and covers both written and spoken data.

2015

pdf bib
Towards a Unified Approach to Modality Annotation in Portuguese
Luciana Beatriz Ávila | Amália Mendes | Iris Hendrickx
Proceedings of the Workshop on Models for Modality Annotation

2014

pdf
SemEval 2014 Task 5 - L2 Writing Assistant
Maarten van Gompel | Iris Hendrickx | Antal van den Bosch | Els Lefever | Véronique Hoste
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

We present the process of building linguistic corpora of the Portuguese-related Gulf of Guinea creoles, a cluster of four historically related languages: Santome, Angolar, Principense and Fa dAmbô. We faced the typical difficulties of languages lacking an official status, such as lack of standard spelling, language variation, lack of basic language instruments, and small data sets, which comprise data from the late 19th century to the present. In order to tackle these problems, the compiled written and transcribed spoken data collected during field work trips were adapted to a normalized spelling that was applied to the four languages. For the corpus compilation we followed corpus linguistics standards. We recorded meta data for each file and added morphosyntactic information based on a part-of-speech tag set that was designed to deal with the specificities of these languages. The corpora of three of the four creoles are already available and searchable via an online web interface.

pdf
Studying the Semantic Context of two Dutch Causal Connectives
Iris Hendrickx | Wilbert Spooren
Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL)

2013

pdf
Annotating the Interaction between Focus and Modality: the case of exclusive particles
Amália Mendes | Iris Hendrickx | Agostinho Salgueiro | Luciana Ávila
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

pdf
SemEval-2013 Task 4: Free Paraphrases of Noun Compounds
Iris Hendrickx | Zornitsa Kozareva | Preslav Nakov | Diarmuid Ó Séaghdha | Stan Szpakowicz | Tony Veale
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

pdf abs
Introducing the Reference Corpus of Contemporary Portuguese Online
Michel Généreux | Iris Hendrickx | Amália Mendes
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present our work in processing the Reference Corpus of Contemporary Portuguese and its publication online. After discussing how the corpus was built and our choice of meta-data, we turn to the processes and tools involved for the cleaning, preparation and annotation to make the corpus suitable for linguistic inquiries. The Web platform is described, and we show examples of linguistic resources that can be extracted from the platform for use in linguistic studies or in NLP.

pdf abs
Modality in Text: a Proposal for Corpus Annotation
Iris Hendrickx | Amália Mendes | Silvia Mencarelli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present a annotation scheme for modality in Portuguese. In our annotation scheme we have tried to combine a more theoretical linguistic viewpoint with a practical annotation scheme that will also be useful for NLP research but is not geared towards one specific application. Our notion of modality focuses on the attitude and opinion of the speaker, or of the subject of the sentence. We validated the annotation scheme on a corpus sample of approximately 2000 sentences that we fully annotated with modal information using the MMAX2 annotation tool to produce XML annotation. We discuss our main findings and give attention to the difficult cases that we encountered as they illustrate the complexity of modality and its interactions with other elements in the text.

2011

pdf
Cross-Domain Dutch Coreference Resolution
Orphée De Clercq | Véronique Hoste | Iris Hendrickx
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

pdf
Complex Predicates Annotation in a Corpus of Portuguese
Iris Hendrickx | Amália Mendes | Sílvia Pereira | Anabela Gonçalves | Inês Duarte
Proceedings of the Fourth Linguistic Annotation Workshop

pdf
Proposal for MWE Annotation in Running Text
Iris Hendrickx | Amália Mendes | Sandra Antunes
Proceedings of the Fourth Linguistic Annotation Workshop

pdf abs
Segmentation Automatique de Lettres Historiques
Michel Généreux | Rita Marquilhas | Iris Hendrickx
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Cet article présente une approche basée sur la comparaison fréquentielle de modèles lexicaux pour la segmentation automatique de textes historiques Portugais. Cette approche traite d’abord le problème de la segmentation comme un problème de classification, en attribuant à chaque élément lexical présent dans la phase d’apprentissage une valeur de saillance pour chaque type de segment. Ces modèles lexicaux permettent à la fois de produire une segmentation et de faire une analyse qualitative de textes historiques. Notre évaluation montre que l’approche adoptée permet de tirer de l’information sémantique que des approches se concentrant sur la détection des frontières séparant les segments ne peuvent acquérir.

2009

pdf
Is Sentence Compression an NLG task?
Erwin Marsi | Emiel Krahmer | Iris Hendrickx | Walter Daelemans
Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)

pdf
Reducing Redundancy in Multi-document Summarization Using Lexical Semantic Similarity
Iris Hendrickx | Walter Daelemans | Erwin Marsi | Emiel Krahmer
Proceedings of the 2009 Workshop on Language Generation and Summarisation (UCNLG+Sum 2009)

2008

We present the main outcomes of the COREA project: a corpus annotated with coreferential relations and a coreference resolution system for Dutch. In the project we developed annotation guidelines for coreference resolution for Dutch and annotated a corpus of 135K tokens. We discuss these guidelines, the annotation tool, and the inter-annotator agreement. We also show a visualization of the annotated relations. The standard approach to evaluate a coreference resolution system is to compare the predictions of the system to a hand-annotated gold standard test set (cross-validation). A more practically oriented evaluation is to test the usefulness of coreference relation information in an NLP application. We run experiments with an Information Extraction module for the medical domain, and measure the performance of this module with and without the coreference relation information. We present the results of both this application-oriented evaluation of our system and of a standard cross-validation evaluation. In a separate experiment we also evaluate the effect of coreference information produced by a simple rule-based coreference module in a Question Answering application.

pdf
CNTS: Memory-Based Learning of Generating Repeated References
Iris Hendrickx | Walter Daelemans | Kim Luyckx | Roser Morante | Vincent Van Asch
Proceedings of the Fifth International Natural Language Generation Conference

pdf
GRAPH: The Costs of Redundancy in Referring Expressions
Emiel Krahmer | Mariët Theune | Jette Viethen | Iris Hendrickx
Proceedings of the Fifth International Natural Language Generation Conference

Iris Hendrickx

2022

2020

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2004

2003

2002

2001

Co-authors

Venues