2024
Proceedings of the Second International Workshop Towards Digital Language Equality (TDLE): Focusing on Sustainability @ LREC-COLING 2024
Federico Gaspari | Joss Moorkens | Itziar Aldabe | Aritz Farwell | Begoña Altuna | Stelios Piperidis | Georg Rehm | German Rigau
A Hard Nut to Crack: Idiom Detection with Conversational Large Language Models
Francesca De Luca Fornaciari | Begoña Altuna | Itziar Gonzalez-Dios | Maite Melero
Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024)
In this work, we explore idiomatic language processing with Large Language Models (LLMs). We introduce the Idiomatic language Test Suite IdioTS, a dataset of difficult examples specifically designed by language experts to assess the capabilities of LLMs to process figurative language at sentence level. We propose a comprehensive evaluation methodology based on an idiom detection task, where LLMs are prompted to detect an idiomatic expression in a given English sentence. We present a thorough automatic and manual evaluation of the results and a comprehensive error analysis.
A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset
Giulia Pensa | Begoña Altuna | Itziar Gonzalez-Dios
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In this paper, we explore physical commonsense reasoning of large language models (LLMs) and propose a specific methodology to evaluate low-level understanding of the physical world. Specifically, the goal is to create a test set to analyze physical commonsense reasoning in large language models for Italian and focus on a trustworthy analysis of the results. To that end, we present a tiered Italian dataset, called Graded Italian Annotated dataset (GITA), written and thoroughly annotated by a professional linguist, which allows us to concentrate on three different levels of commonsense understanding. Moreover, we create a semi-automated system to complete the accurate annotation of the dataset. We also validate our dataset by carrying out three tasks with a multilingual model (XLM-RoBERTa) and propose a qualitative analysis of the results. We found that, although the model may perform well on high-level classification tasks, its reasoning is inconsistent and unverifiable, since it does not capture intermediate evidence.
Automatic Detection and Labelling of Personal Data in Case Reports from the ECHR in Spanish: Evaluation of Two Different Annotation Approaches
Maria Sierro | Begoña Altuna | Itziar Gonzalez-Dios
Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024)
In this paper we evaluate two annotation approaches for automatic detection and labelling of personal information in legal texts in relation to the ambiguity of the labels and the homogeneity of the annotations. For this purpose, we built a corpus of 44 case reports from the European Court of Human Rights in Spanish and annotated it following two different annotation approaches: automatic projection of the annotations of an existing English corpus, and manual annotation with our reinterpretation of their guidelines. Moreover, we employ Flair on a Named Entity Recognition task to compare its performance in the two annotation schemes.
2023
This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models
Iker García-Ferrero | Begoña Altuna | Javier Alvez | Itziar Gonzalez-Dios | German Rigau
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Although large language models (LLMs) have apparently acquired a certain level of grammatical knowledge and the ability to make generalizations, they fail to interpret negation, a crucial step in Natural Language Processing. We try to clarify the reasons for the sub-optimal performance of LLMs in understanding negation. We introduce a large semi-automatically generated dataset of circa 400,000 descriptive sentences about commonsense knowledge that can be true or false, in which negation is present in different forms in about two thirds of the corpus. We have used our dataset with the largest available open LLMs in a zero-shot approach to grasp their generalization and inference capability, and we have also fine-tuned some of the models to assess whether the understanding of negation can be trained. Our findings show that, while LLMs are proficient at classifying affirmative sentences, they struggle with negative sentences and lack a deep understanding of negation, often relying on superficial cues. Although fine-tuning the models on negative sentences improves their performance, the lack of generalization in handling negation is persistent, highlighting the ongoing challenges of LLMs regarding negation understanding and generalization. The dataset and code are publicly available.
2022
Proceedings of the Workshop Towards Digital Language Equality within the 13th Language Resources and Evaluation Conference
Itziar Aldabe | Begoña Altuna | Aritz Farwell | German Rigau
2017
The Scope and Focus of Negation: A Complete Annotation Framework for Italian
Begoña Altuna | Anne-Lyse Minard | Manuela Speranza
Proceedings of the Workshop Computational Semantics Beyond Events and Roles
In this paper we present a complete framework for the annotation of negation in Italian, which accounts for both negation scope and negation focus, and also for language-specific phenomena such as negative concord. In our view, the annotation of negation complements more comprehensive Natural Language Processing tasks, such as temporal information processing and sentiment analysis. We applied the proposed framework and the guidelines built on top of it to the annotation of written texts, namely news articles and tweets, thus producing annotated data for a total of over 36,000 tokens.
2016
MEANTIME, the NewsReader Multilingual Event and Time Corpus
Anne-Lyse Minard | Manuela Speranza | Ruben Urizar | Begoña Altuna | Marieke van Erp | Anneleen Schoen | Chantal van Son
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
In this paper, we present the NewsReader MEANTIME corpus, a semantically annotated corpus of Wikinews articles. The corpus consists of 480 news articles, i.e. 120 English news articles and their translations in Spanish, Italian, and Dutch. MEANTIME contains annotations at different levels. The document-level annotation includes markables (e.g. entity mentions, event mentions, time expressions, and numerical expressions), relations between markables (modeling, for example, temporal information and semantic role labeling), and entity and event intra-document coreference. The corpus-level annotation includes entity and event cross-document coreference. Semantic annotation on the English section was performed manually; for the annotation in Italian, Spanish, and (partially) Dutch, a procedure was devised to automatically project the annotations on the English texts onto the translated texts, based on the manual alignment of the annotated elements; this enabled us not only to speed up the annotation process but also provided cross-lingual coreference. The English section of the corpus was extended with timeline annotations for the SemEval 2015 TimeLine shared task. The “First CLIN Dutch Shared Task” at CLIN26 was based on the Dutch section, while the EVALITA 2016 FactA (Event Factuality Annotation) shared task, based on the Italian section, is currently being organized.