This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
AndreaZaninello
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
In this work, we investigate the relationship between the quality of explanations produced by different models and the amount of implicit knowledge the are able to provide beyond the input. We approximate explanation quality via accuracy on a downstream task with a standardized pipeline (GEISER) and study its correlation with three different association measures, each capturing different aspects of implicitness, defined as a combination of relevance and novelty. We conduct experiments with three SOTA LLMs on four tasks involving implicit knowledge, with explanations either confirming or contradicting the correct label. Our results demonstrate that providing quality explanations consistently improves the accuracy of LLM predictions, even when the models are not explicitly trained to take explanations as input, and underline the correlation between implicit content delivered by the explanation and its effectiveness.
In this paper, we examine the competition between pairs of adjectives in Italian that are antonyms of the same term: one is a “morphological antonym” formed by negative prefixation, the other is a “lexical antonym” with no morphological relationship with the term in question. We consider pairs of adjectives that are reported as antonyms in lexicographic resources and extract the nouns that can be modified by both adjectives from a large corpus. We select a set of 8 nouns for each pair that present higher, lower, and comparable frequencies combined with each antonym respectively and then we perform two experiments with a LLM. Firstly, we perform experiments for masked-token prediction of the adjective, to study the correlation between prediction accuracy and the frequency of the noun-antonym pair. Secondly, we perform a polarity-flip experiment with a multilingual LLM, asking to change the adjective into its positive counterpart, and study the cases where the antonym is changed to the morphological antonym’s lexical base, under the hypothesis that a flip to the lexical base indicates a narrower set of senses of the antonymic counterpart.
In the GEESE challenge, we present a pipeline to evaluate generated explanations for the task of Recognizing Textual Entailment (RTE) in Italian. The challenge focuses on evaluating the impact of generated explanations on the predictive performance of language models. Using a dataset enriched with human-written explanations, we employ two large language models (LLMs) to generate and utilize explanations for semantic relationships between sentence pairs. Our methodology assesses the quality of generated explanations by measuring changes in prediction accuracy when explanations are provided. Through reproducible experimentation, we establish benchmarks against various baseline approaches, demonstrating the potential of explanation injection to enhance model interpretability and performance.
Research on language technology for the development of medical applications is currently a hot topic in Natural Language Understanding and Generation. Thus, a number of large language models (LLMs) have recently been adapted to the medical domain, so that they can be used as a tool for mediating in human-AI interaction. While these LLMs display competitive performance on automated medical texts benchmarks, they have been pre-trained and evaluated with a focus on a single language (English mostly). This is particularly true of text-to-text models, which typically require large amounts of domain-specific pre-training data, often not easily accessible for many languages. In this paper, we address these shortcomings by compiling, to the best of our knowledge, the largest multilingual corpus for the medical domain in four languages, namely English, French, Italian and Spanish. This new corpus has been used to train Medical mT5, the first open-source text-to-text multilingual model for the medical domain. Additionally, we present two new evaluation benchmarks for all four languages with the aim of facilitating multilingual research in this domain. A comprehensive evaluation shows that Medical mT5 outperforms both encoders and similarly sized text-to-text models for the Spanish, French, and Italian benchmarks, while being competitive with current state-of-the-art LLMs in English.
We assume that providing explanations is a process to elicit implicit knowledge in human communication, and propose a general methodology to generate commonsense explanations from pairs of semantically related sentences. We take advantage of both prompting applied to large, encoder-decoder pre-trained language models, and few-shot learning techniques, such as pattern-exploiting training. Experiments run on the e-SNLI dataset show that the proposed method achieves state-of-the-art results on the explanation generation task, with a substantial reduction of labelled data. The obtained results open new perspective on a number of tasks involving the elicitation of implicit knowledge.
Multiword Expressions (MWEs) are a frequently occurring phenomenon found in all natural languages that is of great importance to linguistic theory, natural language processing applications, and machine translation systems. Neural Machine Translation (NMT) architectures do not handle these expressions well and previous studies have rarely addressed MWEs in this framework. In this work, we show that annotation and data augmentation, using external linguistic resources, can improve both translation of MWEs that occur in the source, and the generation of MWEs on the target, and increase performance by up to 5.09 BLEU points on MWE test sets. We also devise a MWE score to specifically assess the quality of MWE translation which agrees with human evaluation. We make available the MWE score implementation – along with MWE-annotated training sets and corpus-based lists of MWEs – for reproduction and extension.
The theoretical characterisation of multiword expressions (MWEs) is tightly connected to their actual occurrences in data and to their representation in lexical resources. We present three lexical resources for Italian MWEs, namely an electronic lexicon, a series of example corpora and a database of MWEs represented around morphosyntactic patterns. These resources are matched against, and created from, a very large web-derived corpus for Italian that spans across registers and domains. We can thus test expressions coded by lexicographers in a dictionary, thereby discarding unattested expressions, revisiting lexicographers's choices on the basis of frequency information, and at the same time creating an example sub-corpus for each entry. We organise MWEs on the basis of the morphosyntactic information obtained from the data in an electronic, flexible knowledge-base containing structured annotation exploitable for multiple purposes. We also suggest further work directions towards characterising MWEs by analysing the data organised in our database through lexico-semantic information available in WordNet or MultiWordNet-like resources, also in the perspective of expanding their set through the extraction of other similar compact expressions.