2025
Saarland-Groningen at NADI 2025 Shared Task: Effective Dialectal Arabic Speech Processing under Data Constraints
Badr M. Abdullah | Yusser Al Ghussin | Zena Al-Khalili | Ömer Tarik Özyilmaz | Matias Valdenegro-Toro | Simon Ostermann | Dietrich Klakow
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
Cross-Lingual Document Recommendations with Transformer-Based Representations: Evaluating Multilingual Models and Mapping Techniques
Tsegaye Misikir Tashu | Eduard-Raul Kontos | Matthia Sabatelli | Matias Valdenegro-Toro
Proceedings of the Second Workshop on Scaling Up Multilingual & Multi-Cultural Evaluation
Document recommendation systems have become essential tools for finding relevant content on the Web. However, these systems struggle to recommend documents in languages other than the query language, and may therefore overlook resources in non-native languages. This research focuses on representing documents across languages using Transformer Leveraged Document Representations (TLDRs) mapped into a shared cross-lingual space. Four multilingual pre-trained transformer models (mBERT, mT5, XLM-RoBERTa, ErnieM) were evaluated with three mapping methods across 20 language pairs drawn from five selected languages of the European Union. Metrics such as Mate Retrieval Rate and Reciprocal Rank were used to measure the effectiveness of mapped TLDRs compared to non-mapped ones. The results highlight the strength of cross-lingual representations obtained through pre-trained transformers and mapping approaches, suggesting a promising direction for recommendation beyond individual language pairs.
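As an illustration of the evaluation setup described above, here is a minimal sketch that assumes parallel document embeddings and uses an orthogonal Procrustes solution as the mapping (one possible choice; the paper's three mapping methods are not named in the abstract), then computes Mate Retrieval Rate and Mean Reciprocal Rank:

```python
# Minimal sketch, not the paper's exact pipeline. Assumes row i of X
# (source language) and row i of Y (target language) embed the same document.
import numpy as np

def procrustes_mapping(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Learn an orthogonal map W minimizing ||XW - Y||_F (Procrustes)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def mate_retrieval_metrics(X: np.ndarray, Y: np.ndarray, W: np.ndarray):
    """Mate Retrieval Rate (mate ranked first) and Mean Reciprocal Rank."""
    Xm = X @ W                                    # map source into target space
    Xm /= np.linalg.norm(Xm, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    sims = Xm @ Yn.T                              # cosine similarity matrix
    ranks = []
    for i in range(sims.shape[0]):
        order = np.argsort(-sims[i])              # best match first
        ranks.append(int(np.where(order == i)[0][0]) + 1)
    ranks = np.array(ranks)
    return (ranks == 1).mean(), (1.0 / ranks).mean()

# Toy usage with random stand-in "TLDRs" (hypothetical 768-d embeddings):
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 768))
Y = X + rng.normal(size=(100, 768)) * 0.01        # near-parallel target space
W = procrustes_mapping(X, Y)
mate_rate, mrr = mate_retrieval_metrics(X, Y, W)
```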
Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models
Mirko Borszukovszki | Ivo Pascal De Jong | Matias Valdenegro-Toro
Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)
To leverage the full potential of Large Language Models (LLMs), it is crucial to have some information about the uncertainty of their answers: the model must be able to quantify how certain it is that a given response is correct. Poor uncertainty estimates can lead to overconfident wrong answers, undermining trust in these models. Substantial research has addressed uncertainty estimation for language models that take text inputs and produce text outputs, but because visual capabilities were added to these models only recently, there has been little progress on the uncertainty of Vision-Language Models (VLMs). We tested three state-of-the-art VLMs on corrupted image data. We found that increasing corruption severity degraded the models' ability to estimate their uncertainty, and the models were overconfident in most experiments.
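A minimal sketch of this kind of experiment is below, assuming Gaussian noise as the corruption and a hypothetical query_vlm callable standing in for any chat-style VLM API; the paper's exact corruptions and prompts are not reproduced here:

```python
# Minimal sketch of the experimental idea, not the paper's protocol.
# `query_vlm` is a hypothetical placeholder for a chat-style VLM API.
import re
import numpy as np

PROMPT = ("Answer the question about the image, then state your confidence "
          "that the answer is correct as a percentage from 0 to 100.")

def corrupt(image: np.ndarray, severity: int) -> np.ndarray:
    """Apply Gaussian noise; severity 1-5 scales the standard deviation."""
    noise = np.random.normal(0.0, 0.08 * severity, image.shape)
    return np.clip(image + noise, 0.0, 1.0)

def parse_confidence(reply: str) -> float | None:
    """Extract a verbalized confidence like '85%' from the model's reply."""
    match = re.search(r"(\d{1,3})\s*%", reply)
    return float(match.group(1)) / 100.0 if match else None

def run_trial(image, question, severity, query_vlm):
    """Corrupt the image, query the VLM, and return its stated confidence."""
    reply = query_vlm(image=corrupt(image, severity),
                      prompt=f"{PROMPT}\nQuestion: {question}")
    return parse_confidence(reply)
```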
Difficulty Estimation in Natural Language Tasks with Action Scores
Aleksandar Angelov | Tsegaye Misikir Tashu | Matias Valdenegro-Toro
Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)
This study investigates the effectiveness of the action score, a metric originally developed for computer vision tasks, in estimating sample difficulty across various natural language processing (NLP) tasks. Using transformer-based models, the action score is applied to sentiment analysis, natural language inference, and abstractive text summarization. The results demonstrate that the action score can effectively identify challenging samples in sentiment analysis and natural language inference, often capturing difficult instances that are missed by more established metrics like entropy. However, the effectiveness of the action score appears to be task-dependent, as evidenced by its performance in the abstractive text summarization task, where it exhibits a nearly linear relationship with entropy. The findings suggest that the action score can provide valuable insights into the characteristics of challenging samples in NLP tasks, particularly in classification settings. However, its application should be carefully considered in the context of each specific task and in light of emerging research on the potential value of hard samples in machine learning.
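The action score itself is not reproduced here; as a reference point for the comparison above, this is a minimal sketch of the entropy baseline the abstract mentions, i.e. per-sample predictive entropy over a classifier's softmax outputs:

```python
# Minimal sketch of entropy-based difficulty scoring (the baseline metric
# the abstract compares the action score against).
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Entropy of each row of an (n_samples, n_classes) probability matrix.

    Higher entropy = flatter prediction = a putatively harder sample.
    """
    eps = 1e-12                                  # avoid log(0)
    return -(probs * np.log(probs + eps)).sum(axis=1)

# Toy usage: three samples, three classes.
probs = np.array([[0.98, 0.01, 0.01],   # confident -> low entropy (easy)
                  [0.40, 0.35, 0.25],   # uncertain -> high entropy (hard)
                  [0.70, 0.20, 0.10]])
difficulty_order = np.argsort(-predictive_entropy(probs))  # hardest first
```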
Uncertainty in Semantic Language Modeling with PIXELS
Stefania Radu | Marco Zullich | Matias Valdenegro-Toro
Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025)
Pixel-based language models aim to solve the vocabulary bottleneck problem in language modeling, but the challenge of uncertainty quantification remains open. This work analyses uncertainty and confidence in pixel-based language models across 18 languages and 7 scripts on 3 semantically challenging tasks, using several methods: Monte Carlo Dropout, Transformer Attention, and Ensemble Learning. The results suggest that pixel-based models underestimate uncertainty when reconstructing patches. Uncertainty is also influenced by the script, with Latin-script languages displaying lower uncertainty. The findings on ensemble learning show better performance when hyperparameter tuning is applied during the named entity recognition and question-answering tasks across 16 languages.
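Of the methods listed, Monte Carlo Dropout is the most self-contained to sketch; below is a minimal, generic PyTorch version (the paper's PIXEL-specific setup is assumed away):

```python
# Minimal sketch of Monte Carlo Dropout for any PyTorch model:
# keep dropout active at test time and aggregate stochastic forward passes.
import torch

def mc_dropout_predict(model: torch.nn.Module, x: torch.Tensor,
                       n_samples: int = 20):
    model.eval()
    for module in model.modules():               # re-enable only dropout layers
        if isinstance(module, torch.nn.Dropout):
            module.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    # Mean is the prediction; standard deviation serves as the uncertainty.
    return preds.mean(dim=0), preds.std(dim=0)
```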
Confidence Calibration in Large Language Model-Based Entity Matching
Iris Kamsteeg | Juan Cardenas-Cartagena | Floris van Beers | Tsegaye Misikir Tashu | Matias Valdenegro-Toro
Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025)
This research explores the intersection of Large Language Models and confidence calibration in Entity Matching. To this end, we perform an empirical study comparing baseline RoBERTa confidences on an Entity Matching task against confidences calibrated with Temperature Scaling, Monte Carlo Dropout, and Ensembles, using the Abt-Buy, DBLP-ACM, iTunes-Amazon, and Company datasets. The findings indicate that the proposed modified RoBERTa model exhibits slight overconfidence, with Expected Calibration Error scores ranging from 0.0043 to 0.0552 across datasets. We find that this overconfidence can be mitigated with Temperature Scaling, which reduces Expected Calibration Error scores by up to 23.83%.
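A minimal sketch of the two ingredients named above, Expected Calibration Error and Temperature Scaling, for a generic classifier; dataset loading and the modified RoBERTa model are assumed away:

```python
# Minimal sketch: ECE measurement and single-temperature scaling.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def ece(confidences, correct, n_bins: int = 10) -> float:
    """Expected Calibration Error: |accuracy - confidence| per bin,
    weighted by the fraction of samples falling in each bin."""
    confidences, correct = np.asarray(confidences), np.asarray(correct)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            total += mask.mean() * abs(correct[mask].mean()
                                       - confidences[mask].mean())
    return total

def fit_temperature(logits, labels) -> float:
    """Fit one temperature T on validation logits by minimizing NLL;
    dividing logits by T > 1 softens overconfident predictions."""
    logits, labels = np.asarray(logits), np.asarray(labels)
    def nll(T):
        z = logits / T
        log_probs = z - logsumexp(z, axis=1, keepdims=True)
        return -log_probs[np.arange(len(labels)), labels].mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x
```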
2024
Overconfidence is Key: Verbalized Uncertainty Evaluation in Large Language and Vision-Language Models
Tobias Groot | Matias Valdenegro-Toro
Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024)
Language and Vision-Language Models (LLMs/VLMs) have revolutionized the field of AI with their ability to generate human-like text and understand images, but ensuring their reliability is crucial. This paper evaluates the ability of LLMs (GPT-4, GPT-3.5, LLaMA2, and PaLM 2) and VLMs (GPT-4V and Gemini Pro Vision) to estimate their verbalized uncertainty via prompting. We propose the new Japanese Uncertain Scenes (JUS) dataset, aimed at testing VLM capabilities via difficult queries and object counting, and the Net Calibration Error (NCE), which measures the direction of miscalibration. Results show that both LLMs and VLMs have high calibration error and are overconfident most of the time, indicating poor capability for uncertainty estimation. Additionally, we develop prompts for regression tasks, and we show that VLMs have poor calibration when producing mean/standard deviation and 95% confidence intervals.
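One plausible reading of the NCE, given that it measures the direction of miscalibration, is an ECE with the absolute value dropped; the sketch below implements that reading, which is an assumption rather than the paper's exact definition:

```python
# Minimal sketch of a signed calibration error in the spirit of the NCE
# described above. ASSUMPTION: ECE without the absolute value, so that
# a negative result indicates overconfidence and a positive one
# underconfidence; this may differ from the paper's definition.
import numpy as np

def net_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """Bin-weighted mean of (accuracy - confidence); < 0 means overconfident."""
    confidences, correct = np.asarray(confidences), np.asarray(correct)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            total += mask.mean() * (correct[mask].mean()
                                    - confidences[mask].mean())
    return total
```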