Tomás Bernal-Beltrán

Also published as: Tomás Bernal-beltrán, Tomás Bernal-Beltrán

2026

UMUTeam at SemEval-2026 Task 10: Transformer Ensembles for Conspiratorial Span Extraction and Detection
Jorge Gómez-Navalón | Ronghao Pan | Tomás Bernal-Beltrán | José Antonio García-Díaz | Rafael Valencia-Garcia
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Conspiracy theories pose significant societal risks and require reliable automated detection methods. In this paper, we present our system for SemEval 2026 Task 10, addressing both conspiracy detection and psycholinguistic marker extraction. We leverage multiple pretrained transformer architectures and ensemble strategies to model conspiratorial discourse at both document and token levels. For classification, our ensemble achieves a weighted F1-score of 0.7688, indicating effective performance in distinguishing conspiratorial statements. For marker extraction, we formulate the task as a BIOES sequence labeling problem and enhance predictions through ensemble and specialist models. Our results highlight both the effectiveness of transformer-based approaches and the challenges of fine-grained conspiracy marker extraction.

UMUTeam at SemEval-2026 Task 6: Soft-Voting Transformer Ensembles for Detecting and Classifying Response Ambiguity in Political Discourse
Tomás Bernal-Beltrán | Ronghao Pan | Jorge Gómez-Navalón | José Antonio García-Díaz | Rafael Valencia-Garcia
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Political discourse frequently involves strategically ambiguous responses, particularly in high-stakes settings such as presidential debates and interviews. Detecting whether a politician has directly answered a question, provided an ambiguous reply or issued a clear non-reply remains a challenging task due to the pragmatic and rhetorical nature of political language. This paper describes our participation in the SemEval 2026 CLARITY shared task on response ambiguity detection and classification in English. We focused exclusively on Task 1 (Clarity-level Classification) and proposed a weighted soft-voting ensemble that combines four fine-tuned encoder-only transformer models: RoBERTa-large, BERT-large-cased, DistilBERT-cased and ModernBERT-large. Each model was optimized through grid search and their predicted class probability distributions were aggregated using a weighted linear combination. On the official test set, our system achieved a macro-F1 score of 0.71, ranking 26th out of 41 participating teams. Even with the performance gap compared to top-ranked systems, our results demonstrate that a lightweight set of moderately sized encoder models can provide stable and competitive performance without relying on external data or large-scale architectures.

2025

UMUTeam at SemEval-2025 Task 7: Multilingual Fact-Checked Claim Retrieval with XLM-RoBERTa and Self-Alignment Pretraining Strategy
Ronghao Pan | Tomás Bernal-Beltrán | José Antonio García-Díaz | Rafael Valencia-García
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

In today’s digital age, the rapid dissemination of information through social networks poses significant challenges in verifying the veracity of shared content. The proliferation of misinformation can have serious consequences, influencing public opinion, policy decisions, and social dynamics. Fact-checking plays a critical role in countering misinformation; however, the manual verification process is time-consuming, especially when dealing with multilingual content. This paper presents our participation in the Multilingual and Crosslingual Fact-Checked Claim Retrieval task (SemEval 2025), which seeks to identify previously fact-checked claims relevant to social media posts. Our proposed system leverages XLM-RoBERTa, a multilingual Transformer model, combined with metric learning and hard negative mining strategies, to optimize the semantic comparison of posts and fact-checks across multiple languages. By fine-tuning a shared embedding space and employing a multiple similarity loss function, our approach enhances retrieval accuracy while maintaining efficiency. Evaluation results demonstrate competitive performance across multiple languages, reaching 25th place and highlighting the potential of multilingual NLP models in automating the fact-checking process and mitigating misinformation spread.

UMUTeam at SemEval-2025 Task 3: Detecting Hallucinations in Multilingual Texts Using Encoder-only Models Guided by Large Language Models
Ronghao Pan | Tomás Bernal-Beltrán | José Antonio García-Díaz | Rafael Valencia-García
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Large Language Models like GPT-4, LLaMa, Mistral, and Gemma have revolutionized Natural Language Processing, advancing language comprehension, generation, and reasoning. However, they also present challenges, particularly the tendency to hallucinate—that is, to produce false or fabricated information. This paper presents our participation in Task 3 Mu-SHROOM of SemEval 2025, which focuses on detecting hallucinations in multilingual contexts. Specifically, the task requires identifying text segments generated by LLMs that correspond to hallucinations and calculating the hallucination probability for each character in the text. To address this challenge, we adopted a token classification approach using the pre-trained XLM-RoBERTa-large model, fine-tuned on the provided training set. Additionally, we integrated context from Llama-3.1-70B to enhance hallucination detection by leveraging its broader and more up-to-date knowledge base. Our approach combines the multilingual capability of XLM-RoBERTa with the contextual understanding of Llama-3.1-70B, producing a detailed hallucination probability for each character in the text. The results demonstrate that our approach consistently outperforms baseline methods across multiple languages, particularly in detecting token-level hallucinations.

UMUTeam at SemEval-2025 Task 1: Leveraging Multimodal and Large Language Model for Identifying and Ranking Idiomatic Expressions
Ronghao Pan | Tomás Bernal-Beltrán | José Antonio García-Díaz | Rafael Valencia-García
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Idioms are non-compositional linguistic expressions whose meanings cannot be directly inferred from the individual words that compose them, posing significant challenges for natural language processing systems. This paper describes the participation of the UMUTeam in Subtask A of the AdMIRe shared task (SemEval 2025), which focuses on understanding idiomatic expressions through visual and contextual representations in English and Portuguese. Specifically, the task involves ranking a set of images according to how well they represent the sense of a potentially idiomatic nominal compound within a given contextual sentence. To address this challenge, we adopted a multimodal approach that combines textual and visual features using pre-trained language models, such as BERT and XLM-RoBERTa, along with Vision Transformers. Additionally, we explored the in-context learning capabilities of Large Language Models (LLMs), particularly Llama-3.1-8B, for image classification. These models are trained using a regression approach to rank images according to their semantic alignment with the contextual meaning of idioms. The results show that the Llama-3.1-8B model performs best for English, ranking 32 out of 36, while the XLM + ViT model is more effective for Portuguese, ranking 21 out of 24.

2024

UMUTeam at SemEval-2024 Task 6: Leveraging Zero-Shot Learning for Detecting Hallucinations and Related Observable Overgeneration Mistakes
Ronghao Pan | José Antonio García-díaz | Tomás Bernal-beltrán | Rafael Valencia-garcía
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In these working notes we describe the UMUTeam’s participation in SemEval-2024 shared task 6, which aims at detecting grammatically correct output of Natural Language Generation with incorrect semantic information in two different setups: model-aware and model-agnostic tracks. The task consists of three subtasks with different model setups. Our approach is based on exploiting the zero-shot classification capability of the Large Language Models LLaMa-2, Tulu and Mistral, through prompt engineering. Our system ranked eighteenth in the model-aware setup with an accuracy of 78.4% and 29th in the model-agnostic setup with an accuracy of 76.9333%.

Venues

SemEval (6)
WS (5)
