Inês Vieira
2026
ALBA: A European Portuguese Benchmark for Evaluating Language and Linguistic Dimensions in Generative LLMs
Inês Vieira | Inês Calvo | Iago Paulo | James Furtado | Rafael Ferreira | Diogo Tavares | Diogo Glória-Silva | David Semedo | João Magalhães
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Inês Vieira | Inês Calvo | Iago Paulo | James Furtado | Rafael Ferreira | Diogo Tavares | Diogo Glória-Silva | David Semedo | João Magalhães
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
As Large Language Models (LLMs) expand across multilingual domains, evaluating their performance in under-represented languages becomes increasingly important. European Portuguese (pt-PT) is particularly affected, as existing training data and benchmarks are mainly in Brazilian Portuguese (pt-BR). To address this, we introduce ALBA, a linguistically grounded benchmark designed from the ground up to assess LLM proficiency in linguistic-related tasks in pt-PT across eight linguistic dimensions, including Language Variety, Culture-bound Semantics, Discourse Analysis, Word Plays, Syntax, Morphology, Lexicology, and Phonetics and Phonology. ALBA is manually constructed by language experts and paired with an LLM-as-a-judge framework for scalable evaluation of pt-PT generated language. Experiments on a diverse set of models reveal performance variability across linguistic dimensions, highlighting the need for comprehensive, variety-sensitive benchmarks that support further development of tools in pt-PT.
AMALIA: A Fully Open Large Language Model for European Portuguese
Afonso Simplício | Gonçalo Vinagre | Miguel Moura Ramos | Diogo Tavares | Rafael Ferreira | Giuseppe Attanasio | Duarte M. Alves | Inês Calvo | Inês Vieira | Rui Guerra | James Furtado | Beatriz Canaverde | Iago Paulo | Vasco Ramos | Diogo Glória-Silva | Miguel Faria | Marcos Treviso | Daniel Gomes | Pedro Gomes | David Semedo | André Martins | João Magalhães
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Afonso Simplício | Gonçalo Vinagre | Miguel Moura Ramos | Diogo Tavares | Rafael Ferreira | Giuseppe Attanasio | Duarte M. Alves | Inês Calvo | Inês Vieira | Rui Guerra | James Furtado | Beatriz Canaverde | Iago Paulo | Vasco Ramos | Diogo Glória-Silva | Miguel Faria | Marcos Treviso | Daniel Gomes | Pedro Gomes | David Semedo | André Martins | João Magalhães
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Despite rapid progress in open large language models (LLMs), European Portuguese (pt-PT) remains underrepresented in both training data and native evaluation, with machine-translated benchmarks likely missing the variant’s linguistic and cultural nuances. We introduce AMALIA, a fully open LLM that prioritizes pt-PT by using more high-quality pt-PT data during both the mid- and post-training stages. To evaluate pt-PT more faithfully, we release a suite of pt-PT benchmarks that includes translated standard tasks and four new datasets targeting pt-PT generation, linguistic competence, and pt-PT/pt-BR bias. Experiments show that AMALIA matches strong baselines on translated benchmarks while substantially improving performance on pt-PT-specific evaluations, supporting the case for targeted training and native benchmarking for European Portuguese.
P3B3: A Multi-Turn Conversational Benchmark for Measuring European and Brazilian Portuguese Variety Bias in LLMs
Rafael Ferreira | Inês Vieira | Inês Calvo | James Furtado | Iago Paulo | Diogo Glória-Silva | Diogo Tavares | David Semedo | Joao Magalhaes
Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026)
Rafael Ferreira | Inês Vieira | Inês Calvo | James Furtado | Iago Paulo | Diogo Glória-Silva | Diogo Tavares | David Semedo | Joao Magalhaes
Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026)
As Large Language Models (LLMs) become embedded in everyday communication, capturing regional linguistic variation is essential for reliable and equitable language use. In Portuguese, European (pt-PT) and Brazilian (pt-BR) varieties remain unevenly represented, with pt-BR dominating in data quantity, while LLM preference for Portuguese variants remains underexplored.To address this gap, we introduce P3B3, an expert-curated variety agnostic benchmark of conversational prompts, along with an evaluation framework for measuring variety bias and controllability.Experiments on several models show that most LLMs exhibit a strong bias toward pt-BR, with variation in controllability across models. These results highlight the need for more balanced multilingual representation across language varieties.