Danileth Almanza


2026

In multilingual and multicultural contexts, LLMs require contextualization mechanisms to generate culturally coherent responses. This study presents a LLaMA-based approach to answering short cultural questions in multiple languages for SemEval-2026 Task 7 (Track 1: SAQ), without access to official training data. The system combines controlled synthetic data generation, evidence retrieval from web snippets, and a Retrieval-Augmented Generation (RAG) framework with few-shot learning. BLEnD is used solely as a thematic guide, so that the generated data remains semantically independent of it. During development, LLaMA-3.1-8B achieved 38.51\% global accuracy, whereas LLaMA-3.2-1B obtained 15.54\%. In the large-scale evaluation (30,500 instances), the 1B model achieved 16.69\%, remaining stable after prompt optimization. The results indicate that contextual retrieval improves performance on multilingual cultural knowledge evaluation and highlight the importance of pipeline design and model capacity.
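
The retrieval-plus-few-shot step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the token-overlap ranking stands in for whatever retriever the pipeline uses, and the names `Example`, `retrieve`, and `build_prompt`, the prompt wording, and the toy snippets are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One few-shot demonstration (hypothetical structure)."""
    question: str
    answer: str


def tokenize(text: str) -> set[str]:
    # Lowercased whitespace tokens; a real multilingual system would need
    # a proper tokenizer, this is only a stand-in.
    return set(text.lower().split())


def retrieve(question: str, snippets: list[str], k: int = 2) -> list[str]:
    # Rank snippets by token overlap with the question and keep the top k.
    # Assumption: overlap scoring replaces the real retriever for illustration.
    q_tokens = tokenize(question)
    ranked = sorted(
        snippets,
        key=lambda s: len(q_tokens & tokenize(s)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(
    question: str,
    snippets: list[str],
    shots: list[Example],
    k: int = 2,
) -> str:
    # Assemble instruction, few-shot demonstrations, retrieved evidence,
    # and the target question into a single prompt string.
    evidence = retrieve(question, snippets, k)
    parts = ["Answer the cultural question with a short phrase."]
    for ex in shots:
        parts.append(f"Question: {ex.question}\nAnswer: {ex.answer}")
    parts.append("Evidence:\n" + "\n".join(f"- {s}" for s in evidence))
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy_snippets = [
        "Paella is a rice dish from Valencia in Spain.",
        "Football is popular worldwide.",
        "Tea is common at breakfast.",
    ]
    toy_shots = [Example("What drink is common at breakfast?", "tea")]
    print(build_prompt("What rice dish is typical in Spain?", toy_snippets, toy_shots))
```

The resulting string would then be passed to the language model; keeping retrieval and prompt assembly as separate functions makes it straightforward to swap the retriever or vary the number of demonstrations during prompt optimization.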