Josh Mcgiff


2026

We present Irish-BLiMP (Irish Benchmark of Linguistic Minimal Pairs), the first dataset and framework designed for fine-grained evaluation of linguistic competence in Irish, an endangered language. Drawing on linguistic literature and grammar reference works, a team of fluent Irish speakers manually constructed and reviewed 1020 minimal pairs across a taxonomy of 11 linguistic features. We evaluate both existing Large Language Models (LLMs) and fluent human participants on their syntactic knowledge of Irish. Our findings show that humans outperform all models across all linguistic features, achieving 16.6% higher accuracy on average. Moreover, a substantial performance gap of 18.1% persists between open- and closed-source LLMs, with even the strongest model (gpt-5) reaching only 73.5% accuracy compared to 90.1% for human participants. Interestingly, human participants and models struggle with different aspects of Irish grammar, highlighting differences in the representations learned by the models. Overall, Irish-BLiMP provides the first systematic framework for evaluating the grammatical competence of LLMs in Irish and offers a valuable benchmark for advancing research on linguistic understanding in low-resource languages.
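As an illustration of how accuracy on a minimal-pair benchmark can be computed, the sketch below counts a pair as correct when a scorer prefers the acceptable sentence over the unacceptable one. This is an assumption about the scoring convention, not a description of the Irish-BLiMP evaluation code: the function `minimal_pair_accuracy`, the toy scorer, and the placeholder sentence labels are all hypothetical, and a real setup would plug in a language model's sentence score instead.

```python
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (acceptable, unacceptable) pairs where the scorer
    assigns a strictly higher score to the acceptable sentence."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one minimal pair is required")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


# Hypothetical stand-in for a model: a lookup table of sentence scores.
# The keys are placeholder labels, not real Irish sentences.
TOY_SCORES = {
    "pair1_acceptable": -10.2,
    "pair1_unacceptable": -14.7,
    "pair2_acceptable": -9.1,
    "pair2_unacceptable": -8.3,  # the toy scorer gets this pair wrong
    "pair3_acceptable": -11.0,
    "pair3_unacceptable": -12.5,
}

toy_pairs = [
    ("pair1_acceptable", "pair1_unacceptable"),
    ("pair2_acceptable", "pair2_unacceptable"),
    ("pair3_acceptable", "pair3_unacceptable"),
]

accuracy = minimal_pair_accuracy(toy_pairs, TOY_SCORES.__getitem__)
```

Ties are counted as incorrect here, which is a design choice of this sketch rather than something the abstract specifies.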
Fine-tuning is widely used to adapt multilingual Transformer models for machine translation (MT) in specific domains. However, full-parameter fine-tuning of large multilingual models with billions of parameters is computationally expensive, creating a barrier to entry for researchers working on low-resource tasks such as Irish translation. Parameter-efficient fine-tuning (PEFT) addresses this by updating only a small fraction of the original model parameters; the Low-Rank Adaptation (LoRA) approach does so by introducing small, trainable adapter layers. We introduce SemiAdapt-Full and SemiAdapt-LoRA, semi-supervised approaches that leverage inferred domains to improve overall MT performance. SemiAdapt-LoRA employs dynamic routing at inference time, eliminating the need to load multiple separately fine-tuned models. Instead, a single shared base model is maintained while lightweight domain-specific adapters, which update only 1.39% of the model parameters in our case, are activated dynamically. We demonstrate that SemiAdapt-Full can outperform full-model fine-tuning and that SemiAdapt-LoRA allows PEFT methods to compete with full-model fine-tuning. We further evaluate corpus-level domain fine-tuning and demonstrate that our embedding-based inference methods perform especially well on larger and noisier corpora. Code and training configurations are released to support reproducibility. Ultimately, our approach narrows the performance gap between PEFT and full-parameter fine-tuning, offering resource-constrained researchers a computationally efficient alternative.
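The idea of routing each input to a domain-specific adapter can be sketched as a nearest-centroid lookup over sentence embeddings. This is a minimal illustration under stated assumptions, not the released SemiAdapt code: the function names, the two-dimensional toy embeddings, the domain labels, and the choice of cosine similarity are all hypothetical. The second helper shows the standard LoRA arithmetic for a single weight matrix, where a rank-r update to a d_out x d_in matrix trains r * (d_in + d_out) parameters; it is a per-matrix ratio and is not meant to reproduce the 1.39% whole-model figure.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_to_adapter(
    embedding: Sequence[float],
    centroids: Dict[str, Sequence[float]],
) -> str:
    """Return the name of the domain whose centroid is most similar
    to the input embedding (hypothetical routing rule)."""
    if not centroids:
        raise ValueError("at least one domain centroid is required")
    return max(centroids, key=lambda name: cosine(embedding, centroids[name]))


def lora_trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Trainable LoRA parameters for one weight matrix, relative to the
    size of that matrix: rank * (d_in + d_out) / (d_out * d_in)."""
    return rank * (d_in + d_out) / (d_out * d_in)


# Toy domain centroids (assumed values, for illustration only).
domain_centroids = {
    "legal": [1.0, 0.0],
    "health": [0.0, 1.0],
}

chosen = route_to_adapter([0.9, 0.1], domain_centroids)
fraction = lora_trainable_fraction(d_out=1024, d_in=1024, rank=8)
```

In this sketch the base model would stay loaded once, and only the adapter named by `chosen` would be activated for the current input.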