Marcell Fekete
Also published as: Marcell Richard Fekete
2025
Limited-Resource Adapters Are Regularizers, Not Linguists
Marcell Fekete | Nathaniel Romney Robinson | Ernests Lavrinovics | Djeride Jean-Baptiste | Raj Dabre | Johannes Bjerva | Heather Lent
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Cross-lingual transfer from related high-resource languages is a well-established strategy to enhance low-resource language technologies. Prior work has shown that adapters show promise for, e.g., improving low-resource machine translation (MT). In this work, we investigate an adapter souping method combined with cross-attention fine-tuning of a pre-trained MT model to leverage language transfer for three low-resource Creole languages, which exhibit relatedness to different language groups across distinct linguistic dimensions. Our approach improves performance substantially over baselines. However, we find that linguistic relatedness—or even a lack thereof—does not covary meaningfully with adapter performance. Surprisingly, our cross-attention fine-tuning approach appears equally effective with randomly initialized adapters, implying that the benefit of adapters in this setting lies in parameter regularization, and not in meaningful information transfer. We provide analysis supporting this regularization hypothesis. Our findings underscore the reality that neural language processing involves many success factors, and that not all neural methods leverage linguistic knowledge in intuitive ways.
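Adapter souping is commonly understood as merging several trained adapters into one by averaging their parameters. The sketch below illustrates that general idea under stated assumptions: adapters are represented as plain name-to-weights mappings (real adapters hold tensors), and the merge is a uniform or user-weighted average. The abstract does not spell out the exact souping recipe, so `soup_adapters` and its weighting scheme are illustrative, not the paper's implementation.

```python
from typing import Dict, List, Optional

# Illustrative stand-in: an adapter is a mapping from parameter name to a flat
# list of weights. Real adapters store tensors; plain lists keep this runnable.
Adapter = Dict[str, List[float]]


def soup_adapters(adapters: List[Adapter], weights: Optional[List[float]] = None) -> Adapter:
    """Merge identically shaped adapters into one by (weighted) parameter averaging.

    Assumption: uniform weights unless `weights` is given; the paper's exact
    souping recipe is not stated in the abstract.
    """
    if not adapters:
        raise ValueError("need at least one adapter")
    if weights is None:
        weights = [1.0] * len(adapters)
    if len(weights) != len(adapters):
        raise ValueError("need exactly one weight per adapter")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalise so the weights sum to 1

    names = set(adapters[0])
    for adapter in adapters[1:]:
        if set(adapter) != names:
            raise ValueError("adapters must share the same parameter names")

    souped: Adapter = {}
    for name in adapters[0]:
        size = len(adapters[0][name])
        if any(len(a[name]) != size for a in adapters):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across adapters for this parameter.
        souped[name] = [sum(w * a[name][i] for w, a in zip(norm, adapters)) for i in range(size)]
    return souped
```

For example, `soup_adapters([{"W": [1.0, 2.0]}, {"W": [3.0, 4.0]}])` yields `{"W": [2.0, 3.0]}`. Given the regularization reading in the abstract, a natural control under this sketch is to soup randomly initialised adapters of the same shape and compare downstream results.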
Linguistically Grounded Analysis of Language Models using Shapley Head Values
Marcell Fekete | Johannes Bjerva
Findings of the Association for Computational Linguistics: NAACL 2025
Understanding how linguistic knowledge is encoded in language models is crucial for improving their generalisation capabilities. In this paper, we investigate the processing of morphosyntactic phenomena by leveraging a recently proposed method for probing language models via Shapley Head Values (SHVs). Using the English language BLiMP dataset, we test our approach on two widely used models, BERT and RoBERTa, and compare how linguistic constructions such as anaphor agreement and filler-gap dependencies are handled. Through quantitative pruning and qualitative clustering analysis, we demonstrate that attention heads responsible for processing related linguistic phenomena cluster together. Our results show that SHV-based attributions reveal distinct patterns across both models, providing insights into how language models organize and process linguistic information. These findings support the hypothesis that language models learn subnetworks corresponding to linguistic theory, with potential implications for cross-linguistic model analysis and interpretability in Natural Language Processing (NLP).
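Shapley values treat each attention head as a "player" and credit it with its average marginal contribution to some score across all orderings of heads. Exact computation is exponential in the number of heads, so a standard approximation samples random permutations. The sketch below shows that generic Monte Carlo estimator; `value_fn` is a hypothetical callback (for instance, accuracy on a BLiMP paradigm with only the given heads left unmasked), and this is not claimed to be the exact SHV procedure used in the paper.

```python
import random
from typing import Callable, FrozenSet, List, Set


def shapley_head_values(
    num_heads: int,
    value_fn: Callable[[FrozenSet[int]], float],
    num_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo (permutation-sampling) estimate of each head's Shapley value.

    `value_fn(S)` is a hypothetical scoring function: the model's score when only
    the heads in S are active.
    """
    rng = random.Random(seed)
    totals = [0.0] * num_heads
    heads = list(range(num_heads))
    for _ in range(num_permutations):
        rng.shuffle(heads)
        coalition: Set[int] = set()
        prev = value_fn(frozenset(coalition))
        for h in heads:
            coalition.add(h)
            curr = value_fn(frozenset(coalition))
            totals[h] += curr - prev  # marginal contribution of head h in this ordering
            prev = curr
    return [t / num_permutations for t in totals]
```

Because each sampled ordering telescopes from the empty set to the full set, the estimates always sum to `value_fn(all heads) - value_fn(no heads)`, and a head that never changes the score receives exactly zero; both properties are convenient sanity checks before using the values for pruning or clustering.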
2024
Leveraging Adapters for Improved Cross-lingual Transfer for Low-Resource Creole MT
Marcell Fekete | Ernests Lavrinovics | Nathaniel Romney Robinson | Heather Lent | Raj Dabre | Johannes Bjerva
Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024)
Extended abstract introduction: Creole languages are low-resource languages, often genetically related to languages like English, French, and Portuguese due to their linguistic histories with colonialism (DeGraff, 2003). As such, Creoles stand to benefit greatly from both data-efficient methods and transfer learning from high-resource languages. At the same time, Lent et al. (2022b) observed that machine translation (MT) is a highly desired language technology among speakers of many Creoles. To this end, recent works have contributed new datasets, allowing for the development and evaluation of MT systems for Creoles (Robinson et al., 2024; Lent et al., 2024). In this work, we explore the use of the limited monolingual and parallel data for Creoles using parameter-efficient adaptation methods. Specifically, we compare the performance of different adapter architectures over the set of available benchmarks. We find adapters a promising approach for Creoles because they are parameter-efficient and have been shown to leverage transfer learning between related languages (Faisal and Anastasopoulos, 2022). While we perform experiments across multiple Creoles, we present results only for Haitian Creole in this extended abstract. In future work, we aim to explore the potential of leveraging other high-resource languages for parameter-efficient transfer learning.
CreoleVal: Multilingual Multitask Benchmarks for Creoles
Heather Lent | Kushal Tatariya | Raj Dabre | Yiyi Chen | Marcell Fekete | Esther Ploeger | Li Zhou | Ruth-Ann Armstrong | Abee Eijansantos | Catriona Malau | Hans Erik Heje | Ernests Lavrinovics | Diptesh Kanojia | Paul Belony | Marcel Bollmann | Loïc Grobol | Miryam de Lhoneux | Daniel Hershcovich | Michel DeGraff | Anders Søgaard | Johannes Bjerva
Transactions of the Association for Computational Linguistics, Volume 12
Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and a number of highly resourced languages imply a significant potential for transfer learning, this potential is hampered by the lack of annotated data. In this work we present CreoleVal, a collection of benchmark datasets spanning 8 different NLP tasks, covering up to 28 Creole languages; it is an aggregate of novel development datasets for reading comprehension, relation classification, and machine translation for Creoles, in addition to a practical gateway to a handful of preexisting benchmarks. For each benchmark, we conduct baseline experiments in a zero-shot setting in order to further ascertain the capabilities and limitations of transfer learning for Creoles. Ultimately, we see CreoleVal as an opportunity to empower research on Creoles in NLP and computational linguistics, and in general, a step towards more equitable language technology around the globe.
2023
Gradual Language Model Adaptation Using Fine-Grained Typology
Marcell Richard Fekete | Johannes Bjerva
Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Transformer-based language models (LMs) offer superior performance in a wide range of NLP tasks compared to previous paradigms. However, the vast majority of the world’s languages do not have adequate training data available for monolingual LMs (Joshi et al., 2020). While the use of multilingual LMs might address this data imbalance, there is evidence that multilingual LMs struggle when it comes to model adaptation to resource-poor languages (Wu and Dredze, 2020), or to languages which have typological characteristics unseen by the LM (Üstün et al., 2022). Other approaches aim to adapt monolingual LMs to resource-poor languages that are related to the model language. However, there are conflicting findings regarding whether language relatedness correlates with successful adaptation (de Vries et al., 2021) or not (Ács et al., 2021). With gradual LM adaptation, the approach presented in this extended abstract, we add to the research direction of monolingual LM adaptation. Instead of adapting directly to a target language, we propose adaptation in stages, first adapting to one or more intermediate languages before the final adaptation step. Inspired by principles of curriculum learning (Bengio et al., 2009), we search for an ideal ordering of languages that can result in improved LM performance on the target language. We follow evidence that typological similarity might correlate with the success of cross-lingual transfer (Pires et al., 2019; Üstün et al., 2022; de Vries et al., 2021), as we believe the success of this transfer is essential for successful model adaptation. Thus, we order languages by their relative typological similarity. In our approach, we quantify typological similarity using structural vectors derived from counts of dependency links (Bjerva et al., 2019), as such fine-grained measures can give a more accurate picture of the typological characteristics of languages (Ponti et al., 2019).
We believe that gradual LM adaptation may lead to improved LM performance on a range of resource-poor languages and typologically diverse languages. Additionally, it enables future research to evaluate the correlation between the success of cross-lingual transfer and various typological similarity measures.
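The ordering step can be illustrated with a small sketch, assuming that each language's structural vector is the relative frequency of its dependency-link types, that similarity is cosine similarity, and that the curriculum runs from the intermediate language least similar to the target towards the most similar, ending with the target itself. These are illustrative assumptions; the abstract does not fix the similarity metric or the exact ordering rule, and the link-type keys used below are hypothetical.

```python
import math
from typing import Dict, List


def structural_vector(link_counts: Dict[str, int]) -> Dict[str, float]:
    """Turn raw dependency-link counts into relative frequencies."""
    total = sum(link_counts.values())
    return {link: n / total for link, n in link_counts.items()} if total else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors keyed by link type."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def adaptation_order(
    target: str, intermediates: List[str], counts: Dict[str, Dict[str, int]]
) -> List[str]:
    """Hypothetical schedule: intermediates sorted by increasing similarity to the target."""
    tgt = structural_vector(counts[target])
    sims = {lang: cosine(structural_vector(counts[lang]), tgt) for lang in intermediates}
    # Curriculum assumption: least similar first, so each stage moves closer to the target.
    return sorted(intermediates, key=lambda lang: sims[lang]) + [target]
```

With toy counts where language `y` has exactly the target's link distribution, `x` partly overlaps, and `z` shares no link types, the schedule comes out as `["z", "x", "y", target]`. Swapping `cosine` for another distance is a one-line change, which is what would let different typological similarity measures be compared.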