Towards Practical and Knowledgeable LLMs for a Multilingual World: A Thesis Proposal

Bryan Li


Abstract
The frontier of large language model (LLM) development has largely been substantiated by knowledge-intensive tasks specified in English. In this proposed thesis, I argue for the key role that multilinguality occupies in the development of practical and knowledgeable LLMs. First, I consider practical methods to improve LLMs' performance on standard natural language processing (NLP) tasks by leveraging their existing multilingual knowledge. Then, I investigate the underlying multilingual knowledge of LLMs with two benchmarks: one on complex reasoning and one on territorial disputes. These benchmarks reveal LLMs' inconsistent performance across languages. I then design efficient techniques, both at inference time and at training time, to address these discrepancies. Finally, I extend the territorial disputes benchmark to a retrieval-augmented generation (RAG) setting, comparing the effects of different retrieval settings on cross-lingual robustness. My proposal shows that informed use of multilinguality enhances LLMs' capabilities, and our understanding thereof.
Anthology ID:
2025.naacl-srw.30
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month:
April
Year:
2025
Address:
Albuquerque, USA
Editors:
Abteen Ebrahimi, Samar Haider, Emmy Liu, Sammar Haider, Maria Leonor Pacheco, Shira Wein
Venues:
NAACL | WS
Publisher:
Association for Computational Linguistics
Pages:
301–310
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-srw.30/
Cite (ACL):
Bryan Li. 2025. Towards Practical and Knowledgeable LLMs for a Multilingual World: A Thesis Proposal. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 301–310, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal):
Towards Practical and Knowledgeable LLMs for a Multilingual World: A Thesis Proposal (Li, NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-srw.30.pdf