RAGTurk: Best Practices for Retrieval Augmented Generation in Turkish
Süha Kağan Köse, Mehmet Can Baytekin, Burak Aktaş, Bilge Kaan Görür, Evren Ayberk Munis, Deniz Yılmaz, Muhammed Yusuf Kartal, Cagri Toraman
Abstract
Retrieval-Augmented Generation (RAG) enhances LLM factuality, yet design guidance remains English-centric, limiting insights for morphologically rich languages like Turkish. We address this by constructing a comprehensive Turkish RAG dataset derived from Turkish Wikipedia and CulturaX, comprising question-answer pairs and relevant passage chunks. We benchmark seven stages of the RAG pipeline, from query transformation and reranking to answer refinement, without task-specific fine-tuning. Our results show that complex methods like HyDE achieve the highest accuracy (85%), considerably above the baseline (78.70%). A Pareto-optimal configuration using Cross-encoder Reranking and Context Augmentation achieves comparable performance (84.60%) at much lower cost. We further demonstrate that over-stacking generative modules can degrade performance by distorting morphological cues, whereas simple query clarification with robust reranking offers an effective solution.
- Anthology ID: 2026.sigturk-1.15
- Volume: Proceedings of the Second Workshop Natural Language Processing for Turkic Languages (SIGTURK 2026)
- Month: March
- Year: 2026
- Address: Rabat, Morocco
- Editors: Kemal Oflazer, Abdullatif Köksal, Onur Varol
- Venues: SIGTURK | WS
- Publisher: Association for Computational Linguistics
- Pages: 179–196
- URL: https://preview.aclanthology.org/manual-author-scripts/2026.sigturk-1.15/
- Cite (ACL): Süha Kağan Köse, Mehmet Can Baytekin, Burak Aktaş, Bilge Kaan Görür, Evren Ayberk Munis, Deniz Yılmaz, Muhammed Yusuf Kartal, and Cagri Toraman. 2026. RAGTurk: Best Practices for Retrieval Augmented Generation in Turkish. In Proceedings of the Second Workshop Natural Language Processing for Turkic Languages (SIGTURK 2026), pages 179–196, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal): RAGTurk: Best Practices for Retrieval Augmented Generation in Turkish (Köse et al., SIGTURK 2026)
- PDF: https://preview.aclanthology.org/manual-author-scripts/2026.sigturk-1.15.pdf
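
The accuracy-versus-cost trade-off mentioned in the abstract can be illustrated with a minimal Pareto-front sketch. The three accuracy figures (78.70%, 85%, 84.60%) come from the abstract; everything else is an assumption made for illustration. The abstract reports no cost figures, so the `cost` values are invented relative units that only respect the stated ordering (the reranking configuration being much cheaper than HyDE). The `over_stacked_hypothetical` entry, including its accuracy, is a made-up dominated configuration. The `Config` class and `pareto_front` function are hypothetical names and not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    accuracy: float  # percent; abstract values except where marked hypothetical
    cost: float      # HYPOTHETICAL relative cost units, not reported in the abstract


def pareto_front(configs):
    """Return the configs that no other config dominates, sorted by cost.

    A config is dominated when another config is at least as accurate and
    at least as cheap, and strictly better on one of the two criteria.
    """
    front = []
    for c in configs:
        dominated = any(
            o.accuracy >= c.accuracy
            and o.cost <= c.cost
            and (o.accuracy > c.accuracy or o.cost < c.cost)
            for o in configs
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.cost)


CONFIGS = [
    Config("baseline", 78.70, 1.0),
    Config("hyde", 85.00, 3.0),
    Config("cross_encoder_rerank+context_augmentation", 84.60, 1.5),
    # Entirely hypothetical entry (accuracy and cost) to show a dominated point.
    Config("over_stacked_hypothetical", 80.00, 4.0),
]

front = pareto_front(CONFIGS)
baseline_accuracy = CONFIGS[0].accuracy
for c in front:
    gain = c.accuracy - baseline_accuracy  # percentage points over the baseline
    print(f"{c.name}: {c.accuracy:.2f}% (+{gain:.2f} pts), cost {c.cost}")
```

Under these assumed costs, the hypothetical over-stacked configuration is dominated by HyDE and drops out, while the baseline, the reranking configuration, and HyDE all stay on the front. Using the abstract's figures, the reranking configuration sits 0.40 percentage points below HyDE and 5.90 points above the baseline.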