Aligning LLMs for Thai Legal Question Answering with Efficient Semantic-Similarity Rewards
Pawitsapak Akarajaradwong, Chompakorn Chaksangchaichot, Pirat Pothavorn, Ekapol Chuangsuwanich, Attapol Rutherford, Sarana Nutanong
Abstract
The performance of Retrieval-Augmented Generation (RAG) systems on Thai legal question answering remains limited, especially for questions requiring extensive, complex legal reasoning. To address these limitations, we introduce a resource-efficient approach that aligns Large Language Models (LLMs) for improved citation accuracy and response quality using Group-Relative Policy Optimization (GRPO). Our proposed method uses BGE-M3 embeddings as a cost-efficient semantic-similarity reward, reducing computational cost by up to 2.5x compared to an LLM-based reward model. Experiments on the NitiBench benchmark demonstrate substantial improvements: GRPO achieves up to 90% citation-F1 gains relative to the base model and a 31% increase in joint quality metrics over instruction tuning. Our approach thus provides a practical and effective solution for enhancing legal LLMs in resource-constrained environments.
- Anthology ID:
- 2025.nllp-1.21
- Volume:
- Proceedings of the Natural Legal Language Processing Workshop 2025
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
- Venues:
- NLLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 304–316
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.nllp-1.21/
- Cite (ACL):
- Pawitsapak Akarajaradwong, Chompakorn Chaksangchaichot, Pirat Pothavorn, Ekapol Chuangsuwanich, Attapol Rutherford, and Sarana Nutanong. 2025. Aligning LLMs for Thai Legal Question Answering with Efficient Semantic-Similarity Rewards. In Proceedings of the Natural Legal Language Processing Workshop 2025, pages 304–316, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Aligning LLMs for Thai Legal Question Answering with Efficient Semantic-Similarity Rewards (Akarajaradwong et al., NLLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.nllp-1.21.pdf
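
The abstract describes using embedding similarity as a reward signal for GRPO. The following is a minimal sketch of that idea, not the authors' implementation: it scores each sampled answer by cosine similarity to a reference answer and then normalizes the rewards within the sampled group, which is one common formulation of the group-relative advantage. The toy three-dimensional vectors stand in for BGE-M3 embeddings, and the function names, the `eps` term, and the choice of population standard deviation are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps).

    Assumption: population standard deviation is used here; other
    implementations may use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical toy embeddings; a real pipeline would embed text with BGE-M3.
reference = [1.0, 0.0, 0.0]
candidates = [
    [1.0, 0.0, 0.0],  # same direction as the reference
    [0.0, 1.0, 0.0],  # orthogonal to the reference
    [1.0, 1.0, 0.0],  # partially aligned with the reference
]

rewards = [cosine_similarity(c, reference) for c in candidates]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.4f} advantage={a:+.4f}")
```

Because the advantage is computed relative to the other answers sampled for the same question, only the ranking and spread of the similarity scores within the group matter, not their absolute scale.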