Pawitsapak Akarajaradwong


2025

NitiBench: Benchmarking LLM Frameworks on Thai Legal Question Answering Capabilities
Pawitsapak Akarajaradwong | Pirat Pothavorn | Chompakorn Chaksangchaichot | Panuthep Tasawong | Thitiwat Nopparatbundit | Keerakiat Pratai | Sarana Nutanong
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) show promise in legal question answering (QA), yet Thai legal QA systems face challenges due to limited data and complex legal structures. We introduce NitiBench, a novel benchmark featuring two datasets: (1) NitiBench-CCL, covering Thai financial laws, and (2) NitiBench-Tax, containing Thailand’s official tax rulings. Our benchmark also includes evaluation metrics specialized for Thai legal QA. We evaluate retrieval-augmented generation (RAG) and long-context LLM (LCLM) approaches across three key dimensions: (1) the benefits of domain-specific techniques like hierarchy-aware chunking and cross-referencing, (2) comparative performance of RAG components, e.g., retrievers and LLMs, and (3) the potential of long-context LLMs to replace traditional RAG systems. Our results reveal that domain-specific components yield only slight improvements over naive methods. At the same time, existing retrieval models still struggle with complex legal queries, and long-context LLMs have limitations in consistent legal reasoning. Our study highlights current limitations in Thai legal NLP and lays a foundation for future research in this emerging domain.
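For readers unfamiliar with the technique, the following is a minimal, hypothetical sketch of what hierarchy-aware chunking can look like for statute text: splitting at section boundaries and carrying the parent law's name into each chunk. The regex, function, and field names are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of hierarchy-aware chunking for statute text.
# The section pattern and Chunk dataclass are hypothetical, not NitiBench code.
import re
from dataclasses import dataclass

@dataclass
class Chunk:
    law: str        # act or code name
    section: str    # section label, e.g. "Section 12"
    text: str       # section body, kept intact as one provision

def chunk_by_section(law_name: str, full_text: str) -> list[Chunk]:
    """Split a statute at section boundaries instead of fixed-size windows,
    so each chunk stays aligned with exactly one legal provision."""
    parts = re.split(r"(?m)^(Section \d+[./\d]*)", full_text)
    chunks = []
    # re.split with a capturing group alternates headers and bodies.
    for header, body in zip(parts[1::2], parts[2::2]):
        # Prefix the law name so the retriever sees the hierarchy context.
        chunks.append(Chunk(law_name, header,
                            f"{law_name} | {header}\n{body.strip()}"))
    return chunks
```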

Cold Starts and Hard Cases: A Two-Stage SFT-RLVR Approach for Legal Machine Translation (Just-NLP L-MT shared task)
Pawitsapak Akarajaradwong | Chompakorn Chaksangchaichot
Proceedings of the 1st Workshop on NLP for Empowering Justice (JUST-NLP 2025)

This paper details our system for the JUST-NLP 2025 Shared Task on English-to-Hindi Legal Machine Translation. We propose a novel two-stage, data-centric approach. First, we annotate the training data by translation difficulty and split it into easy and hard subsets. We perform supervised fine-tuning (SFT) on the easier subset to establish a robust “cold start”. Then, we apply reinforcement learning with verifiable rewards (RLVR) exclusively on the harder subset, using machine translation metrics as reward signals. This strategy allowed our system to significantly outperform strong baselines on the shared task. Source code and model weights are available at https://github.com/ppaolong/FourCorners-JustNLP-MT-Shared-Task.
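As a rough illustration of this recipe, the sketch below splits data by difficulty and scores translations with sentence-level chrF via sacrebleu. The abstract only says “machine translation metrics”, so the metric choice, the baseline-scoring heuristic, and the threshold are our assumptions.

```python
# Hedged sketch of the two-stage data split and a verifiable MT reward.
# Scoring difficulty via a baseline model's chrF is an assumption; the
# paper only states that training data is annotated by difficulty.
import sacrebleu

def difficulty_split(pairs, baseline_translate, threshold=45.0):
    """Split (source, reference) pairs into easy/hard subsets by how well
    a baseline model already translates them."""
    easy, hard = [], []
    for src, ref in pairs:
        score = sacrebleu.sentence_chrf(baseline_translate(src), [ref]).score
        (easy if score >= threshold else hard).append((src, ref))
    return easy, hard   # SFT "cold start" on easy; RLVR on hard

def mt_reward(hypothesis: str, reference: str) -> float:
    """Verifiable reward for RLVR: a deterministic MT metric in [0, 1]."""
    return sacrebleu.sentence_chrf(hypothesis, [reference]).score / 100.0
```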

A Budget Recipe for Finetuning a Long-form Legal Summarization Model
Chompakorn Chaksangchaichot | Pawitsapak Akarajaradwong
Proceedings of the 1st Workshop on NLP for Empowering Justice (JUST-NLP 2025)

We describe an inexpensive system that ranked first in the JUST-NLP 2025 L-SUMM task, summarizing very long Indian court judgments (up to 857k characters) using a single 80GB GPU and a total budget of about $50. Our pipeline first filters out length–summary outliers, then applies two-stage LoRA SFT on Qwen3-4B-Instruct-2507 to learn style and extend context, and finally runs RLVR tuned to BLEU, ROUGE-2, and ROUGE-L, with BLEU upweighted. We show that two-stage SFT is better than a single-stage run, and that RLVR gives the largest gains, reaching 32.71 on our internal evaluation vs. 16.15 for the base model and 29.91 on the test leaderboard. In an ablation on prompting, we find that a simple, naive prompt converges faster but saturates earlier, while the curated legal-structured prompt keeps improving with longer training and yields higher final scores; the finetuned model also remains fairly robust to unseen prompts. Our code is fully open-sourced for reproducibility.
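A minimal sketch of such a blended RLVR reward, assuming sacrebleu and rouge_score as the metric implementations; the 0.5/0.25/0.25 weights merely illustrate “BLEU upweighted” and are not the paper's values.

```python
# Hedged sketch of the blended summarization reward described above.
# Exact weights are not given in the abstract; these are illustrative.
import sacrebleu
from rouge_score import rouge_scorer

_scorer = rouge_scorer.RougeScorer(["rouge2", "rougeL"], use_stemmer=True)

def summary_reward(prediction: str, reference: str) -> float:
    """Weighted mix of BLEU, ROUGE-2, and ROUGE-L, all scaled to [0, 1],
    with BLEU given the largest weight."""
    bleu = sacrebleu.sentence_bleu(prediction, [reference]).score / 100.0
    rouge = _scorer.score(reference, prediction)  # (target, prediction)
    return (0.5 * bleu
            + 0.25 * rouge["rouge2"].fmeasure
            + 0.25 * rouge["rougeL"].fmeasure)
```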

Aligning LLMs for Thai Legal Question Answering with Efficient Semantic-Similarity Rewards
Pawitsapak Akarajaradwong | Chompakorn Chaksangchaichot | Pirat Pothavorn | Ekapol Chuangsuwanich | Attapol Rutherford | Sarana Nutanong
Proceedings of the Natural Legal Language Processing Workshop 2025

Retrieval-Augmented Generation (RAG) systems’ performance on Thai legal question answering remains limited, especially for questions requiring extensive, complex legal reasoning. To address these limitations, we introduce a resource-efficient approach that aligns Large Language Models (LLMs) for improved citation accuracy and response quality using Group-Relative Policy Optimization (GRPO). Our method leverages BGE-M3 embeddings as a cost-efficient semantic-similarity reward, reducing computational expenses by up to 2.5x compared to an LLM-based reward model. Experiments on the NitiBench benchmark demonstrate substantial improvements: GRPO achieves up to 90% citation-F1 gains relative to the base model and a 31% increase in joint quality metrics over instruction tuning. Crucially, our approach provides a practical and effective solution for enhancing legal LLMs in resource-constrained environments.
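A minimal sketch of an embedding-based semantic-similarity reward, assuming the FlagEmbedding package for BGE-M3. It illustrates the core idea of replacing an LLM reward model with embedding cosine similarity; it is not the paper's exact reward function.

```python
# Hedged sketch of a BGE-M3 cosine-similarity reward for GRPO rollouts.
# Model and hyperparameter choices here are illustrative assumptions.
import numpy as np
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

def semantic_reward(response: str, reference: str) -> float:
    """Cosine similarity between dense BGE-M3 embeddings of a sampled
    response and the gold answer, used instead of an LLM reward model."""
    vecs = model.encode([response, reference])["dense_vecs"]
    a, b = vecs[0], vecs[1]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

One embedding forward pass per rollout is what makes this reward far cheaper than querying an LLM judge, consistent with the cost reduction reported above.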