Zhongxiang Dai
2026
Self-Reflective Generation at Test Time
Jian Mu | Qixin Zhang | Zhiyong Wang | Menglin Yang | Shuang Qiu | Chengwei Qin | Zhongxiang Dai | Yao Shu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) increasingly solve complex reasoning tasks via long chain-of-thought, but their forward-only autoregressive generation process is fragile: early token errors can cascade, which creates a clear need for self-reflection mechanisms. However, existing self-reflection either performs revisions over full drafts or learns self-correction via expensive training, both of which are fundamentally reactive and inefficient. To address this, we propose Self-Reflective Generation at Test Time (SRGen), a lightweight test-time framework that reflects before generating at uncertain points. During token generation, SRGen uses dynamic entropy thresholding to identify high-uncertainty tokens. For each identified token, it trains a specific corrective vector, which fully exploits the already generated context for a self-reflective generation to correct the token probability distribution. By retrospectively analyzing the partial output, this self-reflection enables more trustworthy decisions, thereby significantly reducing the probability of errors at highly uncertain points. Evaluated on challenging mathematical reasoning benchmarks and a diverse set of LLMs, SRGen can significantly strengthen model reasoning. Moreover, our findings position SRGen as a plug-and-play method that integrates reflection into the generation process for reliable LLM reasoning: it achieves consistent gains with bounded overhead and can be combined with other training-time (e.g., RLHF) and test-time (e.g., SLOT) techniques.
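The abstract above names dynamic entropy thresholding but gives no formula, so the following is only an illustrative sketch of the general idea, not the SRGen procedure. It assumes a sliding window of recent token entropies and a "mean plus k standard deviations" rule; the class name `DynamicEntropyThreshold` and the parameters `window`, `k`, and `warmup` are hypothetical choices made for this example.

```python
import math
from collections import deque
from statistics import mean, pstdev


def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class DynamicEntropyThreshold:
    """Flags a token as uncertain when its entropy exceeds a running threshold.

    ASSUMPTION: the threshold rule (window mean + k * window std) is a
    stand-in chosen for illustration; the abstract does not specify it.
    """

    def __init__(self, window=8, k=1.5, warmup=3):
        self.history = deque(maxlen=window)  # recent entropies only
        self.k = k
        self.warmup = warmup  # no flags until this many entropies are seen

    def is_uncertain(self, probs):
        h = token_entropy(probs)
        flagged = False
        if len(self.history) >= self.warmup:
            threshold = mean(self.history) + self.k * pstdev(self.history)
            flagged = h > threshold
        self.history.append(h)
        return flagged


if __name__ == "__main__":
    detector = DynamicEntropyThreshold()
    confident = [0.97, 0.01, 0.01, 0.01]
    uniform = [0.25, 0.25, 0.25, 0.25]
    for step, dist in enumerate([confident] * 4 + [uniform]):
        print(step, detector.is_uncertain(dist))
```

In a real decoding loop, a flagged position would be where a corrective step is applied before the token is committed; that step is outside the scope of this sketch.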
Large Language Model-Enhanced Multi-Armed Bandits
Jiahang Sun | Zhiyong Wang | Runhan Yang | Chenjun Xiao | John C.S. Lui | Zhongxiang Dai
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have been applied to sequential decision-making tasks like multi-armed bandits (MAB), where an LLM is tasked with selecting arms in each iteration. However, this direct arm selection approach is often suboptimal. We propose an alternative method combining classical MAB algorithms with LLMs. Specifically, we use a classical MAB framework and leverage the in-context learning capability of LLMs for reward prediction. First, we integrate the LLM-based predictor into Thompson sampling (TS) with a decaying temperature schedule to balance exploration and exploitation. We also incorporate the predictor into a regression oracle-based MAB algorithm with explicit exploration. Additionally, we extend our TS-based algorithm to dueling bandits, where only preference feedback between arm pairs is available, requiring significant algorithmic modifications. Our empirical evaluations on synthetic MAB tasks show that our algorithms outperform LLM-based direct arm selection. In experiments on real-world text datasets, we demonstrate that, in tasks where arms lack exploitable semantic meaning, our approach delivers significantly better performance than direct arm selection.
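To make the idea of pairing a reward predictor with Thompson-sampling-style exploration under a decaying temperature more concrete, here is a minimal sketch. Everything specific in it is an assumption for illustration: the Gaussian perturbation, the `t0 / sqrt(t)` schedule, and the `RunningMeanPredictor` class, which is a plain running average standing in for the LLM-based in-context predictor described in the abstract. It is not the paper's algorithm.

```python
import math
import random


def decayed_temperature(t, t0=1.0):
    """ASSUMED schedule: temperature shrinks as t0 / sqrt(t) for round t >= 1."""
    return t0 / math.sqrt(t)


def select_arm(predicted_rewards, t, rng, t0=1.0):
    """Sample a perturbed reward per arm and play the arm with the largest sample.

    ASSUMPTION: Gaussian noise scaled by the temperature plays the role of
    posterior sampling; early rounds explore more, later rounds exploit.
    """
    temp = decayed_temperature(t, t0)
    samples = [rng.gauss(mu, temp) for mu in predicted_rewards]
    return max(range(len(samples)), key=samples.__getitem__)


class RunningMeanPredictor:
    """Hypothetical stand-in for an LLM reward predictor (not an LLM)."""

    def __init__(self, n_arms, prior=0.5):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.prior = prior  # prediction for arms with no observations yet

    def predict(self):
        return [s / c if c else self.prior
                for s, c in zip(self.sums, self.counts)]

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.2, 0.8, 0.5]  # synthetic Bernoulli arms
    predictor = RunningMeanPredictor(len(true_means))
    for t in range(1, 501):
        arm = select_arm(predictor.predict(), t, rng)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        predictor.update(arm, reward)
    print(predictor.counts)
```

The design point the sketch tries to show is the separation of concerns: the predictor only estimates rewards from past observations, while the classical bandit wrapper decides how much to explore.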
2025
WASA: WAtermark-based Source Attribution for Large Language Model-Generated Data
Xinyang Lu | Jingtan Wang | Zitong Zhao | Zhongxiang Dai | Chuan-Sheng Foo | See-Kiong Ng | Bryan Kian Hsiang Low
Findings of the Association for Computational Linguistics: ACL 2025
The impressive performances of Large Language Models (LLMs) and their immense potential for commercialization have given rise to serious concerns over the Intellectual Property (IP) of their training data. In particular, the synthetic texts generated by LLMs may infringe the IP of the data being used to train the LLMs. To this end, it is imperative to be able to perform source attribution by identifying the data provider who contributed to the generation of a synthetic text by an LLM. In this paper, we show that this problem can be tackled by watermarking, i.e., by enabling an LLM to generate synthetic texts with embedded watermarks that contain information about their source(s). We identify the key properties of such watermarking frameworks (e.g., source attribution accuracy, robustness against adversaries), and propose a source attribution framework that satisfies these key properties due to our algorithmic designs. Our framework enables an LLM to learn an accurate mapping from the generated texts to data providers, which sets the foundation for effective source attribution. Extensive empirical evaluations show that our framework achieves effective source attribution.
2024
Position Paper: Data-Centric AI in the Age of Large Language Models
Xinyi Xu | Zhaoxuan Wu | Rui Qiao | Arun Verma | Yao Shu | Jingtan Wang | Xinyuan Niu | Zhenfeng He | Jiangwei Chen | Zijian Zhou | Gregory Kang Ruey Lau | Hieu Dao | Lucas Agussurja | Rachael Hwee Ling Sim | Xiaoqiang Lin | Wenyang Hu | Zhongxiang Dai | Pang Wei Koh | Bryan Kian Hsiang Low
Findings of the Association for Computational Linguistics: EMNLP 2024
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making a key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential (e.g., in-context learning) stages of LLMs, and advocate that data-centric research should receive more attention from the community. We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization. In each scenario, we underscore the importance of data, highlight promising research directions, and articulate the potential impacts on the research community and, where applicable, society as a whole. For instance, we advocate for a suite of data-centric benchmarks tailored to the scale and complexity of data for LLMs. These benchmarks can be used to develop new data curation methods and document research efforts and results, which can help promote openness and transparency in AI and LLM research.
Co-authors
- Bryan Kian Hsiang Low 2
- Yao Shu 2
- Jingtan Wang 2
- Zhiyong Wang 2
- Lucas Agussurja 1
- Jiangwei Chen 1
- Hieu Dao 1
- Chuan-Sheng Foo 1
- Zhenfeng He 1
- Wenyang Hu 1
- Pang Wei Koh 1
- Gregory Kang Ruey Lau 1
- Xiaoqiang Lin 1
- Xinyang Lu 1
- John C.S. Lui 1
- Jian Mu 1
- See-Kiong Ng 1
- Xinyuan Niu 1
- Rui Qiao 1
- Chengwei Qin 1
- Shuang Qiu 1
- Rachael Hwee Ling Sim 1
- Jiahang Sun 1
- Arun Verma 1
- Zhaoxuan Wu 1
- Chenjun Xiao 1
- Xinyi Xu 1
- Menglin Yang 1
- Runhan Yang 1
- Qixin Zhang 1
- Zitong Zhao 1
- Zijian Zhou 1