Tong Che
2026
Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety
Can Jin | Rui Wu | Tong Che | Qixin Zhang | Hongwu Peng | Jiahui Zhao | Zhenting Wang | Wenqi Wei | Ligong Han | Zhao Zhang | Yuan Cao | Ruixiang Tang | Dimitris N. Metaxas
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Ensuring that Large Language Models (LLMs) adhere to safety principles without refusing benign requests remains a significant challenge. While OpenAI introduces deliberative alignment (DA) to enhance the safety of its o-series models through reasoning over detailed “code-like” safety rules, the effectiveness of this approach in open-source LLMs, which typically lack advanced reasoning capabilities, is understudied. In this work, we systematically evaluate the impact of explicitly specifying extensive safety codes versus demonstrating them through illustrative cases. We find that referencing explicit codes inconsistently improves harmlessness and systematically degrades helpfulness, whereas training on case-augmented simple codes yields more robust and generalized safety behaviors. By guiding LLMs with case-augmented reasoning instead of extensive code-like safety rules, we avoid rigid adherence to narrowly enumerated rules and enable broader adaptability. Building on these insights, we propose CADA, a case-augmented deliberative alignment method for LLMs utilizing reinforcement learning on self-generated safety reasoning chains. CADA effectively enhances harmlessness, improves robustness against attacks, and reduces over-refusal while preserving utility across diverse benchmarks, offering a practical alternative to rule-only DA for improving safety while maintaining helpfulness.
2025
LLaMA-Berry: Pairwise Optimization for Olympiad-level Mathematical Reasoning via O1-like Monte Carlo Tree Search
Di Zhang | Jianbo Wu | Jingdi Lei | Tong Che | Jiatong Li | Tong Xie | Xiaoshui Huang | Shufei Zhang | Marco Pavone | Yuqiang Li | Wanli Ouyang | Dongzhan Zhou
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
This paper presents LLaMA-Berry, an advanced mathematical reasoning framework to enhance the problem-solving ability of large language models (LLMs). The framework combines Monte Carlo Tree Search with Self-Refine (SR-MCTS) to optimize the reasoning paths and utilizes a pairwise reward model to evaluate different paths globally. By leveraging the self-critique and rewriting capabilities of LLMs, our SR-MCTS overcomes the inefficiencies and limitations of conventional step-wise and greedy search algorithms, enabling a more efficient exploration of solution spaces. To guide the search process, we propose the Pairwise Preference Reward Model (PPRM), which predicts pairwise preferences between solutions through instruction-following capabilities trained by Reinforcement Learning from Human Feedback (RLHF). Finally, the Enhanced Borda Count (EBC) method is adopted to synthesize pairwise preferences into global quantile scores for evaluations. This approach mitigates the challenges of scoring variability and non-independent distributions in mathematical reasoning tasks. The framework has been tested on general and advanced benchmarks, showing superior search efficiency and performance compared to existing open-source and closed-source methods, particularly in complex Olympiad-level benchmarks, including AIME24 and AMC23.
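The step of turning pairwise preferences into global scores can be illustrated with a minimal sketch. It uses a plain Borda count followed by a rank-based quantile, not the paper's Enhanced Borda Count, whose details the abstract does not give. The function name `borda_quantile_scores`, the `prefers` dictionary, and the candidate labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def borda_quantile_scores(
    candidates: List[str],
    prefers: Dict[Tuple[str, str], bool],
) -> Dict[str, float]:
    """Aggregate pairwise preferences into per-candidate quantile scores.

    prefers[(a, b)] is True when solution a is preferred over solution b.
    This is a basic Borda count (an assumption for illustration), not the
    Enhanced Borda Count from the paper.
    """
    # Borda score: number of pairwise comparisons each candidate wins.
    wins = {c: 0 for c in candidates}
    for a in candidates:
        for b in candidates:
            if a != b and prefers.get((a, b), False):
                wins[a] += 1

    n = len(candidates)
    if n < 2:
        # A single candidate trivially sits at the top quantile.
        return {c: 1.0 for c in candidates}

    # Quantile: fraction of the other candidates with a strictly lower Borda score.
    return {
        c: sum(1 for other in candidates if wins[other] < wins[c]) / (n - 1)
        for c in candidates
    }


# Hypothetical pairwise judgments over three candidate solutions.
solutions = ["s1", "s2", "s3"]
pairwise = {("s1", "s2"): True, ("s1", "s3"): True, ("s2", "s3"): True}
scores = borda_quantile_scores(solutions, pairwise)
print(scores)
```

Because the score depends only on relative wins, it stays comparable across problems even when an absolute reward scale would vary from one problem to the next.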
2022
SPE: Symmetrical Prompt Enhancement for Fact Probing
Yiyuan Li | Tong Che | Yezhen Wang | Zhengbao Jiang | Caiming Xiong | Snigdha Chaturvedi
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Pretrained language models (PLMs) have been shown to accumulate factual knowledge during pretraining (Petroni et al., 2019). Recent works probe PLMs for the extent of this knowledge through prompts in either discrete or continuous form. However, these methods do not consider the symmetry of the task: object prediction and subject prediction. In this work, we propose Symmetrical Prompt Enhancement (SPE), a continuous prompt-based method for factual probing in PLMs that leverages the symmetry of the task by constructing symmetrical prompts for subject and object prediction. Our results on a popular factual probing dataset, LAMA, show significant improvement of SPE over previous probing methods.