Kaixuan Ren
2025
Alignment of Large Language Models with Human Preferences and Values
Usman Naseem | Gautam Siddharth Kashyap | Kaixuan Ren | Yiran Zhang | Utsav Maskey | Juan Ren | Afrozah Nadeem
Proceedings of The 23rd Annual Workshop of the Australasian Language Technology Association
Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their reliability and alignment with human expectations remain unresolved challenges. This tutorial introduces the foundations of alignment and provides participants with a conceptual and practical understanding of the field. Core principles such as values, safety, reasoning, and pluralism will be presented through intuitive explanations, worked examples, and case studies. The aim is to equip attendees with the ability to reason about alignment goals, understand how existing methods operate in practice, and critically evaluate their strengths and limitations.
TurnBench-MS: A Benchmark for Evaluating Multi-Turn, Multi-Step Reasoning in Large Language Models
Yiran Zhang | Mo Wang | Xiaoyang Li | Kaixuan Ren | Chencheng Zhu | Usman Naseem
Findings of the Association for Computational Linguistics: EMNLP 2025
Despite impressive advances in large language models (LLMs), existing benchmarks often focus on single-turn or single-step tasks, failing to capture the kind of iterative reasoning required in real-world settings. To address this limitation, we introduce **TurnBench**, a novel benchmark that evaluates multi-turn, multi-step reasoning through an interactive code-breaking task inspired by the “Turing Machine Board Game.” In each episode, a model must uncover hidden logical or arithmetic rules by making sequential guesses, receiving structured feedback, and integrating clues across multiple rounds. This dynamic setup requires models to reason over time, adapt based on past information, and maintain consistency across steps—capabilities underexplored in current benchmarks. TurnBench includes two modes: *Classic*, which tests standard reasoning, and *Nightmare*, which introduces increased complexity and requires robust inferential chains. To support fine-grained analysis, we provide ground-truth annotations for intermediate reasoning steps. Our evaluation of state-of-the-art LLMs reveals significant gaps: the best model achieves 84% accuracy in Classic mode, but performance drops to 18% in Nightmare mode. In contrast, human participants achieve 100% in both, underscoring the challenge TurnBench poses to current models. By incorporating feedback loops and hiding task rules, TurnBench reduces contamination risks and provides a rigorous testbed for diagnosing and advancing multi-step, multi-turn reasoning in LLMs.
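To make the episode structure concrete, below is a minimal Python sketch of a multi-turn, feedback-driven evaluation loop in the spirit of TurnBench. It is an illustration only: the hidden rule, the feedback format, and the function names (hidden_rule, give_feedback, model_guess, run_episode) are hypothetical stand-ins, not the benchmark's actual API or data.

```python
# Illustrative sketch of a multi-turn, feedback-driven episode in the spirit
# of TurnBench. All names and the rule below are hypothetical, not the
# benchmark's actual interface.

def hidden_rule(code: tuple[int, int, int]) -> bool:
    """Hidden arithmetic rule the solver must uncover (hypothetical example)."""
    a, b, c = code
    return a < b and (b + c) % 2 == 0

def give_feedback(guess: tuple[int, int, int]) -> dict:
    """Structured per-turn feedback: does the guess satisfy the hidden rule?"""
    return {"guess": guess, "satisfies_rule": hidden_rule(guess)}

def model_guess(history: list[dict]) -> tuple[int, int, int]:
    """Stand-in for an LLM: propose a candidate not yet tried."""
    tried = {f["guess"] for f in history}
    for a in range(1, 6):
        for b in range(1, 6):
            for c in range(1, 6):
                if (a, b, c) not in tried:
                    return (a, b, c)
    raise RuntimeError("search space exhausted")

def run_episode(max_turns: int = 10) -> list[dict]:
    """One episode: guess, receive feedback, integrate it, and repeat."""
    history: list[dict] = []
    for _ in range(max_turns):
        feedback = give_feedback(model_guess(history))
        history.append(feedback)
        if feedback["satisfies_rule"]:
            break
    return history

if __name__ == "__main__":
    transcript = run_episode()
    print(f"turns used: {len(transcript)}; final guess: {transcript[-1]}")
```

The key design point the abstract highlights is the feedback loop: because the rule stays hidden and each turn's structured feedback must be carried forward, a solver has to reason over the accumulated history rather than answer a single static prompt.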