Zhuoyun Li

2026

Chain-of-Thought as a Lens: Evaluating Structured Reasoning Alignment between Human Preferences and Large Language Models
Boxuan Wang | Zhuoyun Li | Xinmiao Huang | Xiaowei Huang | Yi Dong
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper presents a method to quantitatively assess the alignment between multi-step, structured reasoning in large language models and human preferences. We introduce the Alignment Score, a semantic-level metric that compares a model-produced chain-of-thought trace with a human-preferred reference by constructing semantic-entropy-based matrices over intermediate steps and measuring their divergence. Our analysis shows that Alignment Score tracks task accuracy across models and hop depths, and peaks at 2-hop reasoning. Empirical results further indicate that misalignment at greater reasoning depths is driven mainly by alignment errors such as thematic shift and redundant reasoning. Viewing chain sampling as drawing from a distribution over reasoning paths, we empirically demonstrate a strong and consistent correlation between Alignment Score and accuracy, readability, and coherence, supporting its use as a diagnostic signal. The code is available.


Causal2Vec: Improving Decoder-only LLMs as Embedding Models through a Contextual Token
Ailiang Lin | Zhuoyun Li | Yusong Wang | Kotaro Funakoshi | Manabu Okumura
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Decoder-only large language models (LLMs) have been increasingly adopted to build embedding models for diverse tasks. To overcome the inherent limitations of causal attention in representation learning, many existing methods modify the attention mechanism to be bidirectional, potentially undermining LLMs’ ability to extract semantic information acquired during pre-training. Meanwhile, leading unidirectional approaches often rely on extra input text to generate contextualized embeddings, inevitably increasing computational costs. In this work, we propose Causal2Vec, a general-purpose embedding model tailored to enhance the performance of decoder-only LLMs without altering their original architectures or introducing significant computational overhead. Specifically, we first employ a lightweight BERT-style model to pre-encode the input text into a single Contextual token, which is then prepended to the LLM’s input sequence, allowing each token to capture contextualized information even without attending to future tokens. Furthermore, to mitigate the recency bias introduced by last-token pooling, we concatenate the last hidden states of the Contextual and EOS tokens as the final text embedding. In practice, Causal2Vec achieves new state-of-the-art performance on the MTEB benchmark among models trained solely on publicly available retrieval datasets.
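
The two steps described in this abstract (prepending a Contextual token, then concatenating the last hidden states of the Contextual and EOS tokens) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `prepend_contextual` and `pool_contextual_eos` are hypothetical, vectors are plain Python lists, and it assumes the Contextual token sits at position 0 and the EOS token at the last position of the sequence.

```python
from typing import List

Vector = List[float]


def prepend_contextual(contextual: Vector, tokens: List[Vector]) -> List[Vector]:
    """Place the pre-encoded Contextual token in front of the token embeddings.

    Assumption: `contextual` has already been produced by a separate
    lightweight encoder and has the same width as each token embedding.
    """
    if tokens and len(contextual) != len(tokens[0]):
        raise ValueError("Contextual token width must match token embedding width")
    return [contextual] + tokens


def pool_contextual_eos(last_hidden: List[Vector]) -> Vector:
    """Concatenate the final-layer states at position 0 and the last position.

    Assumption: position 0 holds the Contextual token and the last
    position holds the EOS token.
    """
    if len(last_hidden) < 2:
        raise ValueError("Need at least the Contextual and EOS positions")
    return last_hidden[0] + last_hidden[-1]


# Toy example: three token embeddings of width 2, the last one standing in for EOS.
sequence = prepend_contextual([0.5, 0.5], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
# A real model would transform `sequence`; here it is reused as stand-in hidden states.
embedding = pool_contextual_eos(sequence)
```

With a hidden width of d, the pooled embedding in this sketch has width 2d, since it carries both the Contextual and EOS states rather than the EOS state alone.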

Co-authors

Manabu Okumura 1

Boxuan Wang 1

Yusong Wang 1

Venues

ACL 2
