Xu Guo
2025
SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs
Yige Xu | Xu Guo | Zhiwei Zeng | Chunyan Miao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks by generating intermediate reasoning steps. However, most existing approaches focus on hard token decoding, which constrains reasoning within the discrete vocabulary space and may not always be optimal. While recent efforts explore continuous-space reasoning, they often require full-model fine-tuning and suffer from catastrophic forgetting, limiting their applicability to state-of-the-art LLMs that already perform well in zero-shot settings with a proper instruction. To address this challenge, we propose a novel approach for continuous-space reasoning that does not require modifying the LLM. Specifically, we employ a lightweight fixed assistant model to speculatively generate instance-specific soft thought tokens as the initial chain of thoughts, which are then mapped into the LLM’s representation space via a trainable projection module. Experimental results on five reasoning benchmarks demonstrate that our method enhances LLM reasoning performance through supervised, parameter-efficient fine-tuning. Source code is available at https://github.com/xuyige/SoftCoT.
Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency
Colin Hong | Xu Guo | Anand Chaanan Singh | Esha Choukse | Dmitrii Ustiugov
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Recently, Test-Time Scaling (TTS) has gained increasing attention for improving LLM reasoning performance at test time without retraining the model. A notable TTS technique is Self-Consistency (SC), which generates multiple reasoning chains in parallel and selects the final answer via majority voting. While effective, the order-of-magnitude computational overhead limits its broad deployment. Prior attempts to accelerate SC mainly rely on model-based confidence scores or heuristics with limited empirical support. For the first time, we theoretically and empirically analyze the inefficiencies of SC and reveal actionable opportunities for improvement. Building on these insights, we propose Slim-SC, a step-wise pruning strategy that identifies and removes redundant chains using inter-chain similarity at the thought level. Experiments on three STEM reasoning datasets and two recent LLM architectures show that Slim-SC reduces inference latency and KVC usage by up to 45% and 26%, respectively, with R1-Distill, while maintaining or improving accuracy, thus offering a simple yet efficient TTS alternative for SC.
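The two ideas named in this abstract, majority voting over sampled chains and pruning of near-duplicate chains, can be illustrated with a minimal sketch. This is not the Slim-SC implementation: the abstract does not state which similarity measure or threshold the paper uses, so the token-overlap (Jaccard) similarity, the `threshold` value, and the function names `majority_vote`, `jaccard`, and `prune_redundant` are all assumptions made for illustration. Each chain is assumed to be a non-empty list of thought strings.

```python
from collections import Counter


def majority_vote(answers):
    """Self-Consistency selection: return the most frequent final answer."""
    if not answers:
        raise ValueError("need at least one answer")
    # most_common sorts by count; ties keep first-seen order.
    return Counter(answers).most_common(1)[0][0]


def jaccard(a, b):
    """Token-set overlap between two thoughts (hypothetical stand-in similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def prune_redundant(chains, threshold=0.9):
    """Drop a chain whose latest thought nearly duplicates that of a kept chain.

    `threshold` is an assumed value, not one taken from the paper.
    """
    kept = []
    for chain in chains:
        if all(jaccard(chain[-1], k[-1]) < threshold for k in kept):
            kept.append(chain)
    return kept
```

In this sketch, pruning would run between decoding steps so that fewer chains continue generating, and `majority_vote` would run once over the final answers of the surviving chains.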
2024
RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference
Yige Xu | Xu Guo | Zhiwei Zeng | Chunyan Miao
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) have brought a great breakthrough to the natural language processing (NLP) community, while also posing the challenge of handling concurrent customer queries due to their high throughput demands. Data multiplexing addresses this by merging multiple inputs into a single composite input, allowing more efficient inference through a shared forward pass. However, as distinguishing individuals from a composite input is challenging, conventional methods typically require training the entire backbone, yet still suffer from performance degradation. In this paper, we introduce RevMUX, a parameter-efficient data multiplexing framework that incorporates a reversible design in the multiplexer, which can be reused by the demultiplexer to perform reverse operations and restore individual samples for classification. Extensive experiments on four datasets and three types of LLM backbones demonstrate the effectiveness of RevMUX for enhancing LLM inference efficiency while retaining a satisfactory classification performance.
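To show why a reversible multiplexer lets the demultiplexer reuse the same components, here is a minimal sketch of one standard reversible construction, additive coupling. It is an assumption that this resembles the paper's design: the abstract only says the multiplexer is reversible. The names `multiplex` and `demultiplex` and the mixing functions `f` and `g` are hypothetical, and the sketch leaves out the backbone forward pass that sits between the two steps in an actual multiplexing pipeline.

```python
def multiplex(x1, x2, f, g):
    """Mix two equal-length inputs with additive coupling.

    f and g are arbitrary functions mapping a list to a list of the same
    length; they never need to be inverted themselves.
    """
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def demultiplex(y1, y2, f, g):
    """Undo multiplex by reusing the same f and g in reverse order."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2
```

Because each step only adds a quantity that can be recomputed from values already known, subtraction recovers the inputs, so no separate demultiplexer parameters are required in this sketch. With floating-point inputs the recovery is exact only up to rounding.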
A Survey on Natural Language Counterfactual Generation
Yongjie Wang | Xiaoqi Qiu | Yu Yue | Xu Guo | Zhiwei Zeng | Yuhong Feng | Zhiqi Shen
Findings of the Association for Computational Linguistics: EMNLP 2024
Natural language counterfactual generation aims to minimally modify a given text such that the modified text will be classified into a different class. The generated counterfactuals provide insight into the reasoning behind a model’s predictions by highlighting which words significantly influence the outcomes. Additionally, they can be used to detect model fairness issues and augment the training data to enhance the model’s robustness. A substantial amount of research has been conducted to generate counterfactuals for various NLP tasks, employing different models and methodologies. With the rapid growth of studies in this field, a systematic review is crucial to guide future researchers and developers. To bridge this gap, this survey provides a comprehensive overview of textual counterfactual generation methods, particularly those based on Large Language Models. We propose a new taxonomy that systematically categorizes the generation methods into four groups and summarizes the metrics for evaluating the generation quality. Finally, we discuss ongoing research challenges and outline promising directions for future work.