Xinyi Dong


2025

ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection
Jiaqi Li | Xinyi Dong | Yang Liu | Zhizhuo Yang | Quansen Wang | Xiaobo Wang | Song-Chun Zhu | Zixia Jia | Zilong Zheng
Findings of the Association for Computational Linguistics: ACL 2025

We present a novel pipeline, ReflectEvo, to demonstrate that small language models (SLMs) can enhance meta introspection through reflection learning. The pipeline iteratively generates self-reflections for self-training, fostering a continuous, self-evolving improvement loop. Leveraging this pipeline, we construct ReflectEvo-460k, a large-scale, comprehensive, self-generated reflection dataset with broadened instructions and diverse multi-domain tasks. Building upon this dataset, we demonstrate the effectiveness of reflection learning in improving SLMs’ reasoning abilities using SFT and DPO, substantially boosting Llama-3 from 52.4% to 71.2% and Mistral from 44.4% to 71.1%. These results validate that ReflectEvo can rival or even surpass the reasoning capability of three prominent open-source models on BIG-bench without distillation from superior models or fine-grained human annotation. We further analyze the quality of the self-generated reflections and their impact on error localization and correction. Our work highlights the potential of continuously enhancing the reasoning performance of SLMs through iterative reflection learning.
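
To make the reflection-learning loop concrete, here is a minimal, illustrative sketch of how self-reflection data might be collected for self-training. The Task class, the generate and is_correct callables, and the prompt wording are placeholders introduced for exposition; they do not reproduce the released ReflectEvo pipeline or the ReflectEvo-460k construction process.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Task:
    prompt: str
    label: str

def collect_reflection_data(
    generate: Callable[[str], str],        # wrapper around the SLM's generation (placeholder)
    is_correct: Callable[[str, str], bool],  # task-specific answer checker (placeholder)
    tasks: List[Task],
    max_rounds: int = 2,
) -> List[Dict[str, str]]:
    """Collect self-generated (prompt, reflection, corrected answer) triples."""
    sft_examples = []
    for task in tasks:
        answer = generate(task.prompt)
        if is_correct(answer, task.label):
            continue  # only failed attempts trigger reflection
        for _ in range(max_rounds):
            # The model critiques its own failed attempt ...
            reflection = generate(
                f"{task.prompt}\nPrevious answer: {answer}\n"
                "Reflect on what went wrong, then answer again."
            )
            answer = generate(f"{task.prompt}\nReflection: {reflection}")
            if is_correct(answer, task.label):
                # ... and only reflections that repair the error become training data.
                sft_examples.append(
                    {"prompt": task.prompt, "reflection": reflection, "answer": answer}
                )
                break
    return sft_examples
```

The collected triples could then serve as SFT targets (or be paired as chosen/rejected reflections for DPO), with the loop repeated on the updated model to realize the continuous, self-evolving process described in the abstract.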

Discovering Semantic Subdimensions through Disentangled Conceptual Representations
Yunhao Zhang | Shaonan Wang | Nan Lin | Xinyi Dong | Chong Li | Chengqing Zong
Findings of the Association for Computational Linguistics: EMNLP 2025

Understanding the core dimensions of conceptual semantics is fundamental to uncovering how meaning is organized in language and the brain. Existing approaches often rely on predefined semantic dimensions that offer only broad representations, overlooking finer conceptual distinctions. This paper proposes a novel framework to investigate the subdimensions underlying coarse-grained semantic dimensions. Specifically, we introduce a Disentangled Continuous Semantic Representation Model (DCSRM) that decomposes word embeddings from large language models into multiple sub-embeddings, each encoding specific semantic information. Using these sub-embeddings, we identify a set of interpretable semantic subdimensions. To assess their neural plausibility, we apply voxel-wise encoding models to map these subdimensions to brain activation. Our work offers finer-grained, interpretable semantic subdimensions of conceptual meaning. Further analyses reveal that semantic dimensions are structured according to distinct principles, with polarity emerging as a key factor driving their decomposition into subdimensions. The neural correlates of the identified subdimensions support their cognitive and neuroscientific plausibility.
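
As an illustration of the decomposition idea, below is a minimal, hypothetical sketch of splitting a word embedding into additive sub-embeddings. The class name DisentangledDecomposer, the purely linear projections, and the reconstruction loss are assumptions made for exposition; they do not reproduce the actual DCSRM architecture, its disentanglement objectives, or the rating supervision used in the paper.

```python
import torch
import torch.nn as nn

class DisentangledDecomposer(nn.Module):
    """Toy sketch: split a word embedding into k additive sub-embeddings,
    each intended to carry one semantic subdimension. (Hypothetical model,
    not the paper's DCSRM.)"""

    def __init__(self, dim: int, num_subspaces: int):
        super().__init__()
        # One linear projection per subspace; their outputs should sum back
        # to (approximately) the original embedding.
        self.projections = nn.ModuleList(
            [nn.Linear(dim, dim, bias=False) for _ in range(num_subspaces)]
        )

    def forward(self, embedding: torch.Tensor):
        subs = [proj(embedding) for proj in self.projections]  # k sub-embeddings
        reconstruction = torch.stack(subs, dim=0).sum(dim=0)   # additive recombination
        return subs, reconstruction

# Usage: a reconstruction loss keeps the decomposition faithful to the original
# embedding; additional supervision (e.g., behavioral ratings per dimension)
# would be needed to push each sub-embedding toward one interpretable subdimension.
model = DisentangledDecomposer(dim=768, num_subspaces=4)
emb = torch.randn(32, 768)                 # a batch of LLM word embeddings
subs, recon = model(emb)
recon_loss = nn.functional.mse_loss(recon, emb)
```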

2023

A Comprehensive Neural and Behavioral Task Taxonomy Method for Transfer Learning in NLP
Yunhao Zhang | Chong Li | Xiaohan Zhang | Xinyi Dong | Shaonan Wang
Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023 (Findings)