Zichen Liu
2026
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
Zijian Wu | Jinjie Ni | Xiangyan Liu | Zichen Liu | Hang Yan | Michael Qizhe Shieh
Findings of the Association for Computational Linguistics: ACL 2026
Vision-language models (VLMs) trained via reinforcement learning with verifiable reward (RLVR) have shown notable progress in scaling test-time compute effectively. In this work, we investigate how synthesized RL data can further improve RLVR. To this end, we propose SynthRL—a scalable and guaranteed pipeline for automatic data scaling in reasoning-oriented RL training. SynthRL comprises three key stages: (1) selecting seed questions with appropriate distribution, (2) augmenting them into more challenging variants while preserving the original answers, and (3) a guaranteed verification stage that ensures near-perfect correctness and difficulty enhancement. Our empirical experiments demonstrate SynthRL’s scalability and effectiveness. When applied to the MMK12 dataset, SynthRL synthesizes over 3.3K additional verifiable, challenging questions from approximately 8K seed samples. Models trained with our synthesized data achieve consistent gains across five out-of-domain visual math reasoning benchmarks, with a significant improvement over baseline models trained on seed data alone. Notably, detailed analysis reveals that the gains are more pronounced on the most challenging evaluation samples, highlighting SynthRL’s effectiveness in eliciting deeper and more complex reasoning patterns.
Breaking the Impasse: Dual-Scale Evolutionary Policy Training for Social Language Agents
Minzheng Wang | Run Luo | Yanbo Wang | Zichen Liu | Yuqiao Tan | Tao Tan | Nan Xu | Lu Wang | Wenji Mao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for closed-ended tasks, extending it to open-ended social language games via self-play reveals a critical issue: evolution impasse. Due to the vast strategy space, language agents frequently converge to homogenized behaviors, leading to deterministic match outcomes that eliminate the gradient signals necessary for policy evolution. To tackle this issue, we propose Dual-scale Evolutionary Policy Training (DEPT) for social language games. DEPT introduces a time-scaled evolutionary perception mechanism that detects impasse by quantifying dual-scale value baseline divergence alongside match entropy. Upon perceiving the collapse, it then activates asymmetric advantage reshaping to dynamically modulate the optimization landscape for intervention. Thus, our method effectively restores gradient signals and enforces sustained strategic exploration. Extensive experiments on multiple social language games demonstrate that DEPT outperforms strong baselines, avoiding policy degeneration and driving the continuous evolution of social language agents.
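To illustrate why deterministic match outcomes signal an impasse, the sketch below computes the Shannon entropy of observed self-play outcomes. This is only an assumed reading of "match entropy": the abstract does not define the quantity, and the function name and outcome labels here are hypothetical, not taken from DEPT.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def match_outcome_entropy(outcomes: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of the empirical outcome distribution.

    Hypothetical helper: an assumed stand-in for the "match entropy"
    mentioned in the abstract, which does not give a definition.
    """
    if not outcomes:
        raise ValueError("need at least one match outcome")
    total = len(outcomes)
    counts = Counter(outcomes)
    # Sum -p * log(p) over the outcomes that actually occurred.
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


# Every match ends the same way: entropy is zero, so outcomes carry no signal.
collapsed = match_outcome_entropy(["win", "win", "win", "win"])

# Outcomes are evenly split: entropy reaches its maximum for two labels, log(2).
balanced = match_outcome_entropy(["win", "loss", "win", "loss"])
```

Under this reading, an entropy near zero over a recent window of matches would be one symptom of the homogenized behavior the abstract describes.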
2022
TreeMAN: Tree-enhanced Multimodal Attention Network for ICD Coding
Zichen Liu | Xuyuan Liu | Yanlong Wen | Guoqing Zhao | Fen Xia | Xiaojie Yuan
Proceedings of the 29th International Conference on Computational Linguistics
ICD coding assigns disease codes to electronic health records (EHRs) upon discharge, which is crucial for billing and clinical statistics. To improve the effectiveness and efficiency of manual coding, many methods have been proposed to automatically predict ICD codes from clinical notes. However, most previous works ignore the decisive information contained in the structured medical data in EHRs, which is hard to capture from noisy clinical notes. In this paper, we propose a Tree-enhanced Multimodal Attention Network (TreeMAN) that fuses tabular and textual features into multimodal representations by enhancing the text representations with tree-based features via the attention mechanism. Tree-based features are constructed according to decision trees learned from structured multimodal medical data, which capture the decisive information about ICD coding. We can apply the same multi-label classifier from previous text models to the multimodal representations to predict ICD codes. Experiments on two MIMIC datasets show that our method outperforms prior state-of-the-art ICD coding approaches. The code is available at https://github.com/liu-zichen/TreeMAN.
2021
TEMP: Taxonomy Expansion with Dynamic Margin Loss through Taxonomy-Paths
Zichen Liu | Hongyuan Xu | Yanlong Wen | Ning Jiang | HaiYing Wu | Xiaojie Yuan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
As an essential form of knowledge representation, taxonomies are widely used in various downstream natural language processing tasks. However, with the continuous rise of new concepts, many existing taxonomies cannot maintain coverage through manual expansion. In this paper, we propose TEMP, a self-supervised taxonomy expansion method that predicts the position of new concepts by ranking the generated taxonomy-paths. For the first time, TEMP employs pre-trained contextual encoders in taxonomy construction and hypernym detection problems. Experiments show that pre-trained contextual embeddings are able to capture hypernym-hyponym relations. To learn more detailed differences between taxonomy-paths, we train the model with a dynamic margin loss defined by a novel dynamic margin function. Extensive evaluations show that TEMP outperforms prior state-of-the-art taxonomy expansion approaches by 14.3% in accuracy and 15.8% in mean reciprocal rank on three public benchmarks.
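For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and mean reciprocal rank are commonly computed from the rank of the correct candidate for each query. The function name and input format are illustrative assumptions; the paper's exact evaluation protocol may differ (for example, some taxonomy expansion work uses a scaled variant of MRR).

```python
from typing import Sequence, Tuple


def accuracy_and_mrr(ranks: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, mean reciprocal rank) for 1-based ranks.

    ranks[i] is the position of the correct parent among the ranked
    candidates for query concept i (1 means it was ranked first).
    Illustrative helper only, not TEMP's evaluation code.
    """
    if not ranks:
        raise ValueError("need at least one query")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    n = len(ranks)
    # Accuracy counts only queries whose correct answer is ranked first.
    accuracy = sum(1 for r in ranks if r == 1) / n
    # MRR also gives partial credit: 1/2 for rank 2, 1/4 for rank 4, etc.
    mrr = sum(1.0 / r for r in ranks) / n
    return accuracy, mrr


# Three queries whose correct parents were ranked 1st, 2nd, and 4th.
acc, mrr = accuracy_and_mrr([1, 2, 4])
```

Here accuracy is 1/3 and MRR is (1 + 1/2 + 1/4) / 3, which shows how MRR rewards near misses that accuracy ignores.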