Tao Ren
Papers on this page may belong to the following people: Tao Ren, Tao Ren (Pittsburgh)
2026
TreeDiff: AST-Guided Code Generation with Diffusion LLMs
Yiming Zeng | Jinghan Cao | Zexin Li | Yiming Chen | Tao Ren | Zhuochun Li | Dawei Xiang | Xidong Wu | Shangqian Gao | Tingting Yu
Proceedings of the First Workshop on Structured Understanding, Retrieval, and Generation in the LLM Era (SURGeLLM 2026)
Yiming Zeng | Jinghan Cao | Zexin Li | Yiming Chen | Tao Ren | Zhuochun Li | Dawei Xiang | Xidong Wu | Shangqian Gao | Tingting Yu
Proceedings of the First Workshop on Structured Understanding, Retrieval, and Generation in the LLM Era (SURGeLLM 2026)
Code generation is increasingly critical for real-world applications. Still, diffusion-based large language models continue to struggle with this demand. Unlike free-form text, code requires syntactic precision; even minor structural inconsistencies can render a program non-executable. Existing diffusion-based large language models rely on random token masking for corruption, leading to two key failures: they lack awareness of syntactic boundaries during the iterative denoising process, and they fail to capture the long-range hierarchical dependencies essential for program correctness.We propose TreeDiff to address both issues. Specifically, we propose a syntax-aware diffusion framework that incorporates structural priors from Abstract Syntax Tree (AST) into the corruption process. Instead of masking individual tokens at random, we selectively mask tokens belonging to key AST nodes. By aligning the corruption process with the underlying structure of code, our method encourages the model to internalize the compositional nature of programming languages, enabling it to reconstruct programs that respect grammatical boundaries and capture long-range dependencies. Our method achieves a 13.3% relative improvement over the random masking training method, demonstrating its effectiveness in code generation task by leveraging underlying structures.
2025
LEAF: Large Language Diffusion Model for Time Series Forecasting
Yuhang Pei | Tao Ren | Yifan Wang | Zhipeng Sun | Wei Ju | Chong Chen | Xian-Sheng Hua | Xiao Luo
Findings of the Association for Computational Linguistics: EMNLP 2025
Yuhang Pei | Tao Ren | Yifan Wang | Zhipeng Sun | Wei Ju | Chong Chen | Xian-Sheng Hua | Xiao Luo
Findings of the Association for Computational Linguistics: EMNLP 2025
This paper studies the problem of time series forecasting, which aims to generate future predictions given historical trajectories. Recent researchers have applied large language models (LLMs) into time series forecasting, which usually align the time series space with textual space and output future predictions with strong autoregressive reasoning abilities. Despite their remarkable progress, these approaches usually lack an understanding of holistic temporal patterns with potential error accumulation. Towards this end, this paper proposes a simple yet effective framework that marries ̲Larg ̲e Langu ̲age Diffusion Model with time series ̲forecasting (LEAF). The core of our framework is to generate future predictions with a diffusion model from a holistic view. In particular, we first introduce a tokenization module to convert time series into tokens and then adopt the language diffusion models to capture the temporal dependencies. In this way, we can transform masked time series into all the predictions with the remasking strategy. Extensive experiments on various benchmark datasets validate the effectiveness of the proposed LEAF in comparison to various baselines.
CateEA: Enhancing Entity Alignment via Implicit Category Supervision
Guan Dong Feng | Tao Ren | Jun Hu | Dan dan Wang
Proceedings of the 31st International Conference on Computational Linguistics
Guan Dong Feng | Tao Ren | Jun Hu | Dan dan Wang
Proceedings of the 31st International Conference on Computational Linguistics
Entity Alignment (EA) is essential for integrating Knowledge Graphs (KGs) by matching equivalent entities across diverse KGs. With the rise of multi-modal KGs, which emerged to better depict real-world KGs by integrating visual, textual, and structured data, Multi-Modal Entity Alignment (MMEA) has become crucial in enhancing EA. However, existing MMEA methods often neglect the inherent semantic category information of entities, limiting alignment precision and robustness. To address this, we propose Category-enhanced Entity Alignment (CateEA), which combines implicit entity category information into multi-modal representations. By generating pseudo-category labels from entity embeddings and integrating them into a multi-task learning framework, CateEA captures latent category semantics, enhancing entity representations. CateEA allows for adaptive adjustments of similarity measures, leading to improved alignment precision and robustness in multi-modal contexts. Experiments on benchmark datasets demonstrate that CateEA outperforms state-of-the-art methods in various settings.