Tao Ren

Other people with similar names: Tao Ren (Pittsburgh)

2026

Large Language Models (LLMs) have shown impressive reasoning capabilities in agents for complex interactive environments. However, these agents often suffer from hallucinations and lack grounding, leading to unreliable actions that conflict with real-world constraints. Existing approaches mitigate some issues through implicit imitation or sparse reinforcement learning but rely on fitting data distributions without explicitly understanding environmental constraints, often generating actions that are behaviorally distorted or environmentally impermissible. To address this, we introduce OntoGuard, an ontological framework designed to guard LLM agents by enforcing environmental and behavioral admissibility. These constraints are constructed by extracting knowledge from oracle demonstrations, supplemented with world knowledge inherent in LLMs and general knowledge bases. During inference, OntoGuard functions as an active interceptor, using a graph-based constraint-checking mechanism to reject invalid actions and prompt self-correction before acting. Experiments on both ScienceWorld and VirtualHome demonstrate OntoGuard’s advantage over state-of-the-art methods, validating its ability to enforce physical and behavioral constraints while preventing invalid actions.

EdgeFormer: Latency-Aware Collaborative Multi-Head Attention of Transformer Inference in Edge Networks
Yiming Yao | Jianwei Niu | Bin Dai | Tao Ren
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent breakthroughs in Transformer-based large models have driven progress across a wide range of tasks, yet their reliance on centralized cloud deployment raises significant privacy risks due to sensitive data exposure. While edge-based collaborative inference offers a privacy-preserving alternative, existing methods face critical limitations: static model partitioning cannot adapt to dynamic edge resource fluctuations, and rigid multi-head attention handling overlooks semantic-critical prioritization and parallelism. We propose EdgeFormer, a latency-aware framework for distributed Transformer inference in resource-constrained edge networks. EdgeFormer dynamically allocates model blocks across devices via efficiency-storage trade-off optimization and introduces collaborative Multi-Head Attention (cMHA), which distributes semantic-critical attention heads across devices while pruning redundant ones under real-time constraints. We further develop LiScore, a composite metric integrating attention diversity and latency costs, alongside a similarity-based retrieval method to reduce recomputation overhead. Extensive experiments demonstrate that EdgeFormer achieves up to 2.01× inference acceleration over state-of-the-art baselines with ≤1.06% accuracy loss, maintaining robustness under varying edge conditions.

2025

Let Modalities Teach Each Other: Modal-Collaborative Knowledge Extraction and Fusion for Multimodal Knowledge Graph Completion
Guoliang Zhu | Tao Ren | Dandan Wang | Jun Hu
Findings of the Association for Computational Linguistics: NAACL 2025

Multimodal knowledge graph completion (MKGC) aims to predict missing triples in MKGs using multimodal information. Recent research typically either extracts information from each modality separately to make predictions and then ensembles them at the decision stage, or projects multiple modalities into a unified feature space to learn multimodal representations for prediction. However, these methods usually overlook the intrinsic correlation between modalities in MKGs, which should be leveraged in both unimodal knowledge extraction and multimodal knowledge fusion. Motivated by this, we propose a novel Modal-collaborative knowledge learning (Moodle) framework for MKGC, the key idea of which is to foster mutual guidance and collaboration during unimodal knowledge extraction, so that each modality acquires distinct and complementary knowledge that subsequently enhances the multimodal knowledge fusion. Specifically, Moodle preserves the representations of different modalities to learn unimodal knowledge while modeling the mutual guidance through multi-task learning. Furthermore, Moodle performs multimodal knowledge fusion and prediction guided by unimodal knowledge, capturing their synergistic relationships and acquiring fine-grained semantic knowledge through contrastive learning. Extensive experiments on three real-world datasets demonstrate the advantages of Moodle over state-of-the-art methods.

Venues

Findings: 2
ACL: 1