2025
You Only Query Twice: Multimodal Rumor Detection via Evidential Evaluation from Dual Perspectives
Junyi Chen | Leyuan Liu | Tian Lan | Fan Zhou | Xiaosong Zhang
Proceedings of the 31st International Conference on Computational Linguistics
Current rumor detectors fall short in fully exploiting responses to the source tweet as essential public opinion, and in explaining and indicating the reliability of their results. Moreover, jointly utilizing the responses and the multimodal source content for detection is challenging due to the heterogeneous nature of the data points. In this work, to address the first challenge, we begin by prompting a Large Language Model (LLM) with both the multimodal source content and the corresponding response set to extract contrasting evidence, enabling maximal utilization of informative responses. To overcome the second challenge, we introduce an uncertainty-aware evidential evaluator that assesses the evidence intensity from the multimodal source content and the dual-sided reasoning, from which the final prediction is derived. Because we model second-order probability, we can effectively indicate the model’s uncertainty (i.e., the reliability) of its results, and the reasoning from the correct perspective also serves as a natural language-based explanation. In this way, the third challenge is addressed as well, since we fully leverage the available resources. Extensive experiments validate the effectiveness, uncertainty awareness in predictions, helpful explainability for human judgment, and superior efficiency of our approach compared to contemporary works utilizing LLMs.
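As a point of reference for the "second-order probability" the abstract mentions, the sketch below shows the standard evidential deep-learning formulation (a Dirichlet distribution over class probabilities, whose total evidence yields an uncertainty mass). The class and variable names are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of a Dirichlet-based evidential head, assuming the standard
# evidential deep-learning formulation; names are illustrative, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialHead(nn.Module):
    """Maps fused features to per-class evidence and an uncertainty mass."""
    def __init__(self, hidden_dim: int, num_classes: int = 2):
        super().__init__()
        self.fc = nn.Linear(hidden_dim, num_classes)
        self.num_classes = num_classes

    def forward(self, features: torch.Tensor):
        evidence = F.softplus(self.fc(features))    # non-negative evidence per class
        alpha = evidence + 1.0                      # Dirichlet concentration parameters
        strength = alpha.sum(dim=-1, keepdim=True)  # Dirichlet strength S
        prob = alpha / strength                     # expected class probabilities
        uncertainty = self.num_classes / strength   # high when total evidence is weak
        return prob, uncertainty

# Usage: fuse the multimodal source features with the dual-sided reasoning
# features, then read off both the prediction and its reliability.
head = EvidentialHead(hidden_dim=768)
fused = torch.randn(4, 768)                         # dummy batch of fused representations
prob, uncertainty = head(fused)
```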
LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Trajectory Feedback
Thai Hoang | Kung-Hsiang Huang | Shirley Kokane | Jianguo Zhang | Zuxin Liu | Ming Zhu | Jake Grigsby | Tian Lan | Michael S Ryoo | Chien-Sheng Wu | Shelby Heinecke | Huan Wang | Silvio Savarese | Caiming Xiong | Juan Carlos Niebles
Findings of the Association for Computational Linguistics: ACL 2025
Large Action Models (LAMs) for AI agents offer significant potential but face challenges due to the need for high-quality training data, especially for multi-step tasks that involve planning, executing tool calls, and responding to feedback. To address these issues, we present LAM SIMULATOR, a comprehensive framework for online exploration of agentic tasks with high-quality feedback. Our framework features a dynamic task query generator, an extensive collection of tools, and an interactive environment where Large Language Model (LLM) agents can call tools and receive real-time feedback. This setup enables LLM agents to explore and solve tasks autonomously, facilitating the discovery of multiple approaches to any given task. The resulting action-trajectory data are then used to create high-quality training datasets for LAMs. Our experiments on the popular agentic benchmarks ToolBench and CRMArena highlight the effectiveness of LAM SIMULATOR: models trained on datasets self-generated with our framework achieve significant performance gains, up to a 49.3% improvement over their original baselines. LAM SIMULATOR requires minimal human input during dataset creation, underscoring its efficiency and effectiveness in speeding up the development of AI agents.
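The explore-and-collect loop described here has a natural shape: an agent acts in a tool environment, the environment returns feedback, and high-quality trajectories are kept for training. The sketch below illustrates that loop; the agent/environment interfaces and the reward-based filter are assumptions for illustration, not the framework's actual API.

```python
# A hedged sketch of an online exploration loop with trajectory feedback,
# assuming simple agent.act / env.step interfaces (illustrative, not LAM
# SIMULATOR's actual API).
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)  # (action, observation) pairs
    reward: float = 0.0

def collect_trajectories(agent, env, task_generator, n_tasks, max_steps=10):
    dataset = []
    for _ in range(n_tasks):
        task = task_generator()                   # dynamic task query generation
        obs = env.reset(task)
        traj = Trajectory(task=task)
        for _ in range(max_steps):
            action = agent.act(task, traj.steps)  # LLM agent proposes a tool call
            obs, reward, done = env.step(action)  # real-time environment feedback
            traj.steps.append((action, obs))
            traj.reward = reward
            if done:
                break
        if traj.reward > 0:                       # keep only successful trajectories
            dataset.append(traj)
    return dataset
```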
Attention Consistency for LLMs Explanation
Tian Lan | Jinyuan Xu | Xue He | Jenq-Neng Hwang | Lei Li
Findings of the Association for Computational Linguistics: EMNLP 2025
Understanding the decision-making processes of large language models (LLMs) is essential for their trustworthy development and deployment; however, current interpretability methods often face challenges such as low resolution and high computational cost. To address these limitations, we propose the Multi-Layer Attention Consistency Score (MACS), a novel, lightweight, and easily deployable heuristic for estimating the importance of input tokens in decoder-based models. MACS measures the contribution of each input token based on the consistency of maximal attention. Empirical evaluations demonstrate that MACS achieves a favorable trade-off between interpretability quality and computational efficiency, showing faithfulness comparable to complex techniques with a 22% decrease in VRAM usage and a 30% reduction in latency.
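One plausible reading of "consistency of maximal attention" is a voting scheme: a token is important if it is the most-attended input token consistently across layers and heads. The sketch below implements that reading; the exact MACS definition is in the paper, so treat this as an illustrative approximation with assumed shapes.

```python
# A hedged approximation of a multi-layer attention-consistency score,
# assuming per-layer attention tensors of shape (num_heads, seq_len, seq_len).
import torch

def macs_scores(attentions: list, query_pos: int = -1) -> torch.Tensor:
    """Fraction of (layer, head) pairs whose maximal attention lands on each token."""
    seq_len = attentions[0].shape[-1]
    votes = torch.zeros(seq_len)
    total = 0
    for layer_attn in attentions:
        # For the query position, find each head's most-attended input token.
        top = layer_attn[:, query_pos, :].argmax(dim=-1)  # (num_heads,)
        votes.scatter_add_(0, top, torch.ones_like(top, dtype=votes.dtype))
        total += layer_attn.shape[0]
    return votes / total

# Usage with dummy attention maps: 4 layers, 8 heads, 16 tokens.
dummy = [torch.rand(8, 16, 16) for _ in range(4)]
importance = macs_scores(dummy)  # one consistency score per input token
```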
xLAM: A Family of Large Action Models to Empower AI Agent Systems
Jianguo Zhang | Tian Lan | Ming Zhu | Zuxin Liu | Thai Hoang | Shirley Kokane | Weiran Yao | Juntao Tan | Zhiwei Liu | Yihao Feng | Juan Carlos Niebles | Shelby Heinecke | Huan Wang | Silvio Savarese | Caiming Xiong
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Autonomous agents powered by large language models (LLMs) have attracted significant research interest. However, the open-source community faces many challenges in developing specialized models for agent tasks, driven by the scarcity of high-quality agent datasets and the absence of standard protocols in this area. We introduce xLAM, a series of large action models designed for AI agent tasks. The xLAM series includes five models, with both dense and mixture-of-experts architectures ranging from 1B to 8x22B parameters, trained using a scalable, flexible pipeline that unifies, augments, and synthesizes diverse datasets to enhance AI agents’ generalizability and performance across varied environments. Our experimental results demonstrate that xLAM consistently delivers exceptional performance across multiple agent-ability benchmarks, notably securing the 1st position on the Berkeley Function-Calling Leaderboard and outperforming GPT-4, Claude-3, and many other models in tool use. By releasing the xLAM series, we aim to advance the performance of open-source LLMs for autonomous AI agents, potentially accelerating progress and democratizing access to high-performance models for agent tasks.
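The "unify" step in such a pipeline amounts to mapping heterogeneous source records into one shared schema before augmentation and synthesis. The sketch below shows the general idea; the schema and the `source_format` handling are assumptions for illustration, not xLAM's actual data format.

```python
# A hedged sketch of dataset unification into a shared agent-data schema;
# the schema below is an assumption, not the format used by xLAM.
from dataclasses import dataclass

@dataclass
class UnifiedAgentExample:
    instruction: str       # natural-language task description
    available_tools: list  # tool signatures visible to the agent
    trajectory: list       # (thought, tool_call, observation) steps
    final_answer: str

def unify(record: dict, source_format: str) -> UnifiedAgentExample:
    """Map a heterogeneous source record into the shared schema."""
    if source_format == "function_calling":
        return UnifiedAgentExample(
            instruction=record["query"],
            available_tools=record["tools"],
            trajectory=[("", call, "") for call in record["calls"]],
            final_answer=record.get("answer", ""),
        )
    raise ValueError(f"unsupported source format: {source_format}")
```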
2024
A Survey of Automatic Evaluation on the Quality of Generated Text (生成式文本质量的自动评估方法综述)
Tian Lan (兰天) | Ziao Ma (马梓奥) | Yanghao Zhou (周杨浩) | Chen Xu (徐晨) | Xianling Mao (毛先领)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 2: Frontier Forum)
Human evaluation, the gold standard for assessing the quality of generated text, is too costly; the core idea of automatic evaluation is to make its results correlate highly with human evaluation, thereby enabling automated analysis and assessment of generated-text quality. With the iterative advances of techniques in natural language processing, automatic evaluation of generated-text quality has already gone through several paradigm shifts. However, the research community still lacks a systematic summary of automatic evaluation techniques for generated-text quality. Therefore, this paper first systematically reviews and summarizes existing automatic evaluation methods for generated text, then analyzes the main development trends of these methods, and finally, to give readers a broader view of the field as a whole, discusses and looks ahead to future research directions for automatic evaluation.
PRACT: Optimizing Principled Reasoning and Acting of LLM Agent
Zhiwei Liu | Weiran Yao | Jianguo Zhang | Rithesh Murthy | Liangwei Yang | Zuxin Liu | Tian Lan | Ming Zhu | Juntao Tan | Shirley Kokane | Thai Hoang | Juan Carlos Niebles | Shelby Heinecke | Huan Wang | Silvio Savarese | Caiming Xiong
Proceedings of the 28th Conference on Computational Natural Language Learning
We introduce the Principled Reasoning and Acting (PRAct) framework, a novel method for learning and enforcing action principles from trajectory data. Central to our approach is the use of text gradients from a reflection and optimization engine to derive these action principles. To adapt action principles to specific task requirements, we propose a new optimization framework, Reflective Principle Optimization (RPO). After execution, RPO employs a reflector to critique the current action principles and an optimizer to update them accordingly. We investigate the RPO framework under two scenarios: Reward-RPO, which uses environmental rewards for reflection, and Self-RPO, which conducts self-reflection without external rewards. Additionally, we develop two RPO methods, RPO-Traj and RPO-Batch, to adapt to different settings. Experimental results across four environments demonstrate that the PRAct agent, leveraging the RPO framework, can effectively learn and apply action principles to enhance performance.
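The reflect-then-optimize step the abstract outlines can be pictured as two LLM calls per update: one critiques the current principles against an executed trajectory, the other rewrites them. The sketch below shows that shape; the prompts and the `llm` interface are illustrative assumptions, not the paper's implementation.

```python
# A hedged sketch of one Reflective Principle Optimization step, assuming a
# plain callable `llm(prompt) -> str`; prompts are illustrative, not PRAct's.
def rpo_step(llm, principles: str, trajectory: str, reward=None) -> str:
    # Reward-RPO supplies the environmental reward; Self-RPO passes reward=None
    # and relies on self-reflection alone.
    context = f"Trajectory:\n{trajectory}\n"
    if reward is not None:
        context += f"Environment reward: {reward}\n"
    critique = llm(                                   # reflector: critique principles
        f"Current action principles:\n{principles}\n{context}"
        "Critique how well these principles guided the trajectory."
    )
    updated = llm(                                    # optimizer: rewrite principles
        f"Current action principles:\n{principles}\nCritique:\n{critique}\n"
        "Rewrite the principles to address the critique."
    )
    return updated
```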
2023
PandaGPT: One Model To Instruction-Follow Them All
Yixuan Su | Tian Lan | Huayang Li | Jialu Xu | Yan Wang | Deng Cai
Proceedings of the 1st Workshop on Taming Large Language Models: Controllability in the era of Interactive Assistants!
We present PandaGPT, an approach to emPower large lANguage moDels with visual and Auditory instruction-following capabilities. Our pilot experiments show that PandaGPT can perform complex tasks such as generating detailed image descriptions, writing stories inspired by videos, and answering questions about audio. More interestingly, PandaGPT can take multimodal inputs simultaneously and compose their semantics naturally; for example, it can connect how objects look in an image or video with how they sound in audio. To do so, PandaGPT combines the multimodal encoders from ImageBind with the large language model from Vicuna. Notably, only aligned image-text pairs are required to train PandaGPT. Thanks to ImageBind’s strong capability to embed data from different modalities into the same space, PandaGPT displays emergent, i.e., zero-shot, cross-modal behaviors for data other than image and text (e.g., video, audio, depth, thermal, and IMU). We hope that PandaGPT serves as an initial step toward building AGI that can perceive and understand inputs in different modalities holistically, as we humans do.
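The abstract's design hinges on a small trainable bridge between the frozen ImageBind encoder and the LLM: because ImageBind embeds every modality into one shared space, aligning that space to the language model once (on image-text pairs) transfers zero-shot to audio, video, and the rest. A minimal sketch of such a projection follows; the dimensions and class name are illustrative assumptions.

```python
# A minimal sketch of projecting a (frozen) ImageBind embedding into an LLM's
# token-embedding space; dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class MultimodalPrefix(nn.Module):
    def __init__(self, imagebind_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(imagebind_dim, llm_dim)  # the only trained piece

    def forward(self, imagebind_embedding: torch.Tensor) -> torch.Tensor:
        # One modality embedding becomes one "soft token" prepended to the
        # text prompt before it is fed to the frozen language model.
        return self.proj(imagebind_embedding).unsqueeze(1)  # (batch, 1, llm_dim)

prefix = MultimodalPrefix()
soft_token = prefix(torch.randn(2, 1024))  # works for image, audio, video, ... embeddings
```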
2021
ISTIC’s Triangular Machine Translation System for WMT2021
Hangcheng Guo | Wenbin Liu | Yanqing He | Tian Lan | Hongjiao Xu | Zhenfeng Wu | You Pan
Proceedings of the Sixth Conference on Machine Translation
This paper describes ISTIC’s submission to the Triangular Machine Translation Task of Russian-to-Chinese machine translation at WMT 2021. To fully utilize the provided corpora and improve translation performance from Russian to Chinese, our system uses the pivot method, pipelining a Russian-to-English translator and an English-to-Chinese translator to form a Russian-to-Chinese translator. Our system is based on the Transformer architecture, and several effective strategies are adopted to improve translation quality, including corpus filtering, data pre-processing, system combination, and model ensembling.
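The pivot method the paper describes reduces to two chained translation legs. The sketch below shows that pipeline; the `translate` method on the model objects is an assumed interface for illustration.

```python
# A hedged sketch of pivot translation via English, assuming translator objects
# exposing a translate(text) -> str method (an illustrative interface).
def pivot_translate(ru_text: str, ru_en_model, en_zh_model) -> str:
    en_text = ru_en_model.translate(ru_text)  # Russian -> English leg
    zh_text = en_zh_model.translate(en_text)  # English -> Chinese leg
    return zh_text
```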