2025
Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions
Xinbei Ma | Yiting Wang | Yao Yao | Tongxin Yuan | Aston Zhang | Zhuosheng Zhang | Hai Zhao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper investigates the faithfulness of multimodal large language model (MLLM) agents in a graphical user interface (GUI) environment, aiming to address the research question of whether multimodal GUI agents can be distracted by environmental context. A general scenario is proposed where both the user and the agent are benign, and the environment, while not malicious, contains unrelated content. A wide range of MLLMs are evaluated as GUI agents using a simulated dataset, following three working patterns with different levels of perception. Experimental results reveal that even the most powerful models, whether generalist agents or specialist GUI agents, are susceptible to distractions. While recent studies predominantly focus on the helpfulness of agents, our findings first indicate that these agents are prone to environmental distractions. Furthermore, we implement an adversarial environment injection and analyze the approach to improve faithfulness, calling for a collective focus on this important topic.
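For illustration only (not the paper's released evaluation code), the sketch below shows one way a faithfulness-vs-distraction check could be scored: a predicted click counts as faithful if it lands in the gold target region and as distracted if it lands in a region belonging to the unrelated environmental content. The `Box` class, `classify_action`, and the sample regions are all hypothetical names introduced here.

```python
# Hypothetical sketch of a faithfulness-vs-distraction check for a GUI agent,
# assuming the agent outputs a click coordinate and each screen is annotated
# with a gold target box plus boxes covering the unrelated (distracting) content.
from dataclasses import dataclass

@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, x: float, y: float) -> bool:
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2

def classify_action(click: tuple[float, float], gold: Box, distractors: list[Box]) -> str:
    """Label a predicted click as 'faithful', 'distracted', or 'invalid'."""
    x, y = click
    if gold.contains(x, y):
        return "faithful"
    if any(box.contains(x, y) for box in distractors):
        return "distracted"
    return "invalid"

# Example: a click landing inside a pop-up region counts as a distraction.
gold = Box(100, 400, 300, 440)   # the element the user's instruction refers to
popup = Box(80, 100, 420, 260)   # unrelated content shown by the environment
print(classify_action((200, 180), gold, [popup]))  # -> "distracted"
```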
LESA: Learnable LLM Layer Scaling-Up
Yifei Yang | Zouying Cao | Xinbei Ma | Yao Yao | Zhi Chen | Libo Qin | Hai Zhao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. However, existing depth scaling-up methods rely on empirical heuristic rules for layer duplication, which result in poorer initialization and slower convergence during continual pre-training. We propose LESA, a novel learnable method for depth scaling-up. By concatenating parameters from each layer and applying Singular Value Decomposition, we uncover latent patterns between layers, suggesting that inter-layer parameters can be learned. LESA uses a neural network to predict the parameters inserted between adjacent layers, enabling better initialization and faster training. Experiments show that LESA outperforms existing baselines, achieving superior performance with less than half the computational cost during continual pre-training. Extensive analyses demonstrate its effectiveness across different model sizes and tasks.
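A minimal sketch of the idea described above, not the authors' implementation: flatten and stack each layer's parameters, take an SVD across layers to expose low-rank structure, and fit a small network that predicts the parameters of a layer to insert between two neighbors. The layer count, flattened dimension, the tiny MLP, and training on existing (i-1, i+1) -> i triples as a stand-in objective are all assumptions for illustration.

```python
# Illustrative sketch only: SVD over stacked per-layer parameters plus a small
# network that predicts the parameters of a layer to insert between neighbors.
import torch

num_layers, d = 12, 4096                     # assumed: 12 layers, flattened parameter dim
layer_params = torch.randn(num_layers, d)    # row i = flattened parameters of layer i

# SVD across layers: a fast-decaying spectrum suggests inter-layer parameters
# lie near a low-dimensional manifold and can therefore be predicted.
U, S, Vh = torch.linalg.svd(layer_params, full_matrices=False)
print("top singular values:", S[:4])

# Tiny predictor: given two neighboring layers, output a new layer to insert between them.
predictor = torch.nn.Sequential(
    torch.nn.Linear(2 * d, 512), torch.nn.GELU(), torch.nn.Linear(512, d)
)
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
for _ in range(100):  # assumed objective: predict layer i from layers i-1 and i+1
    i = torch.randint(1, num_layers - 1, (1,)).item()
    inp = torch.cat([layer_params[i - 1], layer_params[i + 1]]).unsqueeze(0)
    loss = torch.nn.functional.mse_loss(predictor(inp), layer_params[i].unsqueeze(0))
    opt.zero_grad(); loss.backward(); opt.step()

# At scale-up time, predicted layers initialize the deeper model before continual pre-training.
new_layer = predictor(torch.cat([layer_params[5], layer_params[6]]).unsqueeze(0))
```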
2024
SirLLM: Streaming Infinite Retentive LLM
Yao Yao | Zuchao Li | Hai Zhao
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
As Large Language Models (LLMs) become increasingly prevalent in various domains, their ability to process inputs of any length and maintain a degree of memory becomes essential. However, feeding overly long texts in a single pass is of limited use: studies have shown that when the input length exceeds the LLMs’ pre-trained text length, text generation capability declines dramatically. Moreover, simply extending the length of pre-training texts is impractical due to the difficulty of obtaining long text data and the substantial memory consumption this would entail for LLMs. Recent efforts have employed streaming inputs to alleviate the pressure of excessively long text inputs, but this approach can significantly impair the model’s long-term memory capabilities. Motivated by this challenge, we introduce Streaming Infinite Retentive LLM (SirLLM), which allows LLMs to maintain longer memory during infinite-length dialogues without the need for fine-tuning. SirLLM utilizes the Token Entropy metric and a memory decay mechanism to filter key phrases, endowing LLMs with both long-lasting and flexible memory. We designed three distinct tasks and constructed three datasets to measure the effectiveness of SirLLM from various angles: (1) DailyDialog; (2) Grocery Shopping; (3) Rock-Paper-Scissors. Our experimental results robustly demonstrate that SirLLM achieves stable and significant improvements across different LLMs and tasks, compellingly proving its effectiveness. In conversation, “A sir could forget himself,” but SirLLM never does! Our code is publicly available at https://github.com/Zoeyyao27/SirLLM
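A hedged sketch of the cache-filtering idea described above, not the released SirLLM code linked in the abstract: score each token by the entropy of the model's output distribution when it was produced, apply a decay each step so stale entries fade, and keep only the highest-scoring entries within a fixed memory budget. The decay rate, budget, and the list-of-pairs stand-in for the KV cache are assumptions.

```python
# Illustrative sketch: entropy-scored, decaying token memory with a fixed budget.
# The real method operates on the KV cache; a list of (token, score) pairs stands in here.
import math

def token_entropy(probs: list[float]) -> float:
    """Entropy of the model's next-token distribution when this token was produced."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

class DecayingTokenMemory:
    def __init__(self, budget: int = 4, decay: float = 0.9):
        self.budget = budget        # assumed max number of retained tokens
        self.decay = decay          # assumed per-step decay applied to old scores
        self.entries: list[tuple[str, float]] = []

    def add(self, token: str, probs: list[float]) -> None:
        # Older entries decay, so long-unused phrases eventually drop out.
        self.entries = [(t, s * self.decay) for t, s in self.entries]
        self.entries.append((token, token_entropy(probs)))
        # Keep only the highest-entropy (most informative) tokens within budget.
        self.entries = sorted(self.entries, key=lambda e: e[1], reverse=True)[: self.budget]

mem = DecayingTokenMemory(budget=3)
mem.add("the", [0.9, 0.05, 0.05])        # low entropy, likely to be evicted
mem.add("eggs", [0.3, 0.3, 0.2, 0.2])    # high entropy, retained longer
print([t for t, _ in mem.entries])
```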
GoT: Effective Graph-of-Thought Reasoning in Language Models
Yao Yao | Zuchao Li | Hai Zhao
Findings of the Association for Computational Linguistics: NAACL 2024
With the widespread use of language models (LMs) in NLP tasks, researchers have discovered the potential of Chain-of-thought (CoT) to assist LMs in accomplishing complex reasoning tasks by generating intermediate steps. However, human thought processes are often non-linear, rather than simply sequential chains of thoughts. Therefore, we propose Graph-of-Thought (GoT) reasoning, which models human thought processes not only as a chain but also as a graph. By representing thought units as nodes and connections between them as edges, our approach captures the non-sequential nature of human thinking and allows for a more realistic modeling of thought processes. GoT adopts a two-stage framework with an additional GoT encoder for thought graph representation and fuses the graph representation with the original input representation through a gated fusion mechanism. We evaluate GoT’s performance on a text-only reasoning task (AQUA-RAT) and a multimodal reasoning task (ScienceQA). Our model achieves significant improvement over the strong CoT baseline on the AQUA-RAT test set and boosts accuracy from 85.19% to 87.59% using the T5-base model over the state-of-the-art Multimodal-CoT on the ScienceQA test set. Our code is publicly available at https://github.com/Zoeyyao27/Graph-of-Thought
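A minimal sketch of the gated fusion step described above, under assumed dimensions and not the released implementation linked in the abstract: a sigmoid gate computed from the concatenated text and thought-graph representations decides, per dimension, how much of each representation to keep.

```python
# Illustrative gated fusion of a text representation with a thought-graph representation.
import torch

d = 768                                   # assumed hidden size
h_text = torch.randn(1, d)                # encoder output for the original input
h_graph = torch.randn(1, d)               # GoT encoder output for the thought graph

gate_proj = torch.nn.Linear(2 * d, d)
gate = torch.sigmoid(gate_proj(torch.cat([h_text, h_graph], dim=-1)))

# Per-dimension mixture: the gate decides how much graph information flows in.
h_fused = gate * h_text + (1.0 - gate) * h_graph
print(h_fused.shape)  # torch.Size([1, 768])
```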
GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment
Yao Yao | Zuchao Li | Hai Zhao
Findings of the Association for Computational Linguistics: ACL 2024
The burgeoning size of Large Language Models (LLMs) has led to enhanced capabilities in generating responses, albeit at the expense of increased inference times and elevated resource demands. Existing methods of acceleration, predominantly hinged on knowledge distillation, generally necessitate fine-tuning of considerably large models, such as Llama-7B, posing a challenge for average users. Furthermore, present techniques for expediting inference and reducing costs operate independently. To address these issues, we introduce a novel and intuitive Guidance-based Knowledge Transfer (GKT) framework. This approach leverages a larger LLM as a “teacher” to create guidance prompts, paired with a smaller “student” model to finalize responses. Remarkably, GKT requires no fine-tuning and does not require the teacher and student models to share the same vocabulary, allowing for extensive batch generation to accelerate the process while ensuring user customization. GKT can be seamlessly integrated into cloud-edge collaboration architectures, and is versatile enough for plug-and-play application across various models. It excels in both efficiency and affordability, epitomizing a “cheap and cheerful” solution. GKT achieves a maximum accuracy improvement of 14.18%, along with a 10.72-times speed-up on GSM8K, and an accuracy improvement of 14.00% along with a 7.73-times speed-up on CSQA. When utilizing ChatGPT as the teacher model and Llama2-70B as the student model, we can achieve 95.00% of ChatGPT’s performance at 52% of the cost. The results highlight substantial enhancements in accuracy and processing speed on the GSM8K and CSQA datasets, surpassing the performance of using either the student or teacher models in isolation.
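A hedged sketch of the guidance idea, not the GKT codebase: the teacher generates only a short guidance prefix for each question, which is prepended to the student's prompt, and the student completes the answer. Because the hand-off is plain text, the two models need not share a vocabulary, and no fine-tuning is involved. `teacher_generate`, `student_generate`, the prompt template, and the token budgets are placeholders assumed here.

```python
# Illustrative cloud-edge guidance-based transfer: a large "teacher" drafts a short
# guidance prefix, a small "student" finishes the answer. Both generators are placeholders.
from typing import Callable

def answer_with_guidance(
    question: str,
    teacher_generate: Callable[[str, int], str],   # e.g. a cloud-hosted large model
    student_generate: Callable[[str, int], str],   # e.g. an on-device small model
    guidance_tokens: int = 32,
    answer_tokens: int = 256,
) -> str:
    # Teacher produces only the first few tokens of reasoning as guidance.
    guidance = teacher_generate(f"Question: {question}\nAnswer:", guidance_tokens)
    # Student continues from the guidance; the text hand-off means vocabularies may differ.
    completion = student_generate(f"Question: {question}\nAnswer: {guidance}", answer_tokens)
    return guidance + completion

# Usage with stub generators (a real deployment would wrap actual model endpoints):
teacher = lambda prompt, n: " Let's reason step by step."
student = lambda prompt, n: " 3 + 4 = 7, so the answer is 7."
print(answer_with_guidance("What is 3 + 4?", teacher, student))
```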
2023
Learning Event-aware Measures for Event Coreference Resolution
Yao Yao | Zuchao Li | Hai Zhao
Findings of the Association for Computational Linguistics: ACL 2023
Knowledge-inspired natural language processing is shifting its focus from the entity level to the event level, where event coreference resolution is one of the core challenges. This paper proposes a novel model for within-document event coreference resolution. Taking the event rather than the entity as the basic unit, our model learns and integrates multiple representations from both individual events and event pairs. For the former, we introduce multiple linguistically motivated event-level features for more discriminative event representations. For the latter, we consider multiple similarity measures to capture the distinctions between event pairs. Our proposed model achieves a new state of the art on the ACE 2005 benchmark, demonstrating the effectiveness of our proposed framework.
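For illustration only, and not the paper's model: a sketch of event-pair scoring that combines several similarity measures over event mentions with learned weights. The specific measures (embedding cosine, trigger-lemma match, argument overlap) and the logistic combination are assumptions introduced here.

```python
# Illustrative event-pair scorer: several similarity measures over two event
# mentions, combined with learned weights into a coreference probability.
import math

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pair_features(e1: dict, e2: dict) -> list[float]:
    return [
        cosine(e1["embedding"], e2["embedding"]),                    # contextual similarity
        1.0 if e1["trigger_lemma"] == e2["trigger_lemma"] else 0.0,  # trigger match
        len(set(e1["arguments"]) & set(e2["arguments"]))             # argument overlap (Jaccard)
        / max(1, len(set(e1["arguments"]) | set(e2["arguments"]))),
    ]

def coreference_score(e1: dict, e2: dict, weights: list[float], bias: float) -> float:
    z = bias + sum(w * f for w, f in zip(weights, pair_features(e1, e2)))
    return 1.0 / (1.0 + math.exp(-z))   # probability that the two events corefer

e1 = {"embedding": [0.2, 0.8], "trigger_lemma": "attack", "arguments": {"rebels", "town"}}
e2 = {"embedding": [0.25, 0.7], "trigger_lemma": "attack", "arguments": {"rebels"}}
print(round(coreference_score(e1, e2, weights=[1.5, 1.0, 1.0], bias=-1.0), 3))
```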
2018
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation
Stephen Politzer-Ahles | Yu-Yin Hsu | Chu-Ren Huang | Yao Yao
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation
Changing against tone merging trends in community? The case of C. Y. Leung
Ziqi Chen | Yao Yao | Alan C. L. Yu
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 25th Joint Workshop on Linguistics and Language Processing
Stephen Politzer-Ahles | Yu-Yin Hsu | Chu-Ren Huang | Yao Yao
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 25th Joint Workshop on Linguistics and Language Processing
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation: 5th Workshop on Asian Translation
Stephen Politzer-Ahles | Yu-Yin Hsu | Chu-Ren Huang | Yao Yao
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation: 5th Workshop on Asian Translation
2017
Multi-dimensional Meanings of Subjective Adverbs - Case Study of Mandarin Chinese Adverb Pianpian
Mi Zhou | Yao Yao | Chu-Ren Huang
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation
2015
Create a Manual Chinese Word Segmentation Dataset Using Crowdsourcing Method
Shichang Wang | Chu-Ren Huang | Yao Yao | Angel Chan
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing
A Review of Corpus-based Statistical Models of Language Variation
Yao Yao
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation
Mechanical Turk-based Experiment vs Laboratory-based Experiment: A Case Study on the Comparison of Semantic Transparency Rating Data
Shichang Wang | Chu-Ren Huang | Yao Yao | Angel Chan
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation
2014
Exploring Mental Lexicon in an Efficient and Economic Way: Crowdsourcing Method for Linguistic Experiments
Shichang Wang | Chu-Ren Huang | Yao Yao | Angel Chan
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)
Building a Semantic Transparency Dataset of Chinese Nominal Compounds: A Practice of Crowdsourcing Methodology
Shichang Wang | Chu-Ren Huang | Yao Yao | Angel Chan
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing
Predicting the Use of BA construction in Mandarin Chinese Discourse: A Modeling Study with Two Verbs
Yao Yao
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing
2010
A Working Report on Statistically Modeling Dative Variation in Mandarin Chinese
Yao Yao | Feng-hsi Liu
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)