Yanan Chen


2025

VestaBench: An Embodied Benchmark for Safe Long-Horizon Planning Under Multi-Constraint and Adversarial Settings
Tanmana Sadhu | Yanan Chen | Ali Pesaranghader
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Large language models (LLMs) are applied to reasoning and (automated) planning across diverse domains, from travel itineraries to embodied AI tasks. However, concerns have been raised about their suitability for long-horizon tasks involving multiple constraints, as they are prone to hallucinations, particularly in adversarial scenarios. Safety reasoning also becomes critical for embodied AI agents, which interact with their physical environments to complete tasks on behalf of humans. However, existing (safety) benchmarks fail to represent a diverse range of multi-constraint tasks that require long-horizon planning with a focus on safety. To address this, we propose VESTABENCH, a benchmark curated using VirtualHome and BEHAVIOR-100. Our VESTABENCH includes (1) tasks that can be achieved safely under adversarial and multi-constraint settings, as well as (2) adversarial instructions that the agent must avoid. Our experiments with state-of-the-art LLM-based baselines reveal that they perform poorly on our tasks, not only achieving low success rates but also suffering significantly compromised safety outcomes. This observation reinforces the limitations of LLMs in generating safe plans when faced with adversarial settings or instructions. Finally, we believe that our findings will benefit the research and industry communities.

2024

Athena: Safe Autonomous Agents with Verbal Contrastive Learning
Tanmana Sadhu | Ali Pesaranghader | Yanan Chen | Dong Hoon Yi
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

Due to emergent capabilities, large language models (LLMs) have been utilized as language-based agents to perform a variety of tasks and make decisions with an increasing degree of autonomy. These autonomous agents can understand high-level instructions, interact with their environments, and execute complex tasks using a selection of tools available to them. As the capabilities of the agents expand, ensuring their safety and trustworthiness becomes more imperative. In this study, we introduce the Athena framework, which leverages the concept of verbal contrastive learning, where past safe and unsafe trajectories are used as in-context (contrastive) examples to guide the agent towards safety while fulfilling a given task. The framework also incorporates a critiquing mechanism to guide the agent away from risky actions at every step. Furthermore, due to the lack of existing benchmarks on the safety reasoning ability of LLM-based agents, we curate a set of 80 toolkits across 8 categories with 180 scenarios to provide a safety evaluation benchmark. Our experimental evaluation, with both closed- and open-source LLMs, indicates that verbal contrastive learning and interaction-level critiquing significantly improve the safety rate.

2022

Rethinking Data Augmentation in Text-to-text Paradigm
Yanan Chen | Yang Liu
Proceedings of the 29th International Conference on Computational Linguistics

As manually labelling data can be costly, recent studies augment the training data to improve the generalization power of machine learning models, a practice known as data augmentation (DA). With the rise of pre-trained language models (PLMs), recent work on DA synthesizes new samples that benefit from the knowledge acquired during pre-training. Along the same direction, in this paper we propose to integrate text-to-text language models and construct a new two-phase framework for augmentation: 1) a fine-tuning phase, where PLMs are adapted to downstream classification with the help of two novel schemes, and 2) a generation phase, where the fine-tuned models are leveraged to create new samples for improving performance. This paradigm opens up a new way of designing fine-tuning schemes that better serve DA in an easy-to-implement manner, and it can be readily extended to other tasks. We evaluate our proposal on two public classification datasets and demonstrate its effectiveness with notable gains.