2024
pdf
abs
FOFO: A Benchmark to Evaluate LLMs’ Format-Following Capability
Congying Xia
|
Chen Xing
|
Jiangshu Du
|
Xinyi Yang
|
Yihao Feng
|
Ran Xu
|
Wenpeng Yin
|
Caiming Xiong
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper presents FoFo, a pioneering benchmark for evaluating large language models’ (LLMs) ability to follow complex, domain-specific formats, a crucial yet under-examined capability for their application as AI agents. Despite LLMs’ advancements, existing benchmarks fail to assess their format-following proficiency adequately. FoFo fills this gap with a diverse range of real-world formats and instructions, developed through an AI-Human collaborative method. Our evaluation across both open-source (e.g., Llama 2, WizardLM) and closed-source (e.g., GPT-4, PALM2, Gemini) LLMs highlights three key findings: open-source models significantly lag behind closed-source ones in format adherence; LLMs’ format-following performance is independent of their content generation quality; and LLMs’ format proficiency varies across different domains. These insights suggest the need for specialized tuning for format-following skills and highlight FoFo’s role in guiding the selection of domain-specific AI agents. FoFo will be publicly released, contributing a critical tool for advancing LLM evaluation and application.
pdf
abs
kNN-ICL: Compositional Task-Oriented Parsing Generalization with Nearest Neighbor In-Context Learning
Wenting Zhao
|
Ye Liu
|
Yao Wan
|
Yibo Wang
|
Qingyang Wu
|
Zhongfen Deng
|
Jiangshu Du
|
Shuaiqi Liu
|
Yunlong Xu
|
Philip Yu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Task-Oriented Parsing (TOP) enables conversational assistants to interpret user commands expressed in natural language, transforming them into structured outputs that combine elements of both natural language and intent/slot tags. Recently, Large Language Models (LLMs) have achieved impressive performance in synthesizing computer programs based on a natural-language prompt, mitigating the gap between natural language and structured programs. Our paper focuses on harnessing the capabilities of LLMs for semantic parsing tasks, addressing the following three key research questions: 1) How can LLMs be effectively utilized for semantic parsing tasks? 2) What defines an effective prompt? and 3) How can LLM overcome the length constraint and streamline prompt design by including all examples as prompts? We introduce k Nearest Neighbor In-Context Learning (kNN-ICL), which simplifies prompt engineering by allowing it to be built on top of any design strategy while providing access to all demo examples. Extensive experiments show that: 1) Simple ICL without kNN search can achieve a comparable performance with strong supervised models on the TOP tasks, and 2) kNN-ICL significantly improves the comprehension of complex requests by seamlessly integrating ICL with a nearest-neighbor approach. Notably, this enhancement is achieved without the need for additional data or specialized prompts.
2023
pdf
All Labels Together: Low-shot Intent Detection with an Efficient Label Semantic Encoding Paradigm
Jiangshu Du
|
Congying Xia
|
Wenpeng Yin
|
Tingting Liang
|
Philip Yu
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)