2024
pdf
abs
More Samples or More Prompts? Exploring Effective Few-Shot In-Context Learning for LLMs with In-Context Sampling
Bingsheng Yao
|
Guiming Chen
|
Ruishi Zou
|
Yuxuan Lu
|
Jiachen Li
|
Shao Zhang
|
Yisi Sang
|
Sijia Liu
|
James Hendler
|
Dakuo Wang
Findings of the Association for Computational Linguistics: NAACL 2024
While most existing works on LLM prompting techniques focus only on how to select a better set of data samples inside one single prompt input (In-Context Learning or ICL), why can not we design and leverage multiple prompts together to further improve the LLM’s performance? In this work, we propose In-Context Sampling (ICS), a low-resource LLM prompting technique to produce confident predictions by optimizing the construction of multiple ICL prompt inputs. Extensive experiments with three open-source LLMs (FlanT5-XL, Mistral-7B, and Mixtral-8x7B) on four NLI datasets (e-SNLI, Multi-NLI, ANLI, and Contract-NLI) and one QA dataset (CommonsenseQA) illustrate that ICS can consistently enhance LLMs’ performance. An in-depth evaluation with three data similarity-based ICS strategies suggests that these strategies can further elevate LLM’s performance, which sheds light on a new yet promising future research direction.
2023
pdf
abs
Beyond Labels: Empowering Human Annotators with Natural Language Explanations through a Novel Active-Learning Architecture
Bingsheng Yao
|
Ishan Jindal
|
Lucian Popa
|
Yannis Katsis
|
Sayan Ghosh
|
Lihong He
|
Yuxuan Lu
|
Shashank Srivastava
|
Yunyao Li
|
James Hendler
|
Dakuo Wang
Findings of the Association for Computational Linguistics: EMNLP 2023
Real-world domain experts (e.g., doctors) rarely annotate only a decision label in their day-to-day workflow without providing explanations. Yet, existing low-resource learning techniques, such as Active Learning (AL), that aim to support human annotators mostly focus on the label while neglecting the natural language explanation of a data point. This work proposes a novel AL architecture to support experts’ real-world need for label and explanation annotations in low-resource scenarios. Our AL architecture leverages an explanation-generation model to produce explanations guided by human explanations, a prediction model that utilizes generated explanations toward prediction faithfully, and a novel data diversity-based AL sampling strategy that benefits from the explanation annotations. Automated and human evaluations demonstrate the effectiveness of incorporating explanations into AL sampling and the improved human annotation efficiency and trustworthiness with our AL architecture. Additional ablation studies illustrate the potential of our AL architecture for transfer learning, generalizability, and integration with large language models (LLMs). While LLMs exhibit exceptional explanation-generation capabilities for relatively simple tasks, their effectiveness in complex real-world tasks warrants further in-depth study.
pdf
abs
Are Human Explanations Always Helpful? Towards Objective Evaluation of Human Natural Language Explanations
Bingsheng Yao
|
Prithviraj Sen
|
Lucian Popa
|
James Hendler
|
Dakuo Wang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Human-annotated labels and explanations are critical for training explainable NLP models. However, unlike human-annotated labels whose quality is easier to calibrate (e.g., with a majority vote), human-crafted free-form explanations can be quite subjective. Before blindly using them as ground truth to train ML models, a vital question needs to be asked: How do we evaluate a human-annotated explanation’s quality? In this paper, we build on the view that the quality of a human-annotated explanation can be measured based on its helpfulness (or impairment) to the ML models’ performance for the desired NLP tasks for which the annotations were collected. In comparison to the commonly used Simulatability score, we define a new metric that can take into consideration the helpfulness of an explanation for model performance at both fine-tuning and inference. With the help of a unified dataset format, we evaluated the proposed metric on five datasets (e.g., e-SNLI) against two model architectures (T5 and BART), and the results show that our proposed metric can objectively evaluate the quality of human-annotated explanations, while Simulatability falls short.
2016
pdf
Cross-media Event Extraction and Recommendation
Di Lu
|
Clare Voss
|
Fangbo Tao
|
Xiang Ren
|
Rachel Guan
|
Rostyslav Korolov
|
Tongtao Zhang
|
Dongang Wang
|
Hongzhi Li
|
Taylor Cassidy
|
Heng Ji
|
Shih-fu Chang
|
Jiawei Han
|
William Wallace
|
James Hendler
|
Mei Si
|
Lance Kaplan
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations
1982
pdf
A Message-Passing Control Structure for Text Understanding
Brian Phillips
|
James A. Hendler
Coling 1982: Proceedings of the Ninth International Conference on Computational Linguistics
1980
pdf
The Impatient Tutor: An Integrated Language Understanding System
Brian Phillips
|
James Hendler
COLING 1980 Volume 1: The 8th International Conference on Computational Linguistics