2025
Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation
Mingxuan Xia | Haobo Wang | Yixuan Li | Zewei Yu | Jindong Wang | Junbo Zhao | Runze Wu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recently, Large Language Models (LLMs) have demonstrated significant potential for data annotation, markedly reducing the labor costs associated with downstream applications. However, existing methods mostly adopt an aggressive strategy by prompting the LLM to determine a single gold label for each unlabeled sample. Due to the inherent uncertainty within LLMs, they often produce incorrect labels for difficult samples, severely compromising data quality for downstream applications. Motivated by ambiguity aversion in human behavior, we propose a novel candidate annotation paradigm wherein large language models are encouraged to output all possible labels when facing uncertainty. To ensure unique labels are provided for downstream tasks, we develop a teacher-student framework, CanDist, that distills candidate annotations with a Small Language Model (SLM). We further provide a rigorous justification demonstrating that distilling candidate annotations from the teacher LLM offers superior theoretical guarantees compared to directly using single annotations. Extensive experiments across six text classification tasks validate the effectiveness of our proposed method. The source code is available at https://github.com/MingxuanXia/CanDist.
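To make the candidate-annotation idea concrete, here is a minimal sketch of the two stages: prompting an LLM for every plausible label, then training a small model with a partial-label-style loss over the candidate set. The `query_llm` helper, the label space, and the loss are illustrative assumptions; the authors' actual distillation objective and training loop are in the linked repository.

```python
# Hedged sketch of candidate annotation + distillation (not the authors' exact
# method; see https://github.com/MingxuanXia/CanDist for the real CanDist code).
# `query_llm` is a hypothetical helper that returns the LLM's text completion.
import torch
import torch.nn.functional as F

LABELS = ["positive", "negative", "neutral"]  # assumed toy label space

def candidate_annotate(text: str, query_llm) -> list[int]:
    """Ask the LLM for *all* plausible labels instead of a single gold label."""
    prompt = (
        f"Text: {text}\n"
        f"Choose every label that could apply from {LABELS}. "
        "If you are uncertain, list all possibilities."
    )
    reply = query_llm(prompt)
    return [i for i, lab in enumerate(LABELS) if lab in reply.lower()]

def candidate_loss(logits: torch.Tensor, candidates: list[int]) -> torch.Tensor:
    """Partial-label-style loss: push probability mass onto the candidate set."""
    log_probs = F.log_softmax(logits, dim=-1)
    # Maximize the total probability assigned to the candidate labels.
    return -torch.logsumexp(log_probs[candidates], dim=-1)
```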
Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
Atsuyuki Miyai | Jingkang Yang | Jingyang Zhang | Yifei Ming | Qing Yu | Go Irie | Yixuan Li | Hai Helen Li | Ziwei Liu | Kiyoharu Aizawa
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper introduces a novel task for evaluating the robust understanding capability of Large Multimodal Models (LMMs), termed Unsolvable Problem Detection (UPD). Multiple-choice question answering (MCQA) is widely used to assess the understanding capability of LMMs, but it does not guarantee that LMMs truly comprehend the answer. UPD assesses the LMM’s ability to withhold answers when encountering unsolvable MCQA problems, verifying whether the model truly understands the answer. UPD encompasses three problems: Absent Answer Detection (AAD), Incompatible Answer Set Detection (IASD), and Incompatible Visual Question Detection (IVQD), covering unsolvable cases such as missing or incompatible answer choices and image-question mismatches. For the evaluation, we introduce the MM-UPD Bench, a benchmark for assessing performance across various ability dimensions. Our experiments reveal that most LMMs, even those that perform adequately on existing benchmarks, struggle significantly with MM-UPD, underscoring a novel aspect of trustworthiness that current benchmarks have overlooked. A detailed analysis shows that LMMs have different bottlenecks, and that chain-of-thought and self-reflection improve performance for LMMs whose bottleneck lies in their LLM capability. We hope our insights will enhance the broader understanding and development of more reliable LMMs.
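As an illustration of how a UPD-style check might be scored, the sketch below pairs a solvable MCQA item with an unsolvable variant (the correct option removed, as in AAD) and requires the model to answer the former while withholding on the latter. `ask_lmm` and the withhold markers are hypothetical stand-ins; the MM-UPD Bench defines the actual protocol and prompt settings.

```python
# Hedged sketch of a UPD-style check (AAD variant); the benchmark's real
# protocol differs. `ask_lmm(question, options)` is a hypothetical helper
# that returns the model's free-form answer string.
WITHHOLD_MARKERS = ["none of the", "cannot be determined", "no correct option"]

def aad_pair_correct(question, options, answer_idx, ask_lmm) -> bool:
    """Model must answer the solvable version AND withhold on the unsolvable one."""
    solvable_reply = ask_lmm(question, options)
    # Remove the correct option so the question becomes unsolvable (AAD).
    broken_options = [o for i, o in enumerate(options) if i != answer_idx]
    unsolvable_reply = ask_lmm(question, broken_options)
    answered_ok = options[answer_idx].lower() in solvable_reply.lower()
    withheld_ok = any(m in unsolvable_reply.lower() for m in WITHHOLD_MARKERS)
    return answered_ok and withheld_ok
```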
HyGenar: An LLM-Driven Hybrid Genetic Algorithm for Few-Shot Grammar Generation
Weizhi Tang | Yixuan Li | Chris Sypherd | Elizabeth Polgreen | Vaishak Belle
Findings of the Association for Computational Linguistics: ACL 2025
Grammar plays a critical role in natural language processing and text/code generation by enabling the definition of syntax, the creation of parsers, and guiding structured outputs. Although large language models (LLMs) demonstrate impressive capabilities across domains, their ability to infer and generate grammars has not yet been thoroughly explored. In this paper, we aim to study and improve the ability of LLMs at few-shot grammar generation, where grammars are inferred from small sets of positive and negative examples and generated in Backus-Naur Form. To explore this, we introduce a novel dataset comprising 540 structured grammar generation challenges, devise 6 metrics, and evaluate 8 LLMs against it. Our findings reveal that existing LLMs perform sub-optimally in grammar generation. To address this, we propose an LLM-driven hybrid genetic algorithm, namely HyGenar, to optimize grammar generation. HyGenar achieves substantial improvements in both the syntactic and semantic correctness of generated grammars across LLMs.
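A hybrid genetic loop of the kind described can be sketched as follows, with an LLM supplying the mutation operator and fitness measured by how well a grammar accepts the positive examples and rejects the negative ones. `llm_mutate` and `parses` are hypothetical stand-ins; HyGenar's real operators, selection scheme, and fitness function are specified in the paper.

```python
# Hedged sketch of an LLM-driven genetic loop for grammar generation, not
# HyGenar itself. `parses(grammar, s)` -> bool and `llm_mutate(grammar)` -> str
# are hypothetical helpers (a BNF parser and an LLM repair/variation prompt).
import random

def fitness(grammar, positives, negatives, parses) -> float:
    """Average of: fraction of positives accepted, fraction of negatives rejected."""
    pos = sum(parses(grammar, s) for s in positives) / len(positives)
    neg = sum(not parses(grammar, s) for s in negatives) / len(negatives)
    return (pos + neg) / 2

def evolve(seed_grammars, positives, negatives, parses, llm_mutate,
           generations=10, pop_size=20):
    population = list(seed_grammars)
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda g: fitness(g, positives, negatives, parses),
                        reverse=True)
        survivors = scored[: pop_size // 2]
        # LLM-driven mutation: ask the LLM to repair or vary a surviving grammar.
        children = [llm_mutate(random.choice(survivors)) for _ in survivors]
        population = survivors + children
    return max(population, key=lambda g: fitness(g, positives, negatives, parses))
```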
2024
Joint Annotation of Morphology and Syntax in Dependency Treebanks
Bruno Guillaume | Kim Gerdes | Kirian Guiller | Sylvain Kahane | Yixuan Li
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In this paper, we compare different ways of annotating both syntactic and morphological relations in a dependency treebank, and we propose new formats, which we call mSUD and mUD, compatible with the Universal Dependencies (UD) schema for syntactic treebanks. We emphasize mSUD rather than mUD, the former being based on distributional criteria for choosing the head of any combination, which allow us to clearly encode the internal structure of a word, that is, the derivational path. We investigate the various problems posed by a morph-based annotation, concerning tokenization, the choice of the head of a morph combination, the relations between morphs, and the additional features needed, such as a token type differentiating roots from derivational and inflectional affixes. We show how our annotation schema can be applied to a range of languages, from polysynthetic languages such as Yupik to isolating languages such as Chinese.
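For intuition, here is a hedged illustration of what morph-level dependency annotation might look like (not the actual mSUD schema or its relation labels): the English word "unhappiness" split into morphs, with the derivational path encoded by which morph heads which.

```python
# Hedged illustration in the spirit of morph-based annotation; the real mSUD
# schema, fields, and relation labels are defined in the paper.
word = [
    # (id, form, morph_type, head, relation)  -- illustrative fields only
    (1, "un",    "deriv_affix", 2, "mod"),    # negation prefix modifies the root
    (2, "happy", "root",        3, "deriv"),  # root attaches to the derivational head
    (3, "ness",  "deriv_affix", 0, "head"),   # suffix determines the category (noun)
]
```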
How Useful is Continued Pre-Training for Generative Unsupervised Domain Adaptation?
Rheeya Uppaal | Yixuan Li | Junjie Hu
Proceedings of the 9th Workshop on Representation Learning for NLP (RepL4NLP-2024)
Recent breakthroughs in scale have enabled the emergence of powerful generative language models, and the ability to fine-tune these models on various tasks by casting them into prompts or instructions. In this landscape, the problem of Unsupervised Domain Adaptation (UDA), or leveraging knowledge from a labeled source domain for an unlabeled target domain, has been left behind, with recent UDA methods still addressing discriminative classification. In particular, two popular UDA approaches, Continued Pre-Training (CPT) and learning domain-invariant representations, remain under-explored in the generative setting, signaling a gap. In this work, we evaluate the utility of CPT for generative UDA. We first perform an empirical evaluation to measure the trade-offs between CPT and strong methods promoting domain invariance. We further evaluate how well the benefits of CPT extend to different architectures, tuning methods, and data regimes. We then motivate the use of CPT by studying to what degree it benefits classification performance on the target domain. Finally, we attempt to understand the mechanism by which CPT improves classification performance on the unlabeled target domain. Our findings suggest that the model implicitly learns the downstream task while predicting masked words informative to that task. Our work connects the body of UDA research with that of instruction tuning, enabling an initial step towards a wider applicability of modern language models.
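A minimal sketch of what continued pre-training on unlabeled target-domain text looks like in practice, using masked language modeling with Hugging Face `transformers`; the paper's actual models, schedules, and generative/instruction-tuned variants differ, and the corpus path below is a placeholder.

```python
# Hedged CPT sketch: keep pre-training a masked LM on raw target-domain text.
# "target_domain.txt" is a placeholder path, not a file from the paper.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

target_corpus = load_dataset("text", data_files={"train": "target_domain.txt"})
tokenized = target_corpus["train"].map(
    lambda b: tokenizer(b["text"], truncation=True, max_length=256),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cpt_ckpt", num_train_epochs=1),
    train_dataset=tokenized,
    # Randomly mask 15% of tokens; the model keeps learning the target domain.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```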
2023
Is Fine-tuning Needed? Pre-trained Language Models Are Near Perfect for Out-of-Domain Detection
Rheeya Uppaal | Junjie Hu | Yixuan Li
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Out-of-distribution (OOD) detection is a critical task for reliable predictions over text. Fine-tuning with pre-trained language models has been a de facto procedure to derive OOD detectors with respect to in-distribution (ID) data. Despite its common use, the understanding of the role of fine-tuning and its necessity for OOD detection is largely unexplored. In this paper, we raise the question: is fine-tuning necessary for OOD detection? We present a study investigating the efficacy of directly leveraging pre-trained language models for OOD detection, without any model fine-tuning on the ID data. We compare the approach with several competitive fine-tuning objectives, and offer new insights under various types of distributional shifts. Extensive experiments demonstrate near-perfect OOD detection performance (with 0% FPR95 in many cases), strongly outperforming the fine-tuned counterpart.
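In the spirit of this finding, here is a minimal sketch of fine-tuning-free OOD detection: embed texts with a frozen pre-trained encoder, fit per-class statistics on the ID data, and score test inputs by distance to the nearest class mean (a Mahalanobis-style detector). This is an assumption-laden illustration, not necessarily the paper's exact estimator.

```python
# Hedged sketch: Mahalanobis-style OOD scoring over frozen LM embeddings,
# with no fine-tuning on the ID data.
import numpy as np

def fit_mahalanobis(id_embs: np.ndarray, id_labels: np.ndarray):
    """Per-class means + shared precision matrix from ID embeddings."""
    classes = np.unique(id_labels)
    means = {c: id_embs[id_labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([id_embs[id_labels == c] - means[c] for c in classes])
    precision = np.linalg.pinv(np.cov(centered, rowvar=False))
    return means, precision

def ood_score(emb: np.ndarray, means, precision) -> float:
    """Higher = more OOD (squared distance to the closest class mean)."""
    dists = [float((emb - m) @ precision @ (emb - m)) for m in means.values()]
    return min(dists)
```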
A Critical Analysis of Document Out-of-Distribution Detection
Jiuxiang Gu | Yifei Ming | Yi Zhou | Jason Kuen | Vlad Morariu | Handong Zhao | Ruiyi Zhang | Nikolaos Barmpalios | Anqi Liu | Yixuan Li | Tong Sun | Ani Nenkova
Findings of the Association for Computational Linguistics: EMNLP 2023
Large-scale pre-training is widely used in recent document understanding tasks. During deployment, one may expect models to trigger a conservative fallback policy when encountering out-of-distribution (OOD) samples, which highlights the importance of OOD detection. However, most existing OOD detection methods focus on single-modal inputs such as images or text. Although documents are multi-modal in nature, it remains underexplored whether and how the multi-modal information in documents can be exploited for OOD detection. In this work, we first provide a systematic and in-depth analysis of OOD detection for document understanding models. We study the effects of model modality, pre-training, and fine-tuning across various types of OOD inputs. In particular, we find that spatial information is critical for document OOD detection. To better exploit spatial information, we propose a spatial-aware adapter, which serves as a parameter-efficient add-on module to adapt transformer-based language models to the document domain. Extensive experiments show that adding the spatial-aware adapter significantly improves OOD detection performance compared to directly using the language model, and achieves superior performance compared to competitive baselines.
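As a rough picture of what a parameter-efficient spatial add-on could look like, here is a hedged PyTorch sketch of a bottleneck adapter that fuses per-token bounding boxes with text features; the paper's actual adapter placement, fusion mechanism, and dimensions may differ.

```python
# Hedged sketch of a spatial-aware adapter: a small bottleneck module that
# injects bounding-box (layout) information into token features. Illustrative
# only; not the paper's exact architecture.
import torch
import torch.nn as nn

class SpatialAdapter(nn.Module):
    def __init__(self, hidden: int = 768, bottleneck: int = 64):
        super().__init__()
        self.box_proj = nn.Linear(4, hidden)   # (x0, y0, x1, y1) per token
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.act = nn.GELU()

    def forward(self, tokens: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # Fuse layout with text, squeeze through the bottleneck, keep a
        # residual path so the frozen backbone's features are preserved.
        fused = tokens + self.box_proj(boxes)
        return tokens + self.up(self.act(self.down(fused)))
```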
2019
Character-level Annotation for Chinese Surface-Syntactic Universal Dependencies
Yixuan Li | Kim Gerdes | Dong Chuanming
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)