Enyan Dai
2026
DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection
Yuliang Yan | Haochun Tang | Shuo Yan | Enyan Dai
Findings of the Association for Computational Linguistics: EACL 2026
Large language models (LLMs) are considered valuable Intellectual Properties (IP) due to the enormous computational cost of training, making their protection against malicious stealing or unauthorized deployment crucial. Despite efforts in watermarking and fingerprinting, existing methods either affect text generation or rely on white-box access, limiting practicality. To address this, we propose DuFFin, a novel Dual-Level Fingerprinting framework for black-box ownership verification. DuFFin jointly extracts trigger patterns and knowledge-level fingerprints to identify the source of a suspect model. We conduct experiments on diverse open-source models, including four popular base LLMs and their fine-tuned, quantized, and safety-aligned variants released by large companies, start-ups, and individuals. Results show that DuFFin accurately verifies the copyright of protected LLMs on their variants, achieving an IP-ROC greater than 0.99. Our code is available at https://github.com/yuliangyan0807/llm-fingerprint.
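The black-box verification idea described in the abstract, querying a suspect model with trigger prompts and checking its responses against fingerprints recorded for the protected model, can be sketched roughly as follows. This is a hypothetical illustration only: the function names, the containment check, and the decision threshold are assumptions, not the paper's actual method.

```python
# Hypothetical sketch of black-box fingerprint verification: query a suspect
# model with trigger prompts and measure how often its responses match the
# fingerprints recorded for the protected model.

def verify_ownership(query_model, trigger_prompts, expected_fingerprints,
                     threshold=0.9):
    """query_model: callable prompt -> response (black-box access only)."""
    matches = 0
    for prompt, expected in zip(trigger_prompts, expected_fingerprints):
        response = query_model(prompt)
        if expected in response:  # simple containment check as a stand-in
            matches += 1
    match_rate = matches / len(trigger_prompts)
    return match_rate >= threshold, match_rate

# Toy example: a "suspect" model that reproduces the protected model's outputs.
protected_outputs = {"trigger-1": "fp-alpha", "trigger-2": "fp-beta"}
suspect = lambda p: protected_outputs.get(p, "unknown")
is_stolen, rate = verify_ownership(
    suspect, ["trigger-1", "trigger-2"], ["fp-alpha", "fp-beta"]
)
```

In a real setting the trigger set would be crafted so that only the protected model (and its derivatives) produces the expected responses, while unrelated models do not.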
2020
TEST_POSITIVE at W-NUT 2020 Shared Task-3: Cross-task modeling
Chacha Chen | Chieh-Yang Huang | Yaqi Hou | Yang Shi | Enyan Dai | Jiaqi Wang
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
The shared task on extracting COVID-19 events from Twitter asks participants to develop systems that automatically extract related events from tweets. The built system should identify different pre-defined slots for each event, in order to answer important questions (e.g., Who is tested positive? What is the age of the person? Where is he/she?). To tackle these challenges, we propose the Joint Event Multi-task Learning (JOELIN) model. Through a unified global learning framework, we make use of all the training data across different events to learn and fine-tune the language model. Moreover, we implement a type-aware post-processing procedure using named entity recognition (NER) to further filter the predictions. JOELIN outperforms the BERT baseline by 17.2% in micro F1.
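The type-aware post-processing step mentioned in the abstract, filtering slot predictions whose named-entity type does not match the type expected for that slot, might look roughly like the sketch below. The slot names, the expected-type mapping, and the NER interface here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of type-aware post-processing: drop slot predictions
# whose NER-assigned entity type does not match the type expected for that
# slot (e.g., the "who" slot should be filled by a PERSON entity).

EXPECTED_TYPES = {"who": "PERSON", "where": "GPE", "age": "CARDINAL"}

def filter_predictions(predictions, ner_tag):
    """predictions: list of (slot, text); ner_tag: callable text -> entity type."""
    kept = []
    for slot, text in predictions:
        expected = EXPECTED_TYPES.get(slot)
        # Keep the prediction if the slot has no type constraint or the
        # NER tag agrees with the expected type.
        if expected is None or ner_tag(text) == expected:
            kept.append((slot, text))
    return kept

# Toy NER tagger for illustration.
toy_ner = {"Alice": "PERSON", "Seattle": "GPE", "45": "CARDINAL"}.get
preds = [("who", "Alice"), ("where", "45"), ("age", "45")]
filtered = filter_predictions(preds, toy_ner)
```

Here the ill-typed prediction ("where", "45") is discarded because "45" is tagged as a number rather than a location, while the well-typed predictions pass through.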