Daniil Ignatev
2025
Disentangling the Roles of Representation and Selection in Data Pruning
Yupei Du | Yingjin Song | Hugh Mee Wong | Daniil Ignatev | Albert Gatt | Dong Nguyen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Data pruning—selecting small but impactful subsets—offers a promising way to efficiently scale NLP model training. However, existing methods often involve many different design choices, which have not been systematically studied. This limits future development. In this work, we decompose data pruning into two key components, data representation and selection algorithm, and systematically analyze their influence on selected instances. Our theoretical and empirical results highlight the crucial role of representations: better representations, e.g., training gradients, generally lead to better selected instances, regardless of the chosen selection algorithm. Furthermore, different selection algorithms excel in different settings, and none consistently outperforms the others. Moreover, the selection algorithms do not always align with their intended objectives: for example, algorithms designed for the same objective can select drastically different instances, highlighting the need for careful evaluation.
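A minimal sketch, not the paper's implementation, of the two-component view the abstract describes: a data representation (here, a plain feature matrix standing in for, e.g., per-example training gradients) and a selection algorithm (here, k-center greedy, one common choice). Both the representation and the algorithm are illustrative assumptions; the point is that they can be swapped independently.

```python
import numpy as np

def k_center_greedy(reps: np.ndarray, budget: int) -> list[int]:
    """Greedily select `budget` indices that cover the representation space."""
    n = reps.shape[0]
    selected = [int(np.random.default_rng(0).integers(n))]
    # Distance of every point to its nearest selected center.
    dists = np.linalg.norm(reps - reps[selected[0]], axis=1)
    while len(selected) < budget:
        nxt = int(dists.argmax())  # farthest point from the current centers
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(reps - reps[nxt], axis=1))
    return selected

# Swapping the representation (hidden states, TF-IDF vectors, training
# gradients, ...) changes only `reps`; the selection algorithm stays fixed.
reps = np.random.default_rng(1).normal(size=(1000, 64))  # stand-in features
subset = k_center_greedy(reps, budget=100)
print(len(subset), "instances selected")
```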
Hypernetworks for Perspectivist Adaptation
Daniil Ignatev | Denis Paperno | Massimo Poesio
Proceedings of the 4th Workshop on Perspectivist Approaches to NLP
Perspective-aware classification introduces a parametric-efficiency bottleneck that has not received enough attention in existing studies. In this article, we address this issue by applying an existing architecture, the hypernetwork+adapters combination, to perspectivist classification. Ultimately, we arrive at a solution that competes with specialized models in adopting user perspectives on hate speech and toxicity detection, while using considerably fewer parameters. Our solution is architecture-agnostic and can be applied to a wide range of base models out of the box.
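A minimal sketch, assuming PyTorch, of the hypernetwork+adapters idea the abstract names: a small hypernetwork maps an annotator (perspective) embedding to the weights of a bottleneck adapter applied on top of a frozen base encoder. All module names and sizes below are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class HyperAdapter(nn.Module):
    def __init__(self, n_annotators: int, hidden: int = 768, bottleneck: int = 16):
        super().__init__()
        self.embed = nn.Embedding(n_annotators, 64)  # one vector per annotator
        n_params = 2 * hidden * bottleneck           # down- and up-projection
        self.hyper = nn.Linear(64, n_params)         # generates adapter weights
        self.hidden, self.bottleneck = hidden, bottleneck

    def forward(self, h: torch.Tensor, annotator_id: torch.Tensor) -> torch.Tensor:
        w = self.hyper(self.embed(annotator_id))     # (batch, n_params)
        split = self.hidden * self.bottleneck
        down = w[:, :split].view(-1, self.hidden, self.bottleneck)
        up = w[:, split:].view(-1, self.bottleneck, self.hidden)
        # Per-annotator bottleneck adapter with a residual connection.
        return h + torch.bmm(torch.relu(torch.bmm(h, down)), up)

h = torch.randn(4, 10, 768)       # hidden states from a frozen encoder
ids = torch.tensor([0, 1, 2, 3])  # annotator ids for the batch
out = HyperAdapter(n_annotators=50)(h, ids)
print(out.shape)  # torch.Size([4, 10, 768])
```

Only the hypernetwork and the annotator embeddings are trained, which is where the parameter savings over one specialized model per annotator come from.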
DeMeVa at LeWiDi-2025: Modeling Perspectives with In-Context Learning and Label Distribution Learning
Daniil Ignatev | Nan Li | Hugh Mee Wong | Anh Dang | Shane Kaszefski Yaschuk
Proceedings of the 4th Workshop on Perspectivist Approaches to NLP
This system paper presents the DeMeVa team’s approaches to the third edition of the Learning with Disagreements shared task (LeWiDi 2025; Leonardelli et al., 2025). We explore two directions: in-context learning (ICL) with large language models, where we compare example sampling strategies; and label distribution learning (LDL) methods with RoBERTa (Liu et al., 2019b), where we evaluate several fine-tuning methods. Our contributions are twofold: (1) we show that ICL can effectively predict annotator-specific annotations (perspectivist annotations), and that aggregating these predictions into soft labels yields competitive performance; and (2) we argue that LDL methods are promising for soft label predictions and merit further exploration by the perspectivist community.
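A minimal sketch of one idea from the abstract: predict each annotator's label separately (e.g., with annotator-specific in-context examples) and aggregate the predictions into a soft label distribution. `predict_for_annotator` is a hypothetical stand-in for the model call, not the team's actual pipeline.

```python
from collections import Counter

def soft_label(item: str, annotators: list[str], predict_for_annotator) -> dict[str, float]:
    """Aggregate per-annotator label predictions into a normalized distribution."""
    votes = Counter(predict_for_annotator(item, a) for a in annotators)
    total = sum(votes.values())
    return {label: count / total for label, count in votes.items()}

# Toy usage: a deterministic-per-run predictor standing in for an LLM prompted
# with each annotator's own labeled examples.
fake_predict = lambda item, a: "toxic" if hash((item, a)) % 3 else "not_toxic"
print(soft_label("example text", ["ann1", "ann2", "ann3", "ann4"], fake_predict))
```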