Audi Primadhanty

2024

pdf bib abs
Does Fine-tuning a Classifier Help in Low-budget Scenarios? Not Much
Cesar Gonzalez - Gutierrez | Audi Primadhanty | Francesco Cazzaro | Ariadna Quattoni
Proceedings of the Fifth Workshop on Insights from Negative Results in NLP

In recent years, the two-step approach for text classification based on pre-training plus fine-tuning has led to significant improvements in classification performance. In this paper, we study the low-budget scenario, and we ask whether it is justified to allocate the additional resources needed for fine-tuning complex models. To do so, we isolate the gains obtained from pre-training from those obtained from fine-tuning. We find out that, when the gains from pre-training are factored out, the performance attained by using complex transformer models leads to marginal improvements over simpler models. Therefore, in this scenario, utilizing simpler classifiers on top of pre-trained representations proves to be a viable alternative.

2023

pdf bib abs
Analyzing Text Representations by Measuring Task Alignment
Cesar Gonzalez-Gutierrez | Audi Primadhanty | Francesco Cazzaro | Ariadna Quattoni
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Textual representations based on pre-trained language models are key, especially in few-shot learning scenarios. What makes a representation good for text classification? Is it due to the geometric properties of the space or because it is well aligned with the task? We hypothesize the second claim. To test it, we develop a task alignment score based on hierarchical clustering that measures alignment at different levels of granularity. Our experiments on text classification validate our hypothesis by showing that task alignment can explain the classification performance of a given representation.

pdf bib abs
Entity Disambiguation on a Tight Labeling Budget
Audi Primadhanty | Ariadna Quattoni
Findings of the Association for Computational Linguistics: EMNLP 2023

Many real-world NLP applications face the challenge of training an entity disambiguation model for a specific domain with a small labeling budget. In this setting there is often access to a large unlabeled pool of documents. It is then natural to ask the question: which samples should be selected for annotation? In this paper we propose a solution that combines feature diversity with low rank correction. Our sampling strategy is formulated in the context of bilinear tensor models. Our experiments show that the proposed approach can significantly reduce the amount of labeled data necessary to achieve a given performance.

2017

pdf bib abs
InToEventS: An Interactive Toolkit for Discovering and Building Event Schemas
Germán Ferrero | Audi Primadhanty | Ariadna Quattoni
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

Event Schema Induction is the task of learning a representation of events (e.g., bombing) and the roles involved in them (e.g, victim and perpetrator). This paper presents InToEventS, an interactive tool for learning these schemas. InToEventS allows users to explore a corpus and discover which kind of events are present. We show how users can create useful event schemas using two interactive clustering steps.