Soundararajan Srinivasan
2023
On Surgical Fine-tuning for Language Encoders
Abhilasha Lodha | Gayatri Belapurkar | Saloni Chalkapurkar | Yuanming Tao | Reshmi Ghosh | Samyadeep Basu | Dmitrii Petrov | Soundararajan Srinivasan
Findings of the Association for Computational Linguistics: EMNLP 2023
Fine-tuning all the layers of a pre-trained neural language encoder (either using all the parameters or using parameter-efficient methods) is often the de facto way of adapting it to a new task. We show evidence that for different downstream language tasks, fine-tuning only a subset of layers is sufficient to obtain performance that is close to, and often better than, that of fine-tuning all the layers in the language encoder. We propose an efficient metric based on the diagonal of the Fisher information matrix (FIM score) to select the candidate layers for selective fine-tuning. We show empirically, on GLUE and SuperGLUE tasks and across distinct language encoders, that this metric can effectively select layers that lead to strong downstream performance. Our work highlights that task-specific information corresponding to a given downstream task is often localized within a few layers, and tuning only those is sufficient for strong performance. Additionally, we demonstrate the robustness of the FIM score: the layer ranking it produces remains constant during the optimization process.
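As a rough illustration of the layer-selection idea in this abstract, the sketch below scores each layer by the mean squared gradient of its parameters, a common empirical estimate of the diagonal of the Fisher information matrix, and keeps the top-k layers. The abstract does not give the paper's exact normalization or selection rule, so the averaging scheme, the function names (`fim_layer_scores`, `select_top_layers`), and the toy gradient values are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def fim_layer_scores(
    per_example_grads: Sequence[Dict[str, List[float]]]
) -> Dict[str, float]:
    """Score each layer by the mean squared gradient of its parameters.

    Each element of `per_example_grads` maps a layer name to the flattened
    gradient of the loss for one example. Averaging squared gradients over
    examples and parameters is an assumed normalization, not necessarily
    the one used in the paper.
    """
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for grads in per_example_grads:
        for layer, g in grads.items():
            sums[layer] = sums.get(layer, 0.0) + sum(x * x for x in g)
            counts[layer] = counts.get(layer, 0) + len(g)
    return {layer: sums[layer] / counts[layer] for layer in sums}


def select_top_layers(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k layer names with the highest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical per-example gradients for a three-layer encoder.
toy_grads = [
    {"layer_0": [0.1, -0.1], "layer_1": [1.0, -2.0], "layer_2": [0.5, 0.5]},
    {"layer_0": [0.0, 0.2], "layer_1": [2.0, 1.0], "layer_2": [-0.5, 0.5]},
]

scores = fim_layer_scores(toy_grads)
selected = select_top_layers(scores, k=2)
print(selected)
```

In a real setup the gradients would come from backpropagation through the encoder on task data, and only the selected layers would be left trainable while the rest are frozen.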
2022
SLATE: A Sequence Labeling Approach for Task Extraction from Free-form Inked Content
Apurva Gandhi | Ryan Serrao | Biyi Fang | Gilbert Antonius | Jenna Hong | Tra My Nguyen | Sheng Yi | Ehi Nosakhare | Irene Shaffer | Soundararajan Srinivasan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
We present SLATE, a sequence labeling approach for extracting tasks from free-form content such as digitally handwritten (or “inked”) notes on a virtual whiteboard. Our approach allows us to create a single, low-latency model to simultaneously perform sentence segmentation and classification of these sentences into task/non-task sentences. SLATE greatly outperforms a baseline two-model (sentence segmentation followed by classification model) approach, achieving a task F1 score of 84.4%, a sentence segmentation (boundary similarity) score of 88.4% and three times lower latency compared to the baseline. Furthermore, we provide insights into tackling challenges of performing NLP on the inking domain. We release both our code and dataset for this novel task.
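To make the single-model idea concrete, the sketch below shows one way token-level labels can encode both sentence boundaries and task/non-task classes at once, so that a single decoding pass yields segmented, classified sentences. The abstract does not specify SLATE's label scheme; the `B-`/`I-` prefixes, the `TASK`/`NONTASK` label names, the `decode_segments` helper, and the example tokens are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def decode_segments(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group tokens into (sentence, label) pairs from per-token tags.

    Assumed scheme: a tag "B-<LABEL>" starts a new sentence of class
    <LABEL>, and "I-<LABEL>" continues the current sentence.
    """
    segments: List[Tuple[str, str]] = []
    current: List[str] = []
    current_label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        prefix, label = tag.split("-", 1)
        # A "B" tag closes the sentence collected so far, if any.
        if prefix == "B" and current:
            segments.append((" ".join(current), current_label))
            current = []
        # The first token of a sentence fixes that sentence's label.
        if not current:
            current_label = label
        current.append(token)
    if current:
        segments.append((" ".join(current), current_label))
    return segments


# Hypothetical model output for an unpunctuated inked note.
tokens = ["email", "Sam", "the", "draft", "great", "meeting", "today"]
tags = ["B-TASK", "I-TASK", "I-TASK", "I-TASK",
        "B-NONTASK", "I-NONTASK", "I-NONTASK"]

segments = decode_segments(tokens, tags)
print(segments)
```

Because segmentation and classification are read off the same tag sequence, one forward pass of a tagger replaces the two-model pipeline that the abstract uses as its baseline.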
Co-authors
- Abhilasha Lodha 1
- Gayatri Belapurkar 1
- Saloni Chalkapurkar 1
- Yuanming Tao 1
- Reshmi Ghosh 1