Sunil Mallya


2023

Contrastive Training Improves Zero-Shot Classification of Semi-structured Documents
Muhammad Khalifa | Yogarshi Vyas | Shuai Wang | Graham Horwood | Sunil Mallya | Miguel Ballesteros
Findings of the Association for Computational Linguistics: ACL 2023

We investigate semi-structured document classification in a zero-shot setting. Classification of semi-structured documents is more challenging than that of standard unstructured documents, as positional, layout, and style information play a vital role in interpreting such documents. The standard classification setting where categories are fixed during both training and testing falls short in dynamic environments where new classification categories could potentially emerge. We focus exclusively on the zero-shot learning setting where inference is done on new unseen classes. To address this task, we propose a matching-based approach that relies on a pairwise contrastive objective for both pretraining and fine-tuning. Our results show a significant boost in Macro F1 from the proposed pretraining step and comparable performance of the contrastive fine-tuning to a standard prediction objective in both supervised and unsupervised zero-shot settings.

2022

Label Semantics for Few Shot Named Entity Recognition
Jie Ma | Miguel Ballesteros | Srikanth Doss | Rishita Anubhai | Sunil Mallya | Yaser Al-Onaizan | Dan Roth
Findings of the Association for Computational Linguistics: ACL 2022

We study the problem of few shot learning for named entity recognition. Specifically, we leverage the semantic information in the names of the labels as a way of giving the model additional signal and enriched priors. We propose a neural architecture that consists of two BERT encoders, one to encode the document and its tokens and another one to encode each of the labels in natural language format. Our model learns to match the representations of named entities computed by the first encoder with label representations computed by the second encoder. The label semantics signal is shown to support improved state-of-the-art results in multiple few shot NER benchmarks and on-par performance in standard benchmarks. Our model is especially effective in low resource settings.