Manav Kapadnis


2023

CLMSM: A Multi-Task Learning Framework for Pre-training on Procedural Text
Abhilash Nandy | Manav Kapadnis | Pawan Goyal | Niloy Ganguly
Findings of the Association for Computational Linguistics: EMNLP 2023

In this paper, we propose ***CLMSM***, a domain-specific, continual pre-training framework that learns from a large set of procedural recipes. ***CLMSM*** uses a Multi-Task Learning framework to optimize two objectives: (a) contrastive learning using hard triplets to learn fine-grained differences across entities in the procedures, and (b) a novel Mask-Step Modelling objective to learn the step-wise context of a procedure. We evaluate ***CLMSM*** on the downstream tasks of tracking entities and aligning actions between two procedures on three datasets, one of which is an open-domain dataset that does not conform to the pre-training data. We show that ***CLMSM*** not only outperforms baselines on recipes (in-domain) but also generalizes to open-domain procedural NLP tasks.
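The full architecture and hyperparameters are in the paper; as a rough illustration, the sketch below shows how a triplet-based contrastive objective and a masked-step (MLM-style) objective might be combined into a single multi-task loss. The module, loss weight, and masking scheme here are illustrative assumptions, not CLMSM's actual configuration.

```python
# Minimal sketch of a two-objective multi-task loss in the spirit of CLMSM:
# a triplet (contrastive) loss over procedure embeddings plus a masked-step
# modelling loss. Names, weights, and the masking scheme are assumptions.
import torch
import torch.nn as nn

class MultiTaskProceduralLoss(nn.Module):
    def __init__(self, margin: float = 1.0, msm_weight: float = 1.0):
        super().__init__()
        self.triplet_loss = nn.TripletMarginLoss(margin=margin)
        self.msm_loss = nn.CrossEntropyLoss(ignore_index=-100)  # -100 = unmasked positions
        self.msm_weight = msm_weight

    def forward(self, anchor_emb, positive_emb, negative_emb, msm_logits, msm_labels):
        # Contrastive objective over hard triplets of procedure embeddings.
        l_cl = self.triplet_loss(anchor_emb, positive_emb, negative_emb)
        # Mask-step modelling: predict the tokens of masked step(s) from context.
        l_msm = self.msm_loss(msm_logits.view(-1, msm_logits.size(-1)),
                              msm_labels.view(-1))
        return l_cl + self.msm_weight * l_msm

# Example with random tensors standing in for encoder outputs.
batch, hidden, seq_len, vocab = 4, 768, 128, 30522
labels = torch.full((batch, seq_len), -100)          # only masked positions get real labels
labels[:, :10] = torch.randint(0, vocab, (batch, 10))
loss_fn = MultiTaskProceduralLoss()
loss = loss_fn(torch.randn(batch, hidden), torch.randn(batch, hidden),
               torch.randn(batch, hidden),
               torch.randn(batch, seq_len, vocab), labels)
```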

2022

An Evaluation Framework for Legal Document Summarization
Ankan Mullick | Abhilash Nandy | Manav Kapadnis | Sohan Patnaik | Raghav R | Roshni Kar
Proceedings of the Thirteenth Language Resources and Evaluation Conference

A law practitioner has to go through numerous lengthy legal case proceedings spanning various categories, such as land disputes, corruption, etc. Hence, it is important to summarize these documents and to ensure that the summaries contain phrases whose intent matches the category of the case. To the best of our knowledge, there is no evaluation metric that evaluates a summary based on its intent. We propose an automated intent-based summarization metric, which shows better agreement with human judgments of satisfaction than other automated metrics such as BLEU and ROUGE-L. We also curate a dataset by annotating intent phrases in legal documents, and present a proof of concept showing how this system can be automated.
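As an illustration of what an intent-based summary score can look like, the sketch below computes a simple recall-style coverage of annotated intent phrases. The paper's metric is more involved, so treat this as an assumed toy formulation rather than the proposed metric itself.

```python
# Toy intent-phrase coverage score: reward summaries that mention the
# intent phrases annotated for the case category. Illustrative assumption,
# not the paper's actual metric.
def intent_coverage(summary: str, intent_phrases: list[str]) -> float:
    """Fraction of annotated intent phrases that appear in the summary."""
    if not intent_phrases:
        return 0.0
    summary_lc = summary.lower()
    hits = sum(1 for phrase in intent_phrases if phrase.lower() in summary_lc)
    return hits / len(intent_phrases)

# Example: a land-dispute summary scored against two annotated intent phrases.
score = intent_coverage(
    "The court upheld the plaintiff's ownership of the disputed plot.",
    ["ownership of the disputed plot", "boundary demarcation"],
)
print(f"intent coverage: {score:.2f}")  # 0.50
```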

Enolp musk@SMM4H’22 : Leveraging Pre-trained Language Models for Stance And Premise Classification
Millon Das | Archit Mangrulkar | Ishan Manchanda | Manav Kapadnis | Sohan Patnaik
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This paper covers our approaches for the Social Media Mining for Health (SMM4H) Shared Tasks 2a and 2b. Apart from the baseline architectures, we experiment with Part-of-Speech (PoS), dependency parsing, and TF-IDF features. Additionally, we perform contrastive pretraining on our best models using a supervised contrastive loss function. In both tasks, we outperformed the mean and median scores and ranked first on the validation set. For stance classification, we achieved an F1-score of 0.636 using the CovidTwitterBERT model, while for premise classification, we achieved an F1-score of 0.664 using the BART-base model on the test dataset.
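For readers unfamiliar with contrastive pretraining, the sketch below implements a standard supervised contrastive loss of the kind referenced in the abstract; the temperature, batch construction, and encoder are illustrative assumptions rather than the submission's actual setup.

```python
# Minimal sketch of a supervised contrastive loss: in-batch examples sharing a
# label are treated as positives. Hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """embeddings: (N, d) sentence representations; labels: (N,) class ids."""
    z = F.normalize(embeddings, dim=1)
    sim = torch.matmul(z, z.T) / temperature               # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))        # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)        # diagonal excluded below
    # Positives are other examples in the batch that share the same label.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_counts
    return loss.mean()

# Example with random embeddings for a batch of 8 tweets and 2 classes.
loss = supervised_contrastive_loss(torch.randn(8, 768),
                                   torch.tensor([0, 1, 0, 1, 0, 1, 0, 1]))
```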

2021

Team Enigma at ArgMining-EMNLP 2021: Leveraging Pre-trained Language Models for Key Point Matching
Manav Kapadnis | Sohan Patnaik | Siba Panigrahi | Varun Madhavan | Abhilash Nandy
Proceedings of the 8th Workshop on Argument Mining

We present the system description for our submission to the Key Point Analysis Shared Task at ArgMining 2021. Track 1 of the shared task requires participants to develop methods to predict the match score between each pair of arguments and key points, provided they belong to the same topic under the same stance. We leveraged existing state-of-the-art pre-trained language models and incorporated additional data and features extracted from the inputs (topics, key points, and arguments) to improve performance. We achieved mAP strict and mAP relaxed scores of 0.872 and 0.966, respectively, in the evaluation phase, securing 5th place on the leaderboard. In the post-evaluation phase, we achieved mAP strict and mAP relaxed scores of 0.921 and 0.982, respectively.
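The abstract describes scoring (argument, key point) pairs with pre-trained language models; the sketch below shows one common way to do this with a cross-encoder from the transformers library. The model name, input formatting, and scoring head are assumptions for illustration, not the team's actual system.

```python
# Minimal sketch of cross-encoder match scoring for (argument, key point) pairs.
# "roberta-base" and the sigmoid regression head are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=1)

def match_score(argument: str, key_point: str, topic: str) -> float:
    # Pair the argument with the key point, prepending the topic as context.
    inputs = tokenizer(f"{topic} {argument}", key_point,
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.sigmoid(logits).item()   # match score in [0, 1]

score = match_score(
    "Homeschooled children miss out on social interaction with peers.",
    "Homeschooling hurts social development.",
    "We should ban homeschooling",
)
```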