Sumit Kumar


2025

AdaBioBERT: Adaptive Token Sequence Learning for Biomedical Named Entity Recognition
Sumit Kumar | Tanmay Basu
Proceedings of the 24th Workshop on Biomedical Language Processing

Accurate identification and labeling of biomedical entities, such as diseases, genes, chemicals, and species, within scientific texts is crucial for understanding complex relationships. We propose Adaptive BERT, or AdaBioBERT, a robust named entity recognition (NER) model that builds upon BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) with an adaptive loss function to learn different types of biomedical token sequences. This adaptive loss combines the standard Cross Entropy (CE) loss and a Conditional Random Field (CRF) loss to optimize both token-level accuracy and sequence-level coherence. AdaBioBERT captures rich semantic nuances by leveraging pre-trained contextual embeddings from BioBERT. The CRF loss ensures proper identification of complex multi-token biomedical entities in a sequence, while the CE loss captures the simple unigram entities. Empirical analysis on multiple standard biomedical corpora demonstrates that AdaBioBERT outperforms the state of the art on most datasets in terms of macro- and micro-averaged F1 score.
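The abstract describes a loss that mixes token-level Cross Entropy with a sequence-level CRF negative log-likelihood. A minimal NumPy sketch of that combination, under assumed details not stated above (a fixed mixing weight `lam` and a plain linear-chain CRF; the paper's actual adaptation scheme may differ):

```python
import numpy as np

def log_sum_exp(x, axis=0):
    """Numerically stable log(sum(exp(x))) along an axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def cross_entropy(emissions, tags):
    """Mean token-level CE loss from per-token tag scores (logits), shape (T, K)."""
    T = len(tags)
    log_probs = emissions - log_sum_exp(emissions, axis=1)[:, None]
    return -np.mean(log_probs[np.arange(T), tags])

def crf_nll(emissions, transitions, tags):
    """Negative log-likelihood of the gold tag path under a linear-chain CRF."""
    T = len(tags)
    # Score of the gold path: emission scores plus pairwise transition scores.
    score = emissions[np.arange(T), tags].sum()
    score += sum(transitions[tags[t - 1], tags[t]] for t in range(1, T))
    # Forward algorithm for the log partition function over all tag sequences.
    alpha = emissions[0]
    for t in range(1, T):
        alpha = log_sum_exp(alpha[:, None] + transitions, axis=0) + emissions[t]
    return log_sum_exp(alpha, axis=0) - score

def adaptive_loss(emissions, transitions, tags, lam=0.5):
    """Hypothetical combined objective: lam * CE + (1 - lam) * CRF NLL."""
    return lam * cross_entropy(emissions, tags) \
        + (1 - lam) * crf_nll(emissions, transitions, tags)
```

With uniform (all-zero) scores over K=2 tags and T=3 tokens, CE is log 2 per token and the CRF NLL is 3 log 2, as expected for a uniform distribution over the 8 possible tag paths.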

2020

CLPLM: Character Level Pretrained Language Model for Extracting Support Phrases for Sentiment Labels
Raj Pranesh | Sumit Kumar | Ambesh Shekhar
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

In this paper, we design a character-level pre-trained language model for extracting support phrases from tweets based on the sentiment label. We also propose a character-level ensemble model built by blending Pre-trained Contextual Embedding (PCE) models (RoBERTa, BERT, and ALBERT) with neural network models (RNN, CNN, and WaveNet) at different stages of the model. For a given tweet and associated sentiment label, our model predicts the span of phrases in the tweet that prompts that particular sentiment. In our experiments, we explored various model architectures and configurations for both single and ensemble models, and performed a systematic comparative analysis of their performance based on the Jaccard score. The best-performing ensemble model obtained the highest Jaccard score of 73.5, a relative improvement of 2.4% over the best-performing single RoBERTa-based character-level model at 71.5.
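The Jaccard score used above measures overlap between the predicted support phrase and the gold phrase. A minimal sketch of the standard word-level formulation commonly used for such span-extraction tasks (the paper's exact variant is not specified here):

```python
def jaccard(pred: str, gold: str) -> float:
    """Word-level Jaccard similarity: |intersection| / |union| of word sets."""
    a, b = set(pred.lower().split()), set(gold.lower().split())
    if not a and not b:
        return 1.0  # both phrases empty: treat as a perfect match
    return len(a & b) / len(a | b)
```

For example, `jaccard("feeling great", "feeling great today")` shares 2 words out of a union of 3, giving 2/3; per-tweet scores are then averaged (and scaled by 100) to produce a dataset-level figure like 73.5.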

2004

Anaphora Resolution in Multi-Person Dialogues
Prateek Jain | Manav Ratan Mital | Sumit Kumar | Amitabha Mukerjee | Achla M. Raina
Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004