Partha Basuchowdhuri


2023

pdf
MLlab4CS at SemEval-2023 Task 2: Named Entity Recognition in Low-resource Language Bangla Using Multilingual Language Models
Shrimon Mukherjee | Madhusudan Ghosh | Girish | Partha Basuchowdhuri
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Extracting of NERs from low-resource languages and recognizing their types is one of the important tasks in the entity extraction domain. Recently many studies have been conducted in this area of research. In our study, we introduce a system for identifying complex entities and recognizing their types from low-resource language Bangla, which was published in SemEval Task 2 MulitCoNER II 2023. For this sequence labeling task, we use a pre-trained language model built on a natural language processing framework. Our team name in this competition is MLlab4CS. Our model Muril produces a macro average F-score of 76.27%, which is a comparable result for this competition.

2022

pdf
Astro-mT5: Entity Extraction from Astrophysics Literature using mT5 Language Model
Madhusudan Ghosh | Payel Santra | Sk Asif Iqbal | Partha Basuchowdhuri
Proceedings of the first Workshop on Information Extraction from Scientific Publications

Scientific research requires reading and extracting relevant information from existing scientific literature in an effective way. To gain insights over a collection of such scientific documents, extraction of entities and recognizing their types is considered to be one of the important tasks. Numerous studies have been conducted in this area of research. In our study, we introduce a framework for entity recognition and identification of NASA astrophysics dataset, which was published as a part of the DEAL SharedTask. We use a pre-trained multilingual model, based on a natural language processing framework for the given sequence labeling tasks. Experiments show that our model, Astro-mT5, out-performs the existing baseline in astrophysics related information extraction.