2025
silp_nlp at SemEval-2025 Task 2: An Effect of Entity Awareness in Machine Translation Using LLM
Sumit Singh | Pankaj Goyal | Uma Tiwary
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
In this study, we investigated the effect of entity awareness on machine translation (MT) using large language models (LLMs). Our approach used GPT-4o and NLLB-200, integrating named entity recognition (NER) to improve translation quality. The results indicated that incorporating entity information enhanced translation accuracy, particularly for named entities; however, performance depended heavily on the effectiveness of the NER model.
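The entity-aware setup described in the abstract can be illustrated with a minimal sketch (hypothetical helper and entity list, not the authors' code): an NER pass supplies entity annotations that are injected into the LLM translation prompt as a glossary.

```python
# Sketch: injecting NER output into a translation prompt (illustrative only).
# The entity list would normally come from an NER model; here it is hard-coded.

def build_entity_aware_prompt(source_text, entities, src_lang, tgt_lang):
    """Build an LLM prompt that asks for a translation while pinning
    how each detected named entity should be rendered."""
    glossary = "\n".join(
        f'- {surface} ({label}): translate as "{target}"'
        for surface, label, target in entities
    )
    return (
        f"Translate the following {src_lang} sentence into {tgt_lang}.\n"
        f"Render the named entities exactly as specified:\n{glossary}\n\n"
        f"Sentence: {source_text}\nTranslation:"
    )

prompt = build_entity_aware_prompt(
    "Sumit visited Varanasi last week.",
    [("Sumit", "PER", "Sumit"), ("Varanasi", "LOC", "Varanasi")],
    "English", "Hindi",
)
print(prompt)
```

A prompt like this would then be sent to GPT-4o; the point is that the NER model's output constrains how entities are rendered, which is why overall quality hinges on NER accuracy.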
silp_nlp at SemEval-2025 Task 5: Subject Recommendation With Sentence Transformer
Sumit Singh | Pankaj Goyal | Uma Tiwary
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This work explored subject recommendation using sentence transformers within the SemEval-2025 Task 5 (LLMs4Subjects) challenge. Our approach leveraged embedding-based cosine similarity and hierarchical clustering to predict relevant GND subjects for TIB technical records in English and German. Experimenting with several models, including JinaAi, Distiluse-base-multilingual, and TF-IDF, we found that the JinaAi sentence transformer consistently outperformed the other methods in precision, recall, and F1-score. Our results highlight the effectiveness of transformer-based embeddings in semantic similarity tasks for subject classification. Additionally, hierarchical clustering reduced computational complexity by efficiently narrowing down candidate subjects. Despite these improvements, future work can focus on fine-tuning domain-specific embeddings, exploring knowledge-graph integration, and enhancing multilingual capabilities for better generalization.
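The retrieval step described above can be sketched with plain cosine similarity over embeddings. Toy vectors stand in for sentence-transformer output, and the subject names are illustrative, not taken from the GND vocabulary:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k_subjects(record_vec, subject_vecs, k=2):
    """Rank candidate subjects by cosine similarity to the record embedding."""
    scored = [(name, cosine(record_vec, vec)) for name, vec in subject_vecs.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:k]

# Toy 3-d embeddings; a real system would embed subject labels and record
# text with a sentence transformer such as the JinaAi model named above.
subjects = {
    "Machine learning": [0.9, 0.1, 0.0],
    "Thermodynamics":   [0.0, 0.2, 0.9],
    "Neural networks":  [0.8, 0.3, 0.1],
}
record = [0.85, 0.2, 0.05]
print(top_k_subjects(record, subjects))
```

Hierarchical clustering would sit in front of this ranking: a record is first matched to a cluster of subjects, and only that cluster's members are scored, which is the complexity reduction the abstract mentions.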
2024
silp_nlp at SemEval-2024 Task 1: Cross-lingual Knowledge Transfer for Mono-lingual Learning
Sumit Singh | Pankaj Goyal | Uma Tiwary
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Our team, silp_nlp, participated in all three tracks of SemEval-2024 Task 1: Semantic Textual Relatedness (STR). We created systems for a total of 29 subtasks across all tracks: nine for track A, ten for track B, and ten for track C. To make the most of our knowledge across all subtasks, we used transformer-based pre-trained models, which are known for their strong cross-lingual transferability. For track A, we trained our model in two stages: in the first stage, we focused on multi-lingual learning from all tracks; in the second stage, we fine-tuned the model for individual tracks. For track B, we used a unigram and bigram representation with support vector regression (SVR) and eXtreme Gradient Boosting (XGBoost) regression. For track C, we again utilized cross-lingual transferability without using targeted subtask data. Our work highlights that knowledge gained from all subtasks can be transferred to an individual subtask if the base language model has strong cross-lingual characteristics. Our system ranked first in the Indonesian subtask of Track B (C7) and in the top three for four other subtasks.
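The unigram-and-bigram representation used for track B can be sketched as a simple count vectorizer (an illustrative stand-in; the actual system fed such features into SVR and XGBoost regressors):

```python
from collections import Counter

def ngram_features(text, n_values=(1, 2)):
    """Count word unigrams and bigrams, the feature representation
    described for the track B regression models."""
    tokens = text.lower().split()
    feats = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

feats = ngram_features("the cat sat on the mat")
print(feats["the"], feats["the cat"])
```

Each sentence pair would be turned into such counts (typically TF-IDF weighted and concatenated or differenced), and the regressor then predicts a continuous relatedness score.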
2023
Silp_nlp at SemEval-2023 Task 2: Cross-lingual Knowledge Transfer for Mono-lingual Learning
Sumit Singh | Uma Tiwary
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Our team silp_nlp participated in SemEval-2023 Task 2: MultiCoNER II. We built systems for 11 mono-lingual tracks. To leverage knowledge from all tracks, we chose transformer-based pre-trained models, which have strong cross-lingual transferability. Our model was therefore trained in two stages: the first for multi-lingual learning from all tracks and the second for fine-tuning on individual tracks. Our work highlights that knowledge from all tracks can be transferred to an individual track if the base language model has cross-lingual features. Our system placed in the top 10 for four tracks, scoring a 0.7432 macro F1 score on the Hindi track (7th rank) and a 0.7322 macro F1 score on the Bangla track (9th rank).
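The reported scores are macro-averaged F1, i.e. the unweighted mean of per-class F1. A minimal sketch of the computation over flat class labels (the shared task itself scores entity spans, so this simplifies the real metric):

```python
from collections import defaultdict

def macro_f1(gold, pred):
    """Macro F1: unweighted mean of per-class F1 scores."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p where it was not
            fn[g] += 1  # missed an instance of class g
    f1s = []
    for c in set(gold) | set(pred):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

gold = ["PER", "LOC", "PER", "O", "LOC"]
pred = ["PER", "LOC", "O", "O", "PER"]
print(round(macro_f1(gold, pred), 4))  # → 0.6111
```

Because every class contributes equally regardless of frequency, macro F1 rewards systems that handle rare entity classes well, which matters for the fine-grained MultiCoNER II taxonomy.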
2022
silpa_nlp at SemEval-2022 Tasks 11: Transformer based NER models for Hindi and Bangla languages
Sumit Singh | Pawankumar Jawale | Uma Tiwary
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
We present Transformer-based pre-trained models fine-tuned for the Named Entity Recognition (NER) task. Our team participated in SemEval-2022 Task 11 MultiCoNER: Multilingual Complex Named Entity Recognition for Hindi and Bangla. We compared six models: mBERT, IndicBERT, MuRIL (Base), MuRIL (Large), XLM-RoBERTa (Base), and XLM-RoBERTa (Large). Among these, MuRIL (Large) performed best for both Hindi and Bangla, with F1-scores of 0.69 and 0.59, respectively.
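Fine-tuning subword models such as MuRIL or XLM-RoBERTa for NER requires aligning word-level BIO tags with subword tokens. A common scheme, sketched here with a toy tokenizer rather than the authors' exact pipeline, labels only each word's first subword and masks the continuations with -100 so they are ignored in the loss:

```python
def align_labels(words, labels, tokenize):
    """Expand word-level BIO labels to subword tokens: the first subword
    keeps the word's label, continuation subwords get -100 (ignored in loss)."""
    tokens, aligned = [], []
    for word, label in zip(words, labels):
        pieces = tokenize(word)
        tokens.extend(pieces)
        aligned.extend([label] + [-100] * (len(pieces) - 1))
    return tokens, aligned

# Toy subword tokenizer: split every 4 characters (stands in for a real
# SentencePiece/WordPiece tokenizer).
toy_tokenize = lambda w: [w[i:i + 4] for i in range(0, len(w), 4)]

tokens, aligned = align_labels(
    ["Varanasi", "is", "beautiful"],
    ["B-LOC", "O", "O"],
    toy_tokenize,
)
print(tokens)   # → ['Vara', 'nasi', 'is', 'beau', 'tifu', 'l']
print(aligned)  # → ['B-LOC', -100, 'O', 'O', -100, -100]
```

With labels aligned this way, each candidate model can be fine-tuned as a token classifier on identical data, making the six-way comparison reported above a fair one.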