Pankaj Goyal


2025

silp_nlp at SemEval-2025 Task 2: An Effect of Entity Awareness in Machine Translation Using LLM
Sumit Singh | Pankaj Goyal | Uma Tiwary
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

In this study, we investigated the effect of entity awareness on machine translation (MT) using large language models (LLMs). Our approach utilized GPT-4o and NLLB-200, integrating named entity recognition (NER) to improve translation quality. The results indicated that incorporating entity information enhanced translation accuracy, especially when dealing with named entities. However, performance was highly dependent on the effectiveness of the NER model.
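The abstract describes injecting NER output into the translation process but gives no code. A minimal sketch of one common way to do this, assuming entities are supplied to the LLM as a source-to-target glossary inside the prompt (the function name, prompt wording, and example entities are hypothetical, not the authors' actual setup):

```python
def build_entity_aware_prompt(source, entities, tgt_lang):
    # Hypothetical sketch: prepend an NER-derived entity glossary to the
    # translation instruction so the LLM renders named entities consistently.
    glossary = "\n".join(f"- {src} -> {tgt}" for src, tgt in entities.items())
    return (
        f"Translate the following text into {tgt_lang}.\n"
        f"Use these entity translations:\n{glossary}\n\n"
        f"Text: {source}"
    )

prompt = build_entity_aware_prompt(
    "Pankaj visited Varanasi last week.",
    {"Pankaj": "पंकज", "Varanasi": "वाराणसी"},  # toy NER output
    "Hindi",
)
```

As the abstract notes, an approach like this inherits the NER model's errors: a missed or wrongly recognized entity propagates directly into the glossary and the translation.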

silp_nlp at SemEval-2025 Task 5: Subject Recommendation With Sentence Transformer
Sumit Singh | Pankaj Goyal | Uma Tiwary
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This work explored subject recommendation using sentence transformers within the SemEval-2025 Task 5 (LLMs4Subjects) challenge. Our approach leveraged embedding-based cosine similarity and hierarchical clustering to predict relevant GND subjects for TIB technical records in English and German. By experimenting with different models, including JinaAi, Distiluse-base-multilingual, and TF-IDF, we found that the JinaAi sentence transformer consistently outperformed the other methods in precision, recall, and F1-score. Our results highlight the effectiveness of transformer-based embeddings in semantic similarity tasks for subject classification. Additionally, hierarchical clustering reduced computational complexity by efficiently narrowing down the candidate subjects. Despite these improvements, future work could focus on fine-tuning domain-specific embeddings, exploring knowledge graph integration, and enhancing multilingual capabilities for better generalization.
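The core retrieval step described here, ranking candidate subjects by cosine similarity between a record embedding and subject embeddings, can be sketched in a few lines. This is an illustrative, dependency-free version with toy 3-dimensional vectors standing in for sentence-transformer embeddings; the subject names and vectors are hypothetical:

```python
import math

def cosine(u, v):
    # cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k_subjects(record_vec, subject_vecs, k=2):
    # rank candidate GND subjects by similarity to the record embedding
    ranked = sorted(subject_vecs.items(),
                    key=lambda kv: cosine(record_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

record = [1.0, 0.2, 0.0]  # toy embedding of a technical record
subjects = {
    "Machine learning": [0.9, 0.1, 0.0],
    "Chemistry": [0.0, 0.1, 1.0],
    "Artificial intelligence": [0.8, 0.3, 0.1],
}
top = top_k_subjects(record, subjects)
```

In the paper's setting, the hierarchical clustering step would first prune `subjects` to a small candidate set, so the exhaustive ranking above only runs over plausible candidates.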

2024

silp_nlp at SemEval-2024 Task 1: Cross-lingual Knowledge Transfer for Mono-lingual Learning
Sumit Singh | Pankaj Goyal | Uma Tiwary
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Our team, silp_nlp, participated in all three tracks of SemEval-2024 Task 1: Semantic Textual Relatedness (STR). We created systems for a total of 29 subtasks across all tracks: nine subtasks for track A, ten for track B, and ten for track C. To make the most of our knowledge across all subtasks, we used transformer-based pre-trained models, which are known for their strong cross-lingual transferability. For track A, we trained our model in two stages: in the first stage, we focused on multilingual learning across all tracks; in the second stage, we fine-tuned the model for individual tracks. For track B, we used unigram and bigram representations with support vector regression (SVR) and eXtreme Gradient Boosting (XGBoost) regression. For track C, we again utilized cross-lingual transferability without using the targeted subtask's data. Our work highlights the fact that knowledge gained from all subtasks can be transferred to an individual subtask if the base language model has strong cross-lingual characteristics. Our system ranked first in the Indonesian subtask of track B (C7) and in the top three for four other subtasks.
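The track-B representation (unigram and bigram counts fed to SVR/XGBoost regressors) can be sketched as follows. This shows only the feature-extraction side; the regressors themselves (scikit-learn's SVR, XGBoost) are omitted, and the tokenization and example text are hypothetical simplifications:

```python
from collections import Counter

def ngram_features(text):
    # Bag of unigrams and bigrams, a sketch of the track-B representation.
    tokens = text.lower().split()
    feats = Counter(tokens)                # unigram counts
    feats.update(zip(tokens, tokens[1:]))  # bigram counts as token pairs
    return feats

feats = ngram_features("the cat sat on the mat")
```

A feature dictionary like this would then be vectorized (e.g. into a sparse matrix over the training vocabulary) before being passed to an SVR or XGBoost regressor predicting the relatedness score.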