Translating sentences between English and Hindi is challenging, especially in the domain of legal documents. The main reasons for this complexity are specialized legal terminology, long and complex sentences, and strict accuracy requirements. This paper presents the system developed by Team-SVNIT for the JUST-NLP 2025 shared task on legal machine translation. We fine-tune and compare multiple pretrained multilingual translation models, including facebook/nllb-200-distilled-1.3B, on a corpus of 50,000 English–Hindi legal sentence pairs provided for the shared task. The training pipeline includes preprocessing, context windows of 512 tokens, and decoding methods that enhance translation quality. The proposed method secured 1st place on the official leaderboard with an AutoRank score of 61.62, obtaining BLEU 51.61, METEOR 75.80, TER 37.09, CHRF++ 73.29, BERTScore 92.61, and COMET 76.36. These results demonstrate that fine-tuning multilingual models for a domain-specific machine translation task improves performance over general-purpose multilingual translation systems.
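As a rough illustration of this kind of setup, the sketch below fine-tunes facebook/nllb-200-distilled-1.3B on English–Hindi pairs with the Hugging Face transformers library. The data file name, field names, and hyperparameters are assumptions for the sketch, not the exact configuration used for the shared task.

```python
# Minimal sketch of fine-tuning NLLB-200 for English->Hindi legal translation.
# File name, field names ("en", "hi"), and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

MODEL = "facebook/nllb-200-distilled-1.3B"
tokenizer = AutoTokenizer.from_pretrained(MODEL, src_lang="eng_Latn", tgt_lang="hin_Deva")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

# Hypothetical JSON-lines corpus with "en" and "hi" fields.
data = load_dataset("json", data_files={"train": "legal_train.jsonl"})

def preprocess(batch):
    # 512-token context window on both source and target sides.
    return tokenizer(batch["en"], text_target=batch["hi"],
                     max_length=512, truncation=True)

tokenized = data.map(preprocess, batched=True,
                     remove_columns=data["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="nllb-legal-en-hi",
    learning_rate=2e-5,             # assumed value
    per_device_train_batch_size=8,  # assumed value
    num_train_epochs=3,             # assumed value
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Decoding: NLLB expects the target-language code forced as the first generated
# token; beam search is one common choice for higher-quality output.
inputs = tokenizer("The appellant filed a writ petition.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_length=512, num_beams=5,
                     forced_bos_token_id=tokenizer.convert_tokens_to_ids("hin_Deva"))
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```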
Annotating entity mentions and linking them to a knowledge resource are essential tasks in many domains. Entity linking disambiguates mentions, establishes cross-document coreference, and lets the knowledge resource contribute extra information, e.g. taxonomic relations. Such tasks benefit from text annotation tools that integrate a search covering the text, the annotations, and the knowledge resource. However, to the best of our knowledge, no current tool integrates both knowledge-supported search and entity linking support. We address this gap by introducing knowledge-supported search functionality into the INCEpTION text annotation platform. In our approach, cross-document references are created by linking entity mentions to a knowledge base in the form of a structured hierarchical vocabulary. The resulting annotations are then indexed to enable fast yet complex queries that take into account the text, the annotations, and the vocabulary structure.
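The toy sketch below illustrates the underlying indexing idea, not the INCEpTION implementation or its query syntax: each entity-mention annotation is indexed under the linked concept and all of its broader concepts in the hierarchical vocabulary, so a query for a broad concept also retrieves mentions linked to narrower ones. The vocabulary, documents, and mentions are hypothetical.

```python
# Toy illustration of knowledge-supported indexing (not INCEpTION code).
from collections import defaultdict

# Hypothetical hierarchical vocabulary: narrower concept -> broader concept.
BROADER = {"spinal_cord": "nervous_system", "nervous_system": "anatomy"}

def ancestors(concept):
    """Return the concept together with all of its broader concepts."""
    chain = [concept]
    while concept in BROADER:
        concept = BROADER[concept]
        chain.append(concept)
    return chain

index = defaultdict(list)  # concept -> list of (document, mention text)

def add_annotation(doc, mention, concept):
    # Index the mention under the linked concept and every ancestor,
    # so hierarchy-aware queries need only a single lookup.
    for c in ancestors(concept):
        index[c].append((doc, mention))

add_annotation("doc1.txt", "spinal cord", "spinal_cord")
add_annotation("doc2.txt", "nerves", "nervous_system")

# A query on the broader concept retrieves mentions from both documents.
print(index["anatomy"])
# [('doc1.txt', 'spinal cord'), ('doc2.txt', 'nerves')]
```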
This paper describes the TWINA system, with which we participated in SemEval-2017 Task 4B (Topic-Based Message Polarity Classification, two-point scale) and Task 4D (two-point-scale tweet quantification). We implemented an ensemble-based Gradient Boosted Trees classification method for both tasks. Our system performed well on Task 4D, ranking 13th among 15 teams; on Task 4B, our model ranked 23rd.
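A minimal sketch of a gradient-boosted-trees polarity classifier is shown below, assuming TF-IDF word and bigram features; the feature set, hyperparameters, and example tweets are assumptions and do not reflect the TWINA system's exact configuration.

```python
# Sketch of a two-point-scale tweet polarity classifier using gradient boosted trees.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical training data: tweets with binary positive/negative labels.
tweets = ["great match today", "worst service ever", "loved the concert", "awful traffic"]
labels = ["positive", "negative", "positive", "negative"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),                              # unigrams and bigrams
    GradientBoostingClassifier(n_estimators=200, learning_rate=0.1),  # assumed settings
)
clf.fit(tweets, labels)
print(clf.predict(["awful game today"]))
```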