2025
pdf
bib
abs
Quality Estimation and Post-Editing Using LLMs For Indic Languages: How Good Is It?
Anushka Singh
|
Aarya Pakhale
|
Mitesh M. Khapra
|
Raj Dabre
Proceedings of Machine Translation Summit XX: Volume 1
Recently, there have been increasing efforts on Quality Estimation (QE) and Post-Editing (PE) using Large Language Models (LLMs) for Machine Translation (MT). However, the focus has mainly been on high resource languages and the approaches either rely on prompting or combining existing QE models with LLMs, instead of single end-to-end systems. In this paper, we investigate the efficacy of end-to-end QE and PE systems for low-resource languages taking 5 Indian languages as a use-case. We augment existing QE data containing multidimentional quality metric (MQM) error annotations with explanations of errors and PEs with the help of proprietary LLMs (GPT-4), following which we fine-tune Gemma-2-9B, an open-source multilingual LLM to perform QE and PE jointly. While our models attain QE capabilities competitive with or surpassing existing models in both referenceful and referenceless settings, we observe that they still struggle with PE. Further investigation reveals that this occurs because our models lack the ability to accurately identify fine-grained errors in the translation, despite being excellent indicators of overall quality. This opens up opportunities for research in end-to-end QE and PE for low-resource languages.
2024
pdf
bib
abs
How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?
Anushka Singh
|
Ananya Sai
|
Raj Dabre
|
Ratish Puduppully
|
Anoop Kunchukuttan
|
Mitesh Khapra
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
While machine translation evaluation has been studied primarily for high-resource languages, there has been a recent interest in evaluation for low-resource languages due to the increasing availability of data and models. In this paper, we focus on a zero-shot evaluation setting focusing on low-resource Indian languages, namely Assamese, Kannada, Maithili, and Punjabi. We collect sufficient Multi-Dimensional Quality Metrics (MQM) and Direct Assessment (DA) annotations to create test sets and meta-evaluate a plethora of automatic evaluation metrics. We observe that even for learned metrics, which are known to exhibit zero-shot performance, the Kendall Tau and Pearson correlations with human annotations are only as high as 0.32 and 0.45. Synthetic data approaches show mixed results and overall do not help close the gap by much for these languages. This indicates that there is still a long way to go for low-resource evaluation.
pdf
bib
abs
Numerical Claim Detection in Finance: A New Financial Dataset, Weak-Supervision Model, and Market Analysis
Agam Shah
|
Arnav Hiray
|
Pratvi Shah
|
Arkaprabha Banerjee
|
Anushka Singh
|
Dheeraj Deepak Eidnani
|
Sahasra Chava
|
Bhaskar Chaudhury
|
Sudheer Chava
Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER)
In this paper, we investigate the influence of claims in analyst reports and earnings calls on financial market returns, considering them as significant quarterly events for publicly traded companies. To facilitate a comprehensive analysis, we construct a new financial dataset for the claim detection task in the financial domain. We benchmark various language models on this dataset and propose a novel weak-supervision model that incorporates the knowledge of subject matter experts (SMEs) in the aggregation function, outperforming existing approaches. We also demonstrate the practical utility of our proposed model by constructing a novel measure of *optimism*. Here, we observe the dependence of earnings surprise and return on our optimism measure. Our dataset, models, and code are publicly (under CC BY 4.0 license) available on GitHub.