Dinesh Tewari


2025

SMOL: Professionally Translated Parallel Data for 115 Under-represented Languages
Isaac Caswell | Elizabeth Nielsen | Jiaming Luo | Colin Cherry | Geza Kovacs | Hadar Shemtov | Partha Talukdar | Dinesh Tewari | Moussa Doumbouya | Djibrila Diane | Baba Mamadi Diane | Solo Farabado | Edoardo Ferrante | Alessandro Guasoni | Mamadou Keita | Sudhamoy Debbarma | Ali Kuzhuget | David Anugraha | Muhammad Ravi Shulthan Habibi | Sina Ahmadi | Mingfei Liu | Jonathan Eng
Proceedings of the Tenth Conference on Machine Translation

We open-source SMOL (Set of Maximal Overall Leverage), a suite of training data to unlock machine translation for low-resource languages (LRLs). SMOL has been translated into 123 under-resourced languages (125 language pairs), including many for which there exist no previous public resources, for a total of 6.1M translated tokens. SMOL comprises two sub-datasets, each carefully chosen for maximum impact given its size: SMOLSENT, a set of sentences chosen for broad unique token coverage, and SMOLDOC, a document-level source focusing on broad topic coverage. They join the already released GATITOS for a trifecta of paragraph-, sentence-, and token-level content. We demonstrate that using SMOL to prompt or fine-tune Large Language Models yields robust chrF improvements. In addition to translation, we provide factuality ratings and rationales for all documents in SMOLDOC, yielding the first factuality datasets for most of these languages.

2024

IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages
Harman Singh | Nitish Gupta | Shikhar Bharadwaj | Dinesh Tewari | Partha Talukdar
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

As large language models (LLMs) see increasing adoption across the globe, it is imperative for LLMs to be representative of the linguistic diversity of the world. India is a linguistically diverse country of 1.4 billion people. To facilitate research on multilingual LLM evaluation, we release IndicGenBench, the largest benchmark for evaluating LLMs on user-facing generation tasks across a diverse set of 29 Indic languages covering 13 scripts and 4 language families. IndicGenBench is composed of diverse generation tasks like cross-lingual summarization, machine translation, and cross-lingual question answering. IndicGenBench extends existing benchmarks to many Indic languages through human curation, providing multi-way parallel evaluation data for many under-represented Indic languages for the first time. We evaluate state-of-the-art LLMs like GPT-3.5, GPT-4, PaLM-2, and LLaMA on IndicGenBench in a variety of settings. The largest PaLM-2 model performs the best on most tasks; however, there is a significant performance gap in all languages compared to English, showing that further research is needed for the development of more inclusive multilingual language models. IndicGenBench is available at www.github.com/google-research-datasets/indic-gen-bench

2023

Building Stereotype Repositories with Complementary Approaches for Scale and Depth
Sunipa Dev | Akshita Jha | Jaya Goyal | Dinesh Tewari | Shachi Dave | Vinodkumar Prabhakaran
Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)

Measurements of fairness in NLP have been critiqued for lacking concrete definitions of biases or harms measured, and for perpetuating a singular, Western narrative of fairness globally. To combat some of these pivotal issues, methods for curating datasets and benchmarks that target specific harms are rapidly emerging. However, these methods still face the significant challenge of achieving coverage over global cultures and perspectives at scale. To address this, in this paper, we highlight the utility and importance of complementary approaches that leverage both community engagement as well as large generative models in these curation strategies. We specifically target the harm of stereotyping and demonstrate a pathway to build a benchmark that covers stereotypes about diverse and intersectional identities. We discuss the two approaches, their advantages and constraints, the characteristics of the data they produce, and finally, their potential to be used complementarily for better evaluation of stereotyping harms.