Divyanshu Aggarwal


2022

pdf
XInfoTabS: Evaluating Multilingual Tabular Natural Language Inference
Bhavnick Minhas | Anant Shankhdhar | Vivek Gupta | Divyanshu Aggarwal | Shuo Zhang
Proceedings of the Fifth Fact Extraction and VERification Workshop (FEVER)

The ability to reason about tabular or semi-structured knowledge is a fundamental problem for today’s Natural Language Processing (NLP) systems. While significant progress has been achieved in the direction of tabular reasoning, these advances are limited to English due to the absence of multilingual benchmark datasets for semi-structured data. In this paper, we use machine translation methods to construct a multilingual tabular NLI dataset, namely XINFOTABS, which expands the English tabular NLI dataset of INFOTABS to ten diverse languages. We also present several baselines for multilingual tabular reasoning, e.g., machine translation-based methods and cross-lingual. We discover that the XINFOTABS evaluation suite is both practical and challenging. As a result, this dataset will contribute to increased linguistic inclusion in tabular reasoning research and applications.

2021

pdf
Fine-tuning Distributional Semantic Models for Closely-Related Languages
Kushagra Bhatia | Divyanshu Aggarwal | Ashwini Vaidya
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects

In this paper we compare the performance of three models: SGNS (skip-gram negative sampling) and augmented versions of SVD (singular value decomposition) and PPMI (Positive Pointwise Mutual Information) on a word similarity task. We particularly focus on the role of hyperparameter tuning for Hindi based on recommendations made in previous work (on English). Our results show that there are language specific preferences for these hyperparameters. We extend the best settings for Hindi to a set of related languages: Punjabi, Gujarati and Marathi with favourable results. We also find that a suitably tuned SVD model outperforms SGNS for most of our languages and is also more robust in a low-resource setting.