Arun Balaji Buduru


FAtNet: Cost-Effective Approach Towards Mitigating the Linguistic Bias in Speaker Verification Systems
Divya Sharma | Arun Balaji Buduru
Findings of the Association for Computational Linguistics: NAACL 2022

Linguistic bias in Deep Neural Network (DNN) based Natural Language Processing (NLP) systems is a critical problem that needs attention. The problem further intensifies in the case of security systems, such as speaker verification, where fairness is essential. Speaker verification systems are intelligent systems that determine if two speech recordings belong to the same speaker. Such human-oriented security systems should be usable by diverse people speaking varied languages. Thus, a speaker verification system trained on speech in one language should generalize when tested for other languages. However, DNN-based models are often language-dependent. Previous works explore domain adaptation to fine-tune the pre-trained model for out-of-domain languages. Fine-tuning the model individually for each existing language is expensive. Hence, it limits the usability of the system. This paper proposes the cost-effective idea of integrating a lightweight embedding with existing speaker verification systems to mitigate linguistic bias without adaptation. This work is motivated by the theoretical hypothesis that attentive-frames could help generate language-agnostic embeddings. For scientific validation of this hypothesis, we propose two frame-attentive networks and investigate the effect of their integration with baselines for twelve languages. Empirical results suggest that frame-attentive embedding can cost-effectively reduce linguistic bias and enhance the usability of baselines.

CamPros at CASE 2022 Task 1: Transformer-based Multilingual Protest News Detection
Neha Kumari | Mrinal Anand | Tushar Mohan | Ponnurangam Kumaraguru | Arun Balaji Buduru
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

Socio-political protests often lead to grave consequences when they occur. The early detection of such protests is very important for taking early precautionary measures. However, the main shortcoming of protest event detection is the scarcity of sufficient training data for specific language categories, which makes it difficult to train data-hungry deep learning models effectively. Therefore, cross-lingual and zero-shot learning models are needed to detect events in various low-resource languages. This paper proposes a multi-lingual cross-document level event detection approach using pre-trained transformer models developed for Shared Task 1 at CASE 2022. The shared task constituted four subtasks for event detection at different granularity levels, i.e., document level to token level, spread over multiple languages (English, Spanish, Portuguese, Turkish, Urdu, and Mandarin). Our system achieves an average F1 score of 0.73 for document-level event detection tasks. Our approach secured 2nd position for the Hindi language in subtask 1 with an F1 score of 0.80. While for Spanish, we secure 4th position with an F1 score of 0.69. Our code is available at


An Exploratory Study on Temporally Evolving Discussion around Covid-19 using Diachronic Word Embeddings
Avinash Tulasi | Asanobu Kitamoto | Ponnurangam Kumaraguru | Arun Balaji Buduru
Proceedings of the Workshop on Natural Language Processing for Digital Humanities

Covid 19 has seen the world go into a lock down and unconventional social situations throughout. During this time, the world saw a surge in information sharing around the pandemic and the topics shared in the time were diverse. People’s sentiments have changed during this period. Given the wide spread usage of Online Social Networks (OSN) and support groups, the user sentiment is well reflected in online discussions. In this work, we aim to show the topics under discussion, evolution of discussions, change in user sentiment during the pandemic. Alongside which, we also demonstrate the possibility of exploratory analysis to find pressing topics, change in perception towards the topics and ways to use the knowledge extracted from online discussions. For our work we employ Diachronic Word embeddings which capture the change in word usage over time. With the help of analysis from temporal word usages, we show the change in people’s option on covid-19 from being a conspiracy, to the post-covid topics that surround vaccination.