Hrituraj Singh


2021

pdf
DRAG: Director-Generator Language Modelling Framework for Non-Parallel Author Stylized Rewriting
Hrituraj Singh | Gaurav Verma | Aparna Garimella | Balaji Vasan Srinivasan
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Author stylized rewriting is the task of rewriting an input text in a particular author’s style. Recent works in this area have leveraged Transformer-based language models in a denoising autoencoder setup to generate author stylized text without relying on a parallel corpus of data. However, these approaches are limited by the lack of explicit control of target attributes and being entirely data-driven. In this paper, we propose a Director-Generator framework to rewrite content in the target author’s style, specifically focusing on certain target attributes. We show that our proposed framework works well even with a limited-sized target author corpus. Our experiments on corpora consisting of relatively small-sized text authored by three distinct authors show significant improvements upon existing works to rewrite input texts in target author’s style. Our quantitative and qualitative analyses further show that our model has better meaning retention and results in more fluent generations.

pdf
MIMOQA: Multimodal Input Multimodal Output Question Answering
Hrituraj Singh | Anshul Nasery | Denil Mehta | Aishwarya Agarwal | Jatin Lamba | Balaji Vasan Srinivasan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Multimodal research has picked up significantly in the space of question answering with the task being extended to visual question answering, charts question answering as well as multimodal input question answering. However, all these explorations produce a unimodal textual output as the answer. In this paper, we propose a novel task - MIMOQA - Multimodal Input Multimodal Output Question Answering in which the output is also multimodal. Through human experiments, we empirically show that such multimodal outputs provide better cognitive understanding of the answers. We also propose a novel multimodal question-answering framework, MExBERT, that incorporates a joint textual and visual attention towards producing such a multimodal output. Our method relies on a novel multimodal dataset curated for this problem from publicly available unimodal datasets. We show the superior performance of MExBERT against strong baselines on both the automatic as well as human metrics.

2020

pdf
Incorporating Stylistic Lexical Preferences in Generative Language Models
Hrituraj Singh | Gaurav Verma | Balaji Vasan Srinivasan
Findings of the Association for Computational Linguistics: EMNLP 2020

While recent advances in language modeling has resulted in powerful generation models, their generation style remains implicitly dependent on the training data and can not emulate a specific target style. Leveraging the generative capabilities of a transformer-based language models, we present an approach to induce certain target-author attributes by incorporating continuous multi-dimensional lexical preferences of an author into generative language models. We introduce rewarding strategies in a reinforcement learning framework that encourages the use of words across multiple categorical dimensions, to varying extents. Our experiments demonstrate that the proposed approach can generate text that distinctively aligns with a given target author’s lexical style. We conduct quantitative and qualitative comparisons with competitive and relevant baselines to illustrate the benefits of the proposed approach.

pdf
STL-CQA: Structure-based Transformers with Localization and Encoding for Chart Question Answering
Hrituraj Singh | Sumit Shekhar
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Chart Question Answering (CQA) is the task of answering natural language questions about visualisations in the chart image. Recent solutions, inspired by VQA approaches, rely on image-based attention for question/answering while ignoring the inherent chart structure. We propose STL-CQA which improves the question/answering through sequential elements localization, question encoding and then, a structural transformer-based learning approach. We conduct extensive experiments while proposing pre-training tasks, methodology and also an improved dataset with more complex and balanced questions of different types. The proposed methodology shows a significant accuracy improvement compared to the state-of-the-art approaches on various chart Q/A datasets, while outperforming even human baseline on the DVQA Dataset. We also demonstrate interpretability while examining different components in the inference pipeline.