Gaurav Sahu

2021

Adaptive Fusion Techniques for Multimodal Data
Gaurav Sahu | Olga Vechtomova
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Effective fusion of data from multiple modalities, such as video, speech, and text, is challenging due to the heterogeneous nature of multimodal data. In this paper, we propose adaptive fusion techniques that aim to model context from different modalities effectively. Instead of defining a deterministic fusion operation, such as concatenation, for the network, we let the network decide “how” to combine a given set of multimodal features more effectively. We propose two networks: 1) Auto-Fusion, which learns to compress information from different modalities while preserving the context, and 2) GAN-Fusion, which regularizes the learned latent space given context from complementary modalities. A quantitative evaluation on the tasks of multimodal machine translation and emotion recognition suggests that our lightweight, adaptive networks can better model context from other modalities than existing methods, many of which employ massive transformer-based networks.

2020

Generation of lyrics lines conditioned on music audio clips
Olga Vechtomova | Gaurav Sahu | Dhruv Kumar
Proceedings of the 1st Workshop on NLP for Music and Audio (NLP4MusA)

Adversarial Learning on the Latent Space for Diverse Dialog Generation
Kashif Khan | Gaurav Sahu | Vikash Balasubramanian | Lili Mou | Olga Vechtomova
Proceedings of the 28th International Conference on Computational Linguistics

Generating relevant responses in a dialog is challenging and requires not only proper modeling of context in the conversation but also the ability to generate fluent sentences during inference. In this paper, we propose a two-step framework based on generative adversarial nets for generating conditioned responses. Our model first learns a meaningful representation of sentences by autoencoding, and then learns to map an input query to the response representation, which is in turn decoded as a response sentence. Both quantitative and qualitative evaluations show that our model generates more fluent, relevant, and diverse responses than existing state-of-the-art methods.

2018

Free as in Free Word Order: An Energy Based Model for Word Segmentation and Morphological Tagging in Sanskrit
Amrith Krishna | Bishal Santra | Sasi Prasanth Bandaru | Gaurav Sahu | Vishnu Dutt Sharma | Pavankumar Satuluri | Pawan Goyal
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

The configurational information in sentences of a free word order language such as Sanskrit is of limited use. Thus, the context of the entire sentence is desirable even for basic processing tasks such as word segmentation. We propose a structured prediction framework that jointly solves the word segmentation and morphological tagging tasks in Sanskrit. We build an energy-based model where we adopt approaches generally employed in graph-based parsing techniques (McDonald et al., 2005a; Carreras, 2007). Our model outperforms the state of the art with an F-score of 96.92 (a percentage improvement of 7.06%) while using less than one tenth of the task-specific training data. We find that using a graph-based approach instead of a traditional lattice-based sequential labelling approach leads to a percentage gain of 12.6% in F-score for the segmentation task.

Co-authors

Pavankumar Satuluri 1

Pawan Goyal 1

Dhruv Kumar 1

Kashif Khan 1

Vikash Balasubramanian 1

Lili Mou 1
