Ojasv Kamal

2024

Moûsai: Efficient Text-to-Music Diffusion Models
Flavio Schneider | Ojasv Kamal | Zhijing Jin | Bernhard Schölkopf
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent years have seen the rapid development of large generative models for text; however, much less research has explored the connection between text and another “language” of communication: music. Music, much like text, can convey emotions, stories, and ideas, and has its own unique structure and syntax. In our work, we bridge text and music via a text-to-music generation model that is highly efficient, expressive, and can handle long-term structure. Specifically, we develop Moûsai, a cascading two-stage latent diffusion model that can generate multiple minutes of high-quality stereo music at 48kHz from textual descriptions. Moreover, our model is highly efficient, enabling real-time inference on a single consumer GPU. Through experiments and property analyses, we show that our model is competitive with existing music generation models across a variety of criteria. Lastly, to promote open-source culture, we provide a collection of open-source libraries in the hope of facilitating future work in the field. We release the following. Code: https://github.com/archinetai/audio-diffusion-pytorch. Music samples for this paper: http://bit.ly/44ozWDH. Music samples for all models: https://bit.ly/audio-diffusion.

2021

Adversities are all you need: Classification of self-reported breast cancer posts on Twitter using Adversarial Fine-tuning
Adarsh Kumar | Ojasv Kamal | Susmita Mazumdar
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

In this paper, we describe our system entry for Shared Task 8 at SMM4H-2021, which addresses the automatic classification of self-reported breast cancer posts on Twitter. Our system fine-tunes a transformer-based language model to automatically identify tweets in the self-reports category. Furthermore, we apply gradient-based adversarial fine-tuning to improve the model’s overall robustness. Our system achieved an F1-score of 0.8625 on the development set and 0.8501 on the test set in Shared Task 8 of SMM4H-2021.

Venues

SMM4H (1)
ACL (1)