Ahmed A Aly
2026
CoSy: Conversational Synthesis for Grounded Question Answering
Patrick Huber | Arash Einolghozati | Rylan Conway | Kanika Narang | Matt Smith | Waqar Nayyar | Adithya Sagar | Ahmed A Aly | Akshat Shrivastava
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
High-quality, large-scale conversational datasets are scarce, making it difficult to train on-device language models (~1B parameters) as effective assistants. We introduce CoSy (Conversational Synthesis), a novel framework for generating diverse, steerable, multi-turn conversations at scale. CoSy combines three key mechanisms: (1) conversational graphs that ensure natural dialogue flow, (2) turn-based prompt augmentations for diversity, and (3) explicit linguistic phenomena for coherence. We evaluate CoSy on conversational grounded reasoning tasks (i.e., answering questions based on contextual information), a core on-device use case. Our on-device-sized models trained on CoSy-synthesized data achieve performance competitive with human-annotated baselines and outperform instruction-tuned models of up to 70B parameters in zero-shot settings.
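The abstract names conversational graphs and turn-based prompt augmentations but does not describe how they are implemented. The following is a hypothetical illustration only, assuming a small graph of turn types whose edges list which turn may follow which: a conversation skeleton is sampled as a random walk over the graph, and each turn is paired with a randomly drawn prompt augmentation. All names (`CONV_GRAPH`, `AUGMENTATIONS`, `sample_skeleton`, `build_prompts`) and turn types are invented for this example and are not taken from CoSy.

```python
import random
from typing import Dict, List, Optional

# Hypothetical conversational graph (not CoSy's actual graph): each turn type
# maps to the turn types allowed to follow it; "end" terminates the walk.
CONV_GRAPH: Dict[str, List[str]] = {
    "start": ["ask_question"],
    "ask_question": ["answer"],
    "answer": ["follow_up", "clarify", "end"],
    "follow_up": ["answer"],
    "clarify": ["answer"],
}

# Hypothetical per-turn prompt augmentations, drawn independently for each
# turn to diversify the synthesized conversations.
AUGMENTATIONS: List[str] = [
    "keep it concise",
    "use an informal tone",
    "refer back to an earlier turn with a pronoun",
]


def sample_skeleton(graph: Dict[str, List[str]], start: str = "start",
                    max_turns: int = 8, seed: Optional[int] = None) -> List[str]:
    """Random walk over the graph; returns the sequence of turn types visited."""
    rng = random.Random(seed)
    node, path = start, []
    while len(path) < max_turns:
        choices = graph.get(node, [])
        if not choices:
            break
        node = rng.choice(choices)
        if node == "end":
            break
        path.append(node)
    return path


def build_prompts(skeleton: List[str], seed: Optional[int] = None) -> List[str]:
    """Pair every turn type in the skeleton with one randomly chosen augmentation."""
    rng = random.Random(seed)
    return [f"Write a '{turn}' turn; {rng.choice(AUGMENTATIONS)}." for turn in skeleton]


if __name__ == "__main__":
    skeleton = sample_skeleton(CONV_GRAPH, seed=0)
    for prompt in build_prompts(skeleton, seed=0):
        print(prompt)
```

In a real pipeline each prompt would be sent to a generator model, together with the grounding context, to write the actual turn; that step is omitted here.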
Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning
Minseok Kim | Jingxiang Chen | Seong-Gyun Leem | Yin Huang | Rashi Rungta | Zhicheng Ouyang | Haibin Wu | Surya Teja Appini | Ankur Bansal | Yang Bai | Yue Liu | Florian Metze | Ahmed A Aly | Anuj Kumar | Ariya Rastrow | Zhaojiang Lin
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Speech large language models (LLMs) observe paralinguistic cues such as prosody, emotion, and non-verbal sounds, which are crucial for intent understanding. However, leveraging these cues is challenging because of limited training data, difficult annotation, and models that exploit lexical shortcuts instead of paralinguistic signals. We propose multi-task reinforcement learning (RL) with chain-of-thought prompting that elicits explicit affective reasoning. To address data scarcity, we introduce a paralinguistics-aware speech LLM (PALLM) that jointly optimizes sentiment classification from audio and paralinguistics-aware response generation via a two-stage pipeline. Experiments demonstrate that our approach improves paralinguistic understanding by 8-12% on Expresso, IEMOCAP, and RAVDESS over both supervised baselines and strong proprietary models (Gemini-2.5-Pro, GPT-4o-audio). The results show that modeling paralinguistic reasoning with multi-task RL is crucial for building emotionally intelligent speech LLMs.
CoSMoEs: Compact Sparse Mixture of Experts
Patrick Huber | Akshat Shrivastava | Ernie Chang | Chinnadhurai Sankar | Ahmed A Aly | Adithya Sagar
Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR)
Sparse Mixture of Experts (MoE) models are widely used foundation architectures at large scale, yet remain under-explored at smaller sizes. In this work, we introduce Compact Sparse Mixture of Experts (CoSMoEs) for on-device inference, addressing three key challenges: quality, memory, and latency. On the quality front, we conduct a fair evaluation that removes confounding factors and show that MoE architectures outperform dense models at on-device scale. We further propose weight-decomposed experts, which improve MoE performance beyond the standard formulation. On the memory and latency front, we address the prohibitively large parameter count of MoE models by improving expert offloading efficiency through a novel training-time loss, reducing inference latency for on-device deployment.
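The abstract does not spell out how CoSMoEs routes tokens. For readers unfamiliar with sparse MoE layers, the following is a generic sketch of standard top-k gating, assuming a linear router (one weight row per expert) and toy experts given as Python callables; it does not implement the paper's weight-decomposed experts or its offloading loss. Only the k selected experts are evaluated, which is why per-token compute stays small even though every expert's parameters must still be stored, the memory cost the abstract refers to.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Assumption: every expert maps a d-dimensional vector to a d-dimensional vector.
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def router_logits(x: Vector, router: Sequence[Vector]) -> Vector:
    """Linear router: one score per expert, the dot product of its weight row with x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in router]


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x: Vector, router: Sequence[Vector],
                experts: Sequence[Expert], k: int = 2) -> Vector:
    """Weighted sum of the k routed experts' outputs; other experts are never called."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits(x, router), k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out
```

With four toy experts that scale their input by 1, 2, 3, and 4, a router whose scores for some input are [1, 2, 0, -1] selects experts 1 and 0 for k = 2, and experts 2 and 3 are skipped entirely.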
2024
PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding
Trang Le | Daniel Lazar | Suyoun Kim | Shan Jiang | Duc Le | Adithya Sagar | Aleksandr Livshits | Ahmed A Aly | Akshat Shrivastava
Findings of the Association for Computational Linguistics: EMNLP 2024
Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models that improve the quality and robustness of SLU models with Deliberation; however, these models have remained autoregressive, resulting in higher latencies. In this work, we introduce PRoDeliberation, a novel method that leverages a Connectionist Temporal Classification (CTC)-based decoding strategy and a denoising objective to train robust non-autoregressive deliberation models. We show that PRoDeliberation achieves the latency reduction of parallel decoding (a 2-10x improvement over autoregressive models) while retaining the ability of autoregressive deliberation systems to correct Automatic Speech Recognition (ASR) mistranscriptions. We further show that the design of the denoising training allows PRoDeliberation to overcome the limitations of small ASR devices, and we provide an analysis of the necessity of each component of the system.
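PRoDeliberation's full model is not reproduced here. As a generic illustration of why CTC-based decoding avoids autoregressive latency, the following sketch shows standard greedy (best-path) CTC decoding over per-frame label scores: the argmax at each frame does not depend on any earlier output, so all frames can be decoded at once, and the frame-level path is then collapsed by merging repeats and dropping blanks. It assumes label 0 is the blank symbol and that the scores have already been computed by some model; neither detail is taken from the paper.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding.

    Each frame's argmax is independent of the others, so this step could run
    for all frames in parallel (no autoregressive dependence). The resulting
    path is collapsed: consecutive repeats are merged, then blanks are dropped.
    """
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    tokens: List[int] = []
    prev = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame and is not blank;
        # a blank between two identical labels therefore keeps them distinct.
        if label != prev and label != blank:
            tokens.append(label)
        prev = label
    return tokens
```

For example, per-frame best labels 1, 1, blank, 1, 2, 2 decode to [1, 1, 2]: the blank keeps the two 1s distinct, while the adjacent 2s merge into one token.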
Co-authors
- Adithya Sagar 3
- Akshat Shrivastava 3
- Patrick Huber 2
- Surya Teja Appini 1
- Yang Bai 1
- Ankur Bansal 1
- Ernie Chang 1
- Jingxiang Chen 1
- Rylan Conway 1
- Arash Einolghozati 1
- Yin Huang 1
- Shan Jiang 1
- Minseok Kim 1
- Suyoun Kim 1
- Anuj Kumar 1
- Daniel Lazar 1
- Duc Le 1
- Trang Le 1
- Seong-Gyun Leem 1
- Zhaojiang Lin 1
- Yue Liu 1
- Aleksandr Livshits 1
- Florian Metze 1
- Kanika Narang 1
- Waqar Nayyar 1
- Zhicheng Ouyang 1
- Ariya Rastrow 1
- Rashi Rungta 1
- Chinnadhurai Sankar 1
- Matt Smith 1
- Haibin Wu 1