Rasa Hosseinzadeh


2025

MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation
Satya Krishna Gorti | Ilan Gofman | Zhaoyan Liu | Jiapeng Wu | Noël Vouitsis | Guangwei Yu | Jesse C. Cresswell | Rasa Hosseinzadeh
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Text-to-SQL generation enables non-experts to interact with databases via natural language. Recent advances rely on large closed-source models like GPT-4 that present challenges in accessibility, privacy, and latency. To address these issues, we focus on developing small, efficient, and open-source text-to-SQL models. We demonstrate the benefits of sampling multiple candidate SQL generations and propose our method, MSc-SQL, to critique them using associated metadata. Our sample critiquing model evaluates multiple outputs simultaneously, achieving state-of-the-art performance compared to other open-source models while remaining competitive with larger models at a much lower cost. Full code can be found at github.com/layer6ai-labs/msc-sql.
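The abstract describes a two-stage pipeline: sample several candidate SQL queries from a small generator, then have a critic model compare all candidates jointly, using associated metadata, to pick one. The following is a minimal illustrative sketch of that idea, not the authors' released code; the names `generate_sql`, `critique`, and `try_execute` are hypothetical placeholders.

```python
# Hypothetical sketch of multi-sample generation plus joint critiquing.
# Placeholder callables stand in for the small generator and the critic.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    sql: str
    execution_result: Optional[str]  # metadata shown to the critic, e.g. a dry-run summary


def try_execute(sql: str) -> Optional[str]:
    # Placeholder: run the query against the target database and return a
    # short result summary or error message for the critic to inspect.
    return None


def pick_best_sql(
    question: str,
    schema: str,
    generate_sql: Callable[[str, str], str],                 # small text-to-SQL model, sampled stochastically
    critique: Callable[[str, str, List[Candidate]], int],    # critic sees all candidates at once
    num_samples: int = 4,
) -> str:
    """Sample several candidate queries, then let a critic that sees all of
    them (plus execution metadata) choose the best one."""
    candidates = []
    for _ in range(num_samples):
        sql = generate_sql(question, schema)        # one stochastic decoding per sample
        candidates.append(Candidate(sql=sql, execution_result=try_execute(sql)))
    best_index = critique(question, schema, candidates)  # joint comparison, not per-sample scoring
    return candidates[best_index].sql
```

The key design point implied by the abstract is that the critic evaluates the candidates simultaneously rather than scoring each in isolation, which lets it exploit differences between samples.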

2023

DiMS: Distilling Multiple Steps of Iterative Non-Autoregressive Transformers for Machine Translation
Sajad Norouzi | Rasa Hosseinzadeh | Felipe Perez | Maksims Volkovs
Findings of the Association for Computational Linguistics: ACL 2023

The computational benefits of iterative non-autoregressive transformers decrease as the number of decoding steps increases. As a remedy, we introduce Distill Multiple Steps (DiMS), a simple yet effective distillation technique that decreases the number of steps required to reach a given translation quality. The distilled model enjoys the computational benefits of early iterations while preserving the improvements gained from several iterative steps. DiMS relies on two models, namely a student and a teacher. The student is optimized to predict the output of the teacher after multiple decoding steps, while the teacher follows the student via a slow-moving average. The moving average keeps the teacher’s knowledge up to date and improves the quality of the labels the teacher provides. During inference, only the student is used for translation, so no additional computation is added. We verify the effectiveness of DiMS on various models, obtaining 7.8 and 12.9 BLEU point improvements in single-step translation accuracy on the distilled and raw versions of WMT’14 De-En. Full code for this work is available here: https://github.com/layer6ai-labs/DiMS
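The training recipe in the abstract has two ingredients: the student matches, in a single pass, what the teacher produces after several iterative decoding steps, and the teacher parameters trail the student via a slow-moving average. Below is a minimal PyTorch-style sketch of one such update, assuming hypothetical `refine` and `logits` methods on the teacher and student; it is an illustration of the described idea, not the code from the linked repository.

```python
# Sketch of a DiMS-style distillation step (illustrative; module methods
# `refine` and `logits` are assumed, not taken from the released repo).
import torch
import torch.nn.functional as F


def dims_step(student, teacher, batch, optimizer, teacher_steps=4, ema_decay=0.999):
    """One update: the student mimics, in a single decoding pass, what the
    teacher produces after several iterative refinement passes."""
    with torch.no_grad():
        tokens = batch["initial_tokens"]
        for _ in range(teacher_steps):                     # teacher refines iteratively
            tokens = teacher.refine(batch["source"], tokens)
        target_logits = teacher.logits(batch["source"], tokens)

    # Student predicts the multi-step teacher distribution in one step.
    student_logits = student.logits(batch["source"], batch["initial_tokens"])
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(target_logits, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Teacher trails the student via a slow-moving (exponential) average,
    # keeping the labels it provides up to date as the student improves.
    with torch.no_grad():
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(ema_decay).add_(p_s, alpha=1.0 - ema_decay)
    return loss.item()
```

At inference time only the student is run, which is why the distilled model keeps the cost of a single (or few-step) decoding pass.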