Tahseen Rabbani
2026
PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning
Afra Feyza Akyürek | Advait Gosai | Chen Bo Calvin Zhang | Vipul Gupta | Jaehwan Jeong | Anisha Gunjal | Tahseen Rabbani | Maria Mazzone | David Randolph IV | Mohammad Mahmoudi Meymand | Gurshaan Chattha | Paula Rodriguez | Diego A. Mares Buendia | Pavit Singh | Michael Liu | Subodh Chawla | Peter Cline | Lucy Ogaz | Ernesto Gabriel Hernández Montoya | Zihao Wang | Pavi Bhatter | Marcos Ayestaran | Bing Liu | Yunzhong He
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Frontier model progress is often measured using academic benchmarks that provide a limited view of performance on open-ended, economically consequential tasks in high-stakes professional domains where practical returns matter most. We introduce Professional Reasoning Bench (PRBench), a realistic, open-ended, and difficult benchmark of real-world problems in Finance and Law. We open-source its 1,100 expert-authored tasks and 19,356 expert-curated criteria, making it the largest public, rubric-based benchmark for both legal and finance domains. We recruit 182 qualified professionals, holding JDs, CFAs, or 6+ years of experience, who contributed questions inspired by their actual workflows. This process yields significant diversity, with tasks spanning 114 countries and 47 US jurisdictions. Our expert-curated rubrics are validated through a rigorous quality pipeline, including independent expert validation. Subsequent evaluation of 20 leading models reveals substantial room for improvement, with top scores of only 0.39 (Finance) and 0.37 (Legal) on our Hard subsets. We further catalog associated economic impacts of the prompts and analyze performance using human-annotated rubric categories. Common failure modes include inaccurate judgments, a lack of process transparency, and incomplete reasoning, highlighting critical gaps in the models' reliability for professional adoption.
2025
Federated Meta-Learning for Low-Resource Translation of Kirundi
Kyle Rui Sang | Tahseen Rabbani | Tianyi Zhou
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
In this work, we reframe multilingual neural machine translation (NMT) as a federated meta-learning problem and introduce a translation dataset for the low-resource Kirundi language. We aggregate machine translation models locally trained on varying (but related) source languages to produce a global meta-model that encodes abstract representations of key semantic structures relevant to the parent languages. We then use the Reptile algorithm and Optuna fine-tuning to fit the global model onto a target language. The target language may live outside the subset of parent languages (such as closely-related dialects or sibling languages), which is particularly useful for languages with few available sentence pairs. We first develop a novel dataset of Kirundi-English sentence pairs curated from Biblical translation. We then demonstrate that a federated learning approach can produce a tiny 4.8M Kirundi translation model and a stronger NLLB-600M model which performs well on both our Biblical corpus and the FLORES-200 Kirundi corpus.
Assessing the Similarity of Cross-Lingual Seq2Seq Sentence Embeddings Using Low-Resource Spectral Clustering
Nelson Moll | Tahseen Rabbani
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
In this work, we study the cross-lingual distance of machine translations through alignment of seq2seq representations over small corpora. First, we use the M2M100 model to collect sentence-level representations of The Book of Revelation in several languages. We then perform unsupervised manifold alignment (spectral clustering) between these collections of embeddings. As verses between translations are not necessarily aligned, our procedure falls under the challenging but more realistic non-correspondence regime. The cost function associated with each alignment is used to rank the relative (machine) similarity of one language to another. We then perform correspondent alignment over another cluster of languages, this time using FLORES+ parallel NLLB model embeddings. Our experiments demonstrate that the representations of closely-related languages group closely and are cheap to align (requiring <1000 sentences) via our strategy.
Co-authors
- Afra Feyza Akyürek 1
- Marcos Ayestaran 1
- Pavi Bhatter 1
- Diego A. Mares Buendia 1
- Gurshaan Chattha 1
- Subodh Chawla 1
- Peter Cline 1
- Advait Gosai 1
- Anisha Gunjal 1
- Vipul Gupta 1
- Yunzhong He 1
- David Randolph IV 1
- Jaehwan Jeong 1
- Bing Liu 1
- Michael Liu 1
- Maria Mazzone 1
- Mohammad Mahmoudi Meymand 1
- Nelson Moll 1
- Ernesto Gabriel Hernández Montoya 1
- Lucy Ogaz 1
- Paula Rodriguez 1
- Kyle Rui Sang 1
- Pavit Singh 1
- Zihao Wang 1
- Chen Bo Calvin Zhang 1
- Tianyi Zhou 1