Hasindri Watawana
Also published as: Hasindri Sankalpana Watawana
2026
When Consistency Becomes Bias: Interviewer Effects in Semi-Structured Clinical Interviews
Hasindri Sankalpana Watawana | Sergio Gastón Burdisso | Diego Aaron Moreno-Galvan | Fernando Sanchez-Vega | Adrian Pastor Lopez Monroy | Petr Motlicek | Esau Villatoro-Tello
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Automatic depression detection from doctor–patient conversations has gained momentum thanks to the availability of public corpora and advances in language modeling. However, interpretability remains limited: strong performance is often reported without revealing what drives predictions. We analyze three datasets—ANDROIDS, DAIC-WOZ, and E-DAIC—and identify a systematic bias from interviewer prompts in semi-structured interviews. Models trained on interviewer turns exploit fixed prompts and positions to distinguish depressed from control subjects, often achieving high classification scores without using participant language. Restricting models to participant utterances distributes decision evidence more broadly and reflects genuine linguistic cues. While semi-structured protocols ensure consistency, including interviewer prompts inflates performance by leveraging script artifacts. Our results highlight a cross-dataset, architecture-agnostic bias and emphasize the need for analyses that localize decision evidence by time and speaker to ensure models learn from participants’ language.
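The central ablation described in this abstract, comparing models trained on interviewer turns against models trained on participant utterances, amounts to a speaker-based filtering step before classification. The sketch below illustrates that idea only; the (speaker, text) transcript schema and all function names are illustrative assumptions, not the paper's code or the corpora's actual format:

```python
# Minimal sketch of speaker-restricted transcript filtering, assuming a
# hypothetical transcript format of (speaker, text) turn pairs. Names and
# schema are illustrative, not taken from the paper or datasets.

def filter_turns(transcript, keep_speaker):
    """Concatenate only the turns spoken by `keep_speaker`."""
    return " ".join(text for speaker, text in transcript if speaker == keep_speaker)

# A toy transcript in the assumed format.
transcript = [
    ("interviewer", "How have you been sleeping lately?"),
    ("participant", "Not well. I wake up several times a night."),
    ("interviewer", "Do you still enjoy your hobbies?"),
    ("participant", "I stopped painting a few months ago."),
]

# The biased condition: the classifier sees only the scripted prompts,
# whose wording and position can leak the interview protocol.
interviewer_only = filter_turns(transcript, "interviewer")

# The condition the abstract argues for: participant language only.
participant_only = filter_turns(transcript, "participant")

print(interviewer_only)
print(participant_only)
```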
2025
A Culturally-diverse Multilingual Multimodal Video Benchmark & Model
Bhuiyan Sanjid Shafique | Ashmal Vayani | Muhammad Maaz | Hanoona Abdul Rasheed | Dinura Dissanayake | Mohammed Irfan Kurpath | Yahya Hmaiti | Go Inoue | Jean Lahoud | Md. Safirur Rashid | Shadid Intisar Quasem | Maheen Fatima | Franco Vidal | Mykola Maslych | Ketan Pravin More | Sanoojan Baliah | Hasindri Watawana | Yuhao Li | Fabian Farestam | Leon Schaller | Roman Tymtsiv | Simon Weber | Hisham Cholakkal | Ivan Laptev | Shin’ichi Satoh | Michael Felsberg | Mubarak Shah | Salman Khan | Fahad Shahbaz Khan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large multimodal models (LMMs) have recently gained attention due to their effectiveness in understanding and generating descriptions of visual content. Most existing LMMs are English-only. While a few recent works explore multilingual image LMMs, to the best of our knowledge, moving beyond English for cultural and linguistic inclusivity has yet to be investigated in the context of video LMMs. In pursuit of more inclusive video LMMs, we introduce a multilingual video LMM benchmark, named ViMUL-Bench, to evaluate video LMMs across 14 languages, including both low- and high-resource languages: Arabic, Bengali, Chinese, English, French, German, Hindi, Japanese, Russian, Sinhala, Spanish, Swedish, Tamil, and Urdu. ViMUL-Bench is designed to rigorously test video LMMs across 15 categories, including eight culturally diverse ones, ranging from lifestyles and festivals to foods and rituals, and from local landmarks to prominent cultural personalities. It comprises both open-ended (short- and long-form) and multiple-choice questions spanning various video durations (short, medium, and long), with 8k samples manually verified by native speakers. In addition, we introduce a machine-translated multilingual video training set comprising 1.2 million samples and develop a simple multilingual video LMM, named ViMUL, which is shown to provide a better tradeoff between high- and low-resource languages for video understanding. We hope that ViMUL-Bench and our multilingual video LMM, along with the large-scale multilingual video training set, will ease future research on developing culturally and linguistically inclusive multilingual video LMMs. Our proposed benchmark, video LMM, and training data will be publicly released.
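As a rough illustration of how per-language results on a benchmark like ViMUL-Bench might be aggregated, the loop below groups multiple-choice accuracy by language. The sample schema and the `model.answer` call are assumptions made for this sketch; the released data format and model API may differ:

```python
# Hedged sketch: per-language scoring for a multilingual video benchmark.
# The sample dict keys and the model interface are illustrative assumptions,
# not the actual ViMUL-Bench or ViMUL APIs.
from collections import defaultdict

LANGUAGES = [
    "Arabic", "Bengali", "Chinese", "English", "French", "German", "Hindi",
    "Japanese", "Russian", "Sinhala", "Spanish", "Swedish", "Tamil", "Urdu",
]

def evaluate(model, samples):
    """Compute multiple-choice accuracy per language.

    Each sample is assumed to be a dict with keys:
    'language', 'video_path', 'question', 'choices', 'answer'.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for s in samples:
        pred = model.answer(s["video_path"], s["question"], s["choices"])
        total[s["language"]] += 1
        correct[s["language"]] += int(pred == s["answer"])
    # Report only the languages that actually appear in the sample set.
    return {lang: correct[lang] / total[lang] for lang in LANGUAGES if total[lang]}
```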
Co-authors
- Sanoojan Baliah 1
- Sergio Gastón Burdisso 1
- Hisham Cholakkal 1
- Dinura Dissanayake 1
- Fabian Farestam 1
- Maheen Fatima 1
- Michael Felsberg 1
- Yahya Hmaiti 1
- Go Inoue 1
- Salman Khan 1
- Fahad Shahbaz Khan 1
- Mohammed Irfan Kurpath 1
- Jean Lahoud 1
- Ivan Laptev 1
- Yuhao Li 1
- Adrian Pastor Lopez Monroy 1
- Muhammad Maaz 1
- Mykola Maslych 1
- Ketan Pravin More 1
- Diego Aaron Moreno-Galvan 1
- Petr Motlicek 1
- Shadid Intisar Quasem 1
- Hanoona Abdul Rasheed 1
- Md. Safirur Rashid 1
- Fernando Sanchez-Vega 1
- Shin’ichi Satoh 1
- Leon Schaller 1
- Bhuiyan Sanjid Shafique 1
- Mubarak Shah 1
- Roman Tymtsiv 1
- Ashmal Vayani 1
- Franco Vidal 1
- Esaú Villatoro-Tello 1
- Simon Weber 1