Mitigating Interviewer Bias in Multimodal Depression Detection: An Approach with Adversarial Learning and Contextual Positional Encoding

Enshi Zhang, Christian Poellabauer


Abstract
Clinical interviews are a standard method for assessing depression. Recent approaches have improved prediction accuracy by focusing on specific questions posed by the interviewer and manually selected question-answer (QA) pairs that target mental health content. However, these methods often neglect the broader conversational context, resulting in limited generalization and reduced robustness, particularly in less structured interviews, which are common in real-world clinical settings. In this work, we develop a multimodal dialogue-level transformer that captures the dynamics of dialogue within each interview by using a combination of sequential positional embedding and question context vectors. In addition to the depression prediction branch, we build an adversarial classifier with a gradient reversal layer to learn shared representations that remain invariant to the types of questions asked during the interview. This approach aims to reduce biased learning and improve the fairness and generalizability of depression detection in diverse clinical interview scenarios. Classification and regression experiments conducted on three real-world interview-based datasets and one synthetic dataset demonstrate the robustness and generalizability of our model.
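The abstract names a gradient reversal layer but does not give the architecture, loss functions, or reversal strength, so the following is only a minimal sketch of the general technique and not the authors' implementation. It assumes a squared-error adversary over a three-dimensional shared feature vector. The names `GradientReversal`, `lam`, and `adversary_grad_wrt_features` are hypothetical and chosen for illustration. The layer is the identity in the forward pass and multiplies incoming gradients by `-lam` in the backward pass, so the encoder is pushed to make the adversary's task (here, standing in for question-type prediction) harder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""
    lam: float = 1.0  # hypothetical reversal strength, not taken from the paper

    def forward(self, features: List[float]) -> List[float]:
        # Features pass through unchanged to the adversarial branch.
        return list(features)

    def backward(self, grad_output: List[float]) -> List[float]:
        # Flip the sign so the shared encoder ascends the adversary's loss.
        return [-self.lam * g for g in grad_output]


def adversary_grad_wrt_features(weights: List[float],
                                features: List[float],
                                target: float) -> List[float]:
    """Gradient of 0.5 * (w . x - target)^2 with respect to the features x."""
    error = sum(w * x for w, x in zip(weights, features)) - target
    return [error * w for w in weights]


# Toy values (assumed): a shared representation and a linear adversary.
grl = GradientReversal(lam=0.5)
shared = [1.0, -2.0, 0.5]
adv_weights = [0.2, 0.1, -0.4]

adv_input = grl.forward(shared)
grad_at_adv_input = adversary_grad_wrt_features(adv_weights, adv_input, target=1.0)
grad_to_encoder = grl.backward(grad_at_adv_input)
```

In a full model, `grad_to_encoder` would be added to the gradient coming from the depression prediction branch before the shared encoder is updated. The adversary itself still trains on the unreversed gradient.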
Anthology ID:
2025.findings-emnlp.650
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12169–12188
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.650/
DOI:
10.18653/v1/2025.findings-emnlp.650
Cite (ACL):
Enshi Zhang and Christian Poellabauer. 2025. Mitigating Interviewer Bias in Multimodal Depression Detection: An Approach with Adversarial Learning and Contextual Positional Encoding. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 12169–12188, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Mitigating Interviewer Bias in Multimodal Depression Detection: An Approach with Adversarial Learning and Contextual Positional Encoding (Zhang & Poellabauer, Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.650.pdf
Checklist:
2025.findings-emnlp.650.checklist.pdf