Enhancing Stress Detection on Social Media Through Multi-Modal Fusion of Text and Synthesized Visuals

Efstathia Soufleri, Sophia Ananiadou


Abstract
Social media platforms generate an enormous volume of multi-modal data, yet stress detection research has predominantly relied on text-based analysis. In this work, we propose a novel framework that integrates textual content with synthesized visual cues to enhance stress detection. Using the generative model DALL·E, we synthesize images from social media posts, which are then fused with text through the multi-modal capabilities of a pre-trained CLIP model. Our approach is evaluated on the Dreaddit dataset, where a classifier trained on frozen CLIP features achieves 94.90% accuracy, and full fine-tuning further improves performance to 98.41%. These results underscore that integrating synthesized visuals with textual data not only enhances stress detection but also offers a more robust alternative to traditional text-only methods, paving the way for innovative approaches in mental health monitoring and social media analytics.
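
The abstract describes a pipeline of CLIP-based text/image feature extraction followed by a classifier over fused features. Below is a minimal sketch of that idea using PyTorch and Hugging Face transformers. The concatenation-based fusion, the MLP head, and the ViT-B/32 checkpoint are illustrative assumptions (the abstract does not specify them); the DALL·E image-synthesis step is a separate API call and is omitted here.

```python
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor

class StressClassifier(nn.Module):
    """Binary stress classifier over fused CLIP text + image embeddings."""

    def __init__(self, clip_name="openai/clip-vit-base-patch32", freeze_clip=True):
        super().__init__()
        self.clip = CLIPModel.from_pretrained(clip_name)
        if freeze_clip:  # frozen-feature setting; set False for full fine-tuning
            for p in self.clip.parameters():
                p.requires_grad = False
        dim = self.clip.config.projection_dim  # 512 for ViT-B/32
        # Assumed fusion: concatenate the two embeddings, then a small MLP head.
        self.head = nn.Sequential(
            nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, 2)
        )

    def forward(self, text_inputs, image_inputs):
        t = self.clip.get_text_features(**text_inputs)    # (B, dim)
        v = self.clip.get_image_features(**image_inputs)  # (B, dim)
        fused = torch.cat([t, v], dim=-1)                 # simple concat fusion
        return self.head(fused)

# Usage sketch: `post` is a social-media text, `img` the image synthesized
# from it by DALL·E (as a PIL image).
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = StressClassifier()
# text = processor(text=[post], return_tensors="pt", padding=True, truncation=True)
# image = processor(images=[img], return_tensors="pt")
# logits = model(text, image)  # argmax over logits -> stressed / not stressed
```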
Anthology ID:
2025.bionlp-1.4
Volume:
ACL 2025
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Junichi Tsujii
Venues:
BioNLP | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
34–43
Language:
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.4/
DOI:
Bibkey:
Cite (ACL):
Efstathia Soufleri and Sophia Ananiadou. 2025. Enhancing Stress Detection on Social Media Through Multi-Modal Fusion of Text and Synthesized Visuals. In ACL 2025, pages 34–43, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Enhancing Stress Detection on Social Media Through Multi-Modal Fusion of Text and Synthesized Visuals (Soufleri & Ananiadou, BioNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.4.pdf
Supplementary material:
2025.bionlp-1.4.SupplementaryMaterial.txt