Efstathia Soufleri


2025

pdf bib
Enhancing Stress Detection on Social Media Through Multi-Modal Fusion of Text and Synthesized Visuals
Efstathia Soufleri | Sophia Ananiadou
ACL 2025

Social media platforms generate an enormous volume of multi-modal data, yet stress detection research has predominantly relied on text-based analysis. In this work, we propose a novel framework that integrates textual content with synthesized visual cues to enhance stress detection. Using the generative model DALL·E, we synthesize images from social media posts, which are then fused with text through the multi-modal capabilities of a pre-trained CLIP model. Our approach is evaluated on the Dreaddit dataset, where a classifier trained on frozen CLIP features achieves 94.90% accuracy, and full fine-tuning further improves performance to 98.41%. These results underscore the integration of synthesized visuals with textual data not only enhances stress detection but also offers a robust method over traditional text-only methods, paving the way for innovative approaches in mental health monitoring and social media analytics.