TiC-MuFormer: Time-Aware Caption-Integrated Multimodal Transformers for User-Level Mental Health Modeling

Georgios Tsoumplekas, Yannis Spyridis, Vasileios Argyriou


Abstract
User-level affective modeling from social media requires integrating heterogeneous signals that unfold over time. While prior work has focused predominantly on textual analysis, visually expressed affect and temporal posting patterns also carry important psychological cues. However, these modalities are difficult to combine in practice due to sparse emotional evidence, asynchronous posting behavior, and frequent semantic misalignment between images and accompanying text. This paper introduces TiC-MuFormer, a time-enriched, caption-integrated multimodal transformer that addresses these challenges by verbalizing visual content through image captioning before fusion and injecting temporal structure prior to cross-modal attention, enabling user trajectories to be modeled in a time-aware semantic space. We instantiate the method on a mental health detection task and demonstrate that it achieves state-of-the-art results across all user-level metrics, outperforming both unimodal and multimodal baselines. Ablation studies further show that temporal coverage, batch size, and encoder choice jointly influence downstream accuracy, underscoring the importance of aligned temporal and semantic representations. Overall, this work highlights caption-guided temporal multimodality as a principled modeling strategy for general affective or psychiatric risk inference on social platforms.
Anthology ID:
2026.lrec-main.759
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
9669–9677
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.759/
Cite (ACL):
Georgios Tsoumplekas, Yannis Spyridis, and Vasileios Argyriou. 2026. TiC-MuFormer: Time-Aware Caption-Integrated Multimodal Transformers for User-Level Mental Health Modeling. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 9669–9677, Palma de Mallorca, Spain. ELRA Language Resource Association.
Cite (Informal):
TiC-MuFormer: Time-Aware Caption-Integrated Multimodal Transformers for User-Level Mental Health Modeling (Tsoumplekas et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.759.pdf