Anchoring the Affective Manifold: Learning Canonical and Disentangled Representations via Generative Cross-Modal Alignment

Weibin Li; Jintao Cheng; Xiaoyu Tang; Chi Man Vong

Abstract

Dominant multimodal emotion recognition paradigms often neglect the intrinsic geometric structure of affect, resulting in representations heavily entangled with non-affective factors. To address this, we propose a Canonical Disentangled Multimodal Generative Framework aimed at recovering the canonical affective manifold from raw data. We explicitly decompose the latent space into a canonical Shared Affective Subspace (z_vad) and a Private Modality Subspace (z_priv). We facilitate this factorization through Supervised Manifold Anchoring and Cross-Modal Manifold Alignment. Experiments demonstrate that our model effectively disentangles affect from private attributes (e.g., identity), achieving superior robustness in zero-shot cross-domain transfer compared to fully supervised baselines, while enabling controllable emotion generation.
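The abstract names the two training signals but does not give their exact form. The sketch below shows how the pieces could fit together under stated assumptions; it is not the authors' implementation. Several details are assumptions. z_vad is taken to be 3-dimensional, read as valence, arousal, and dominance. Both "Supervised Manifold Anchoring" and "Cross-Modal Manifold Alignment" are modeled as plain mean-squared error. Two modalities (audio and video) are used. All function names and weights (`split_latent`, `total_loss`, `lam_anchor`, `lam_align`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: the shared affective subspace z_vad has 3 coordinates
# (valence, arousal, dominance); everything after it is z_priv.
VAD_DIM = 3


def split_latent(z: Sequence[float], vad_dim: int = VAD_DIM) -> Tuple[List[float], List[float]]:
    """Split a latent vector into (z_vad, z_priv)."""
    if len(z) <= vad_dim:
        raise ValueError("latent must be longer than the affective subspace")
    return list(z[:vad_dim]), list(z[vad_dim:])


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def anchoring_loss(z_vad: Sequence[float], vad_label: Sequence[float]) -> float:
    """Assumed form of supervised anchoring: pull z_vad toward annotated VAD values."""
    return mse(z_vad, vad_label)


def alignment_loss(z_vad_a: Sequence[float], z_vad_b: Sequence[float]) -> float:
    """Assumed form of cross-modal alignment: paired modalities share one z_vad."""
    return mse(z_vad_a, z_vad_b)


def total_loss(z_audio: Sequence[float], z_video: Sequence[float],
               vad_label: Sequence[float],
               lam_anchor: float = 1.0, lam_align: float = 1.0) -> float:
    """Combine both signals; z_priv is deliberately left out of both terms."""
    a_vad, _ = split_latent(z_audio)
    v_vad, _ = split_latent(z_video)
    anchor = anchoring_loss(a_vad, vad_label) + anchoring_loss(v_vad, vad_label)
    return lam_anchor * anchor + lam_align * alignment_loss(a_vad, v_vad)
```

Because only the first `VAD_DIM` coordinates enter the loss, two samples can differ arbitrarily in their private coordinates (e.g. speaker identity) and still incur zero loss. For example, `total_loss([0.5, 0.2, -0.1, 9.0], [0.5, 0.2, -0.1, -3.0], [0.5, 0.2, -0.1])` evaluates to `0.0`. In a real model these terms would be added to a generative reconstruction objective, which is omitted here.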

Anthology ID:: 2026.acl-long.1929
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: ACL
Publisher:: Association for Computational Linguistics
Pages:: 41605–41614
URL:: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1929/
Cite (ACL):: Weibin Li, Jintao Cheng, Xiaoyu Tang, and Chi Man Vong. 2026. Anchoring the Affective Manifold: Learning Canonical and Disentangled Representations via Generative Cross-Modal Alignment. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 41605–41614, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: Anchoring the Affective Manifold: Learning Canonical and Disentangled Representations via Generative Cross-Modal Alignment (Li et al., ACL 2026)
PDF:: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1929.pdf
Checklist:: 2026.acl-long.1929.checklist.pdf