Hamid Behroozi


2025

Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description
Mahshid Dehghani | Amirahmad Shafiee | Ali Shafiei | Neda Fallah | Farahmand Alizadeh | Mohammad Mehdi Gholinejad | Hamid Behroozi | Jafar Habibi | Ehsaneddin Asgari
Findings of the Association for Computational Linguistics: NAACL 2025

3D facial emotion modeling has important applications in areas such as animation design, virtual reality, and emotional human-computer interaction (HCI). However, existing models are constrained by limited emotion classes and insufficient datasets. To address this, we introduce Emo3D, an extensive “Text-Image-Expression dataset” that spans a wide spectrum of human emotions, each paired with images and 3D blendshapes. Leveraging Large Language Models (LLMs), we generate a diverse array of textual descriptions, enabling the capture of a broad range of emotional expressions. Using this unique dataset, we perform a comprehensive evaluation of fine-tuned language-based models and vision-language models, such as Contrastive Language-Image Pretraining (CLIP), for 3D facial expression synthesis. To better assess conveyed emotions, we introduce the Emo3D metric, a new evaluation metric that aligns more closely with human perception than traditional Mean Squared Error (MSE). Unlike MSE, which focuses on numerical differences, Emo3D captures emotional nuances in visual-text alignment and semantic richness. The Emo3D dataset and metric hold great potential for advancing applications in animation and virtual reality.
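To make the contrast between plain MSE and a perception-oriented score concrete, the sketch below illustrates the idea only; it is not the paper's actual Emo3D metric. It assumes blendshape coefficients are compared numerically (MSE) while semantic fit is scored via cosine similarity between embeddings of a rendered face image and the emotion description (e.g. both from CLIP). The function names (emo3d_style_score, semantic_alignment), the weighting factor alpha, and the 52-dimensional blendshape / 512-dimensional embedding sizes are all illustrative assumptions.

```python
# Illustrative sketch only: contrasts numerical blendshape error (MSE) with a
# semantic visual-text alignment score. Not the published Emo3D metric.
import torch
import torch.nn.functional as F

def blendshape_mse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Numerical error between predicted and ground-truth blendshape vectors."""
    return F.mse_loss(pred, target)

def semantic_alignment(image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between a rendered-face embedding and the emotion-text
    embedding (e.g. both produced by CLIP); higher means the rendered expression
    better matches the described emotion."""
    return F.cosine_similarity(image_emb, text_emb, dim=-1).mean()

def emo3d_style_score(pred, target, image_emb, text_emb, alpha: float = 0.5):
    """Hypothetical combined score: lower blendshape error and higher text-image
    alignment both increase the score. alpha trades off the two terms."""
    return alpha * (1.0 - blendshape_mse(pred, target)) \
        + (1.0 - alpha) * semantic_alignment(image_emb, text_emb)

# Toy usage with random tensors (52 ARKit-style blendshapes, 512-d embeddings).
pred, target = torch.rand(52), torch.rand(52)
img_e, txt_e = torch.randn(1, 512), torch.randn(1, 512)
print(float(emo3d_style_score(pred, target, img_e, txt_e)))
```

The point of the sketch is the design choice the abstract alludes to: MSE alone rewards matching coefficients even when the rendered face conveys the wrong emotion, whereas a visual-text alignment term rewards expressions that a vision-language model judges consistent with the description.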