HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals

Guimin Hu, Daniel Hershcovich, Hasti Seifi


Abstract
Haptic signals, from smartphone vibrations to virtual reality touch feedback, can effectively convey information and enhance realism, but designing signals that resonate meaningfully with users is challenging. To facilitate this, we introduce a multimodal dataset and a task of matching user descriptions to vibration haptic signals, and highlight two primary challenges: (1) the lack of large haptic vibration datasets annotated with textual descriptions, as collecting haptic descriptions is time-consuming, and (2) the limited capability of existing tasks and models to describe vibration signals in text. To advance this area, we create HapticCap, the first fully human-annotated haptic-captioned dataset, containing 92,070 haptic-text pairs of user descriptions covering the sensory, emotional, and associative attributes of vibrations. Based on HapticCap, we propose the haptic-caption retrieval task and report results from a supervised contrastive learning framework that brings text representations and vibration representations together within specific description categories. Overall, the combination of the language model T5 and the audio model AST yields the best performance on haptic-caption retrieval, especially when trained separately for each description category. The dataset is available at https://huggingface.co/datasets/GuiminHu/HapticCap.
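To make the retrieval setup concrete, below is a minimal sketch of a supervised contrastive text-vibration framework in the spirit of the abstract, assuming vibration signals are fed to AST as audio-style spectrogram features and captions are encoded with T5. The checkpoint names, projection size, pooling, and loss details are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of contrastive text-vibration retrieval (not the paper's exact model).
import torch
import torch.nn.functional as F
from transformers import T5Tokenizer, T5EncoderModel, ASTModel

text_tok = T5Tokenizer.from_pretrained("t5-base")
text_enc = T5EncoderModel.from_pretrained("t5-base")
vib_enc = ASTModel.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")

# Project both modalities into a shared embedding space (dimension is an assumption).
text_proj = torch.nn.Linear(text_enc.config.d_model, 256)
vib_proj = torch.nn.Linear(vib_enc.config.hidden_size, 256)

def embed_text(captions):
    batch = text_tok(captions, padding=True, truncation=True, return_tensors="pt")
    hidden = text_enc(**batch).last_hidden_state        # (B, tokens, d_model)
    pooled = hidden.mean(dim=1)                          # mean-pool over tokens
    return F.normalize(text_proj(pooled), dim=-1)

def embed_vibration(spectrograms):
    # spectrograms: (B, time, n_mels) features from an audio/vibration front end
    hidden = vib_enc(input_values=spectrograms).last_hidden_state
    pooled = hidden.mean(dim=1)
    return F.normalize(vib_proj(pooled), dim=-1)

def contrastive_loss(text_emb, vib_emb, temperature=0.07):
    # Symmetric InfoNCE-style loss: matched caption/vibration pairs are positives.
    logits = text_emb @ vib_emb.t() / temperature
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```

The paired captions and vibration signals themselves can be obtained from the HapticCap dataset on the Hugging Face Hub (e.g., via `datasets.load_dataset`), though the exact field names and preprocessing are not specified here.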
Anthology ID:
2025.findings-emnlp.781
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
14473–14489
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.781/
DOI:
10.18653/v1/2025.findings-emnlp.781
Cite (ACL):
Guimin Hu, Daniel Hershcovich, and Hasti Seifi. 2025. HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 14473–14489, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (Hu et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.781.pdf
Checklist:
 2025.findings-emnlp.781.checklist.pdf