Abstract
Multimedia communication combining text and images is popular on social media. However, few studies examine how images are structured with text to form coherent meanings in human cognition. To fill this gap, we present a novel concept of cross-modality discourse, reflecting how human readers couple image and text understanding. Text descriptions (named subtitles) are first derived from images in multimedia contexts. Five labels – entity-level insertion, projection, and concretization, and scene-level restatement and extension – are further employed to shape the structure of subtitles and texts and present their joint meanings. As a pilot study, we also build the very first dataset containing over 16K multimedia tweets with manually annotated discourse labels. The experimental results show that trendy multimedia encoders based on multi-head attention (with captions) are unable to well understand cross-modality discourse, and that additionally modeling texts at the output layer helps yield state-of-the-art results.
- Anthology ID:
- 2022.findings-emnlp.182
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2022
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2459–2471
- URL:
- https://aclanthology.org/2022.findings-emnlp.182
- Cite (ACL):
- Chunpu Xu, Hanzhuo Tan, Jing Li, and Piji Li. 2022. Understanding Social Media Cross-Modality Discourse in Linguistic Space. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2459–2471, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- Understanding Social Media Cross-Modality Discourse in Linguistic Space (Xu et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/2022.findings-emnlp.182.pdf