MMAR: Multilingual and Multimodal Anaphora Resolution in Instructional Videos
Cennet Oguz, Pascal Denis, Simon Ostermann, Emmanuel Vincent, Natalia Skachkova, Josef Van Genabith
Abstract
Multilingual anaphora resolution identifies referring expressions and implicit arguments in texts and links them to their antecedents, covering several languages. In the most challenging setting, cross-lingual anaphora resolution, training data and test data are in different languages. As knowledge needs to be transferred across languages, this task is challenging in both the multilingual and the cross-lingual setting. We hypothesize that one way to alleviate some of the difficulty of the task is to include multimodal information in the form of images (i.e., frames extracted from instructional videos). Such visual inputs are by nature language-agnostic; therefore, cross- and multilingual anaphora resolution should benefit from visual information. In this paper, we provide the first multilingual and multimodal dataset annotated with anaphoric relations and present experimental results for end-to-end multimodal and multilingual anaphora resolution. Given gold mentions, multimodal features improve anaphora resolution results by ~10% for unseen languages.
- Anthology ID:
- 2024.findings-emnlp.88
- Original:
- 2024.findings-emnlp.88v1
- Version 2:
- 2024.findings-emnlp.88v2
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2024
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1618–1633
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.88/
- DOI:
- 10.18653/v1/2024.findings-emnlp.88
- Cite (ACL):
- Cennet Oguz, Pascal Denis, Simon Ostermann, Emmanuel Vincent, Natalia Skachkova, and Josef Van Genabith. 2024. MMAR: Multilingual and Multimodal Anaphora Resolution in Instructional Videos. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 1618–1633, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- MMAR: Multilingual and Multimodal Anaphora Resolution in Instructional Videos (Oguz et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.88.pdf