MMAR: Multilingual and Multimodal Anaphora Resolution in Instructional Videos

Cennet Oguz; Pascal Denis; Simon Ostermann; Emmanuel Vincent; Natalia Skachkova; Josef van Genabith

doi:10.18653/v1/2024.findings-emnlp.88

Abstract

Multilingual anaphora resolution identifies referring expressions and implicit arguments in texts covering several languages and links them to their antecedents. In the most challenging setting, cross-lingual anaphora resolution, the training data and test data are in different languages. Because knowledge must be transferred across languages, the task is challenging in both the multilingual and the cross-lingual setting. We hypothesize that one way to alleviate some of the difficulty of the task is to include multimodal information in the form of images (i.e., frames extracted from instructional videos). Such visual inputs are by nature language-agnostic, so cross- and multilingual anaphora resolution should benefit from visual information. In this paper, we provide the first multilingual and multimodal dataset annotated with anaphoric relations and present experimental results for end-to-end multimodal and multilingual anaphora resolution. Given gold mentions, multimodal features improve anaphora resolution results by ~10% for unseen languages.

Anthology ID: 2024.findings-emnlp.88
Original: 2024.findings-emnlp.88v1
Version 2: 2024.findings-emnlp.88v2
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1618–1633
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.88/
DOI: 10.18653/v1/2024.findings-emnlp.88
Cite (ACL): Cennet Oguz, Pascal Denis, Simon Ostermann, Emmanuel Vincent, Natalia Skachkova, and Josef Van Genabith. 2024. MMAR: Multilingual and Multimodal Anaphora Resolution in Instructional Videos. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 1618–1633, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): MMAR: Multilingual and Multimodal Anaphora Resolution in Instructional Videos (Oguz et al., Findings 2024)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.88.pdf
Data: 2024.findings-emnlp.88.data.zip
