ID10M: Idiom Identification in 10 Languages

Simone Tedeschi; Federico Martelli; Roberto Navigli

doi:10.18653/v1/2022.findings-naacl.208

Abstract

Idioms are phrases which present a figurative meaning that cannot be (completely) derived by looking at the meaning of their individual components. Identifying and understanding idioms in context is a crucial goal and a key challenge in a wide range of Natural Language Understanding tasks. Although efforts have been undertaken in this direction, the automatic identification and understanding of idioms is still a largely under-investigated area, especially when operating in a multilingual scenario. In this paper, we address such limitations and put forward several new contributions: we propose a novel multilingual Transformer-based system for the identification of idioms; we produce a high-quality automatically-created training dataset in 10 languages, along with a novel manually-curated evaluation benchmark; finally, we carry out a thorough performance analysis and release our evaluation suite at https://github.com/Babelscape/ID10M.

Anthology ID: 2022.findings-naacl.208
Volume: Findings of the Association for Computational Linguistics: NAACL 2022
Month: July
Year: 2022
Address: Seattle, United States
Editors: Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2715–2726
URL: https://aclanthology.org/2022.findings-naacl.208
DOI: 10.18653/v1/2022.findings-naacl.208
Cite (ACL): Simone Tedeschi, Federico Martelli, and Roberto Navigli. 2022. ID10M: Idiom Identification in 10 Languages. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 2715–2726, Seattle, United States. Association for Computational Linguistics.
Cite (Informal): ID10M: Idiom Identification in 10 Languages (Tedeschi et al., Findings 2022)
PDF: https://preview.aclanthology.org/nschneid-patch-4/2022.findings-naacl.208.pdf
Video: https://preview.aclanthology.org/nschneid-patch-4/2022.findings-naacl.208.mp4
Code: babelscape/id10m
Data: WikiMatrix
