Idiomatic Expression Identification using Semantic Compatibility

Ziheng Zeng, Suma Bhat


Abstract
Idiomatic expressions are an integral part of natural language and are constantly being added to a language. Owing to their non-compositionality and their ability to take on a figurative or literal meaning depending on the sentential context, they have been a classical challenge for NLP systems. To address this challenge, we study the task of detecting whether a sentence has an idiomatic expression and localizing it when it occurs in a figurative sense. Prior research on this task has studied specific classes of idiomatic expressions, offering limited views of their generalizability to new idioms. We propose a multi-stage neural architecture with attention flow as a solution. The network effectively fuses contextual and lexical information at different levels using word and sub-word representations. Empirical evaluations on three of the largest benchmark datasets with idiomatic expressions of varied syntactic patterns and degrees of non-compositionality show that our proposed model achieves new state-of-the-art results. A salient feature of the model is its ability to identify idioms unseen during training, with gains from 1.4% to 30.8% over competitive baselines on the largest dataset.
Anthology ID:
2021.tacl-1.92
Volume:
Transactions of the Association for Computational Linguistics, Volume 9
Year:
2021
Address:
Cambridge, MA
Editors:
Brian Roark, Ani Nenkova
Venue:
TACL
Publisher:
MIT Press
Pages:
1546–1562
URL:
https://aclanthology.org/2021.tacl-1.92
DOI:
10.1162/tacl_a_00442
Cite (ACL):
Ziheng Zeng and Suma Bhat. 2021. Idiomatic Expression Identification using Semantic Compatibility. Transactions of the Association for Computational Linguistics, 9:1546–1562.
Cite (Informal):
Idiomatic Expression Identification using Semantic Compatibility (Zeng & Bhat, TACL 2021)
PDF:
https://preview.aclanthology.org/add_acl24_videos/2021.tacl-1.92.pdf