A Joint Approach to Compound Splitting and Idiomatic Compound Detection

Irina Krotova, Sergey Aksenov, Ekaterina Artemova


Abstract
Applications such as machine translation, speech recognition, and information retrieval require efficient handling of noun compounds, as they are one of the possible sources of out-of-vocabulary words. In-depth processing of noun compounds requires not only splitting them into smaller components (or even roots) but also identifying instances that should remain unsplit because they are idiomatic. We develop a two-fold deep learning-based approach to noun compound splitting and idiomatic compound detection for the German language, which we train using a newly collected corpus of annotated German compounds. Our neural noun compound splitter operates on a sub-word level and outperforms the current state of the art by about 5%.
Anthology ID:
2020.lrec-1.543
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4410–4417
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.543
Cite (ACL):
Irina Krotova, Sergey Aksenov, and Ekaterina Artemova. 2020. A Joint Approach to Compound Splitting and Idiomatic Compound Detection. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4410–4417, Marseille, France. European Language Resources Association.
Cite (Informal):
A Joint Approach to Compound Splitting and Idiomatic Compound Detection (Krotova et al., LREC 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2020.lrec-1.543.pdf