All That Glitters is Not Gold: A Gold Standard of Adjective-Noun Collocations for German
Yana Strakatova, Neele Falk, Isabel Fuhrmann, Erhard Hinrichs, Daniela Rossmann
Abstract
In this paper, we present the GerCo dataset of adjective-noun collocations for German, such as alter Freund ‘old friend’ and tiefe Liebe ‘deep love’. The annotation was performed by experts following the annotation scheme introduced in this paper. The resulting dataset contains 4,732 positive and negative instances of collocations and covers all 16 semantic classes of adjectives defined in the German wordnet GermaNet. The dataset can serve as a reliable empirical basis for comparing different theoretical frameworks concerned with collocations, or as material for data-driven approaches to the study of collocations, including machine learning experiments. This paper addresses the latter use by evaluating different models on the task of automatic collocation identification with the GerCo dataset. We compare lexical association measures with static and contextualized word embeddings. The experiments show that word embeddings outperform methods based on statistical association measures by a wide margin.
- Anthology ID:
- 2020.lrec-1.538
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 4368–4378
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.538
- Cite (ACL):
- Yana Strakatova, Neele Falk, Isabel Fuhrmann, Erhard Hinrichs, and Daniela Rossmann. 2020. All That Glitters is Not Gold: A Gold Standard of Adjective-Noun Collocations for German. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4368–4378, Marseille, France. European Language Resources Association.
- Cite (Informal):
- All That Glitters is Not Gold: A Gold Standard of Adjective-Noun Collocations for German (Strakatova et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2020.lrec-1.538.pdf