Predictability of Distributional Semantics in Derivational Word Formation

Sebastian Padó, Aurélie Herbelot, Max Kisselew, Jan Šnajder


Abstract
Compositional distributional semantic models (CDSMs) have successfully been applied to the task of predicting the meaning of a range of linguistic constructions. Their performance on the semi-compositional word formation process of (morphological) derivation, however, has been extremely variable, with no large-scale empirical investigation to date. This paper fills that gap, performing an analysis of CDSM predictions on a large dataset (over 30,000 German derivationally related word pairs). We use linear regression models to analyze CDSM performance and obtain insights into the linguistic factors that influence how predictable the distributional context of a derived word is going to be. We identify various such factors, notably part of speech, argument structure, and semantic regularity.
Anthology ID:
C16-1122
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
1285–1296
URL:
https://aclanthology.org/C16-1122
Cite (ACL):
Sebastian Padó, Aurélie Herbelot, Max Kisselew, and Jan Šnajder. 2016. Predictability of Distributional Semantics in Derivational Word Formation. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1285–1296, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Predictability of Distributional Semantics in Derivational Word Formation (Padó et al., COLING 2016)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/C16-1122.pdf
Data:
CELEX