Finding Concept-specific Biases in Form–Meaning Associations

Tiago Pimentel; Brian Roark; Søren Wichmann; Ryan Cotterell; Damián Blasi

doi:10.18653/v1/2021.naacl-main.349

Abstract

This work presents an information-theoretic operationalisation of cross-linguistic non-arbitrariness. It is not a new idea that there are small, cross-linguistic associations between the forms and meanings of words. For instance, it has been claimed (Blasi et al., 2016) that the word for “tongue” is more likely than chance to contain the phone [l]. By controlling for the influence of language family and geographic proximity within a very large concept-aligned, cross-lingual lexicon, we extend methods previously used to detect within-language non-arbitrariness (Pimentel et al., 2019) to measure cross-linguistic associations. We find that there is a significant effect of non-arbitrariness, but it is unsurprisingly small (less than 0.5% on average according to our information-theoretic estimate). We also provide a concept-level analysis which shows that a quarter of the concepts considered in our work exhibit a significant level of cross-linguistic non-arbitrariness. In sum, the paper provides new methods to detect cross-linguistic associations at scale, and confirms their effects are minor.

Anthology ID: 2021.naacl-main.349
Volume: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: June
Year: 2021
Address: Online
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 4416–4425
URL: https://aclanthology.org/2021.naacl-main.349
DOI: 10.18653/v1/2021.naacl-main.349
Cite (ACL): Tiago Pimentel, Brian Roark, Søren Wichmann, Ryan Cotterell, and Damián Blasi. 2021. Finding Concept-specific Biases in Form–Meaning Associations. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4416–4425, Online. Association for Computational Linguistics.
Cite (Informal): Finding Concept-specific Biases in Form–Meaning Associations (Pimentel et al., NAACL 2021)
PDF: https://preview.aclanthology.org/nodalida-main-page/2021.naacl-main.349.pdf
Video: https://preview.aclanthology.org/nodalida-main-page/2021.naacl-main.349.mp4
Code: tpimentelms/form-meaning-associations