Orthographic Codes and the Neighborhood Effect: Lessons from Information Theory

Stéphan Tulkens; Dominiek Sandra; Walter Daelemans

Abstract

We consider the orthographic neighborhood effect: the finding that words that are more orthographically similar to other words are read faster. The neighborhood effect serves as an important control variable in psycholinguistic studies of word reading, and explains variance beyond that accounted for by word length and word frequency. Following previous work, we model the neighborhood effect as the average distance to neighbors in feature space for three feature sets: slots, character n-grams, and skipgrams. We optimize each of these feature sets and find evidence for language-independent optima across five megastudy corpora from five alphabetic languages. Additionally, we show that weighting features using the inverse of mutual information (MI) improves the neighborhood effect significantly for all languages. We analyze the inverse feature weighting and show that, across languages, grammatical morphemes get the lowest weights. Finally, we perform the same experiments on Korean Hangul, a non-alphabetic writing system, where we find the opposite results: slower responses as a function of denser neighborhoods, and a negative effect of inverse feature weighting. This raises the question of whether the reversal is a cognitive effect or an effect of the way we represent Hangul orthography, and indicates that more research is needed.
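
To make the idea of "average distance to neighbors in feature space" concrete, here is a minimal sketch for one of the three feature sets, character n-grams. It is not the authors' implementation: the boundary padding symbol `#`, the cosine distance, the neighbor count `k`, and the function names `char_ngrams`, `cosine_distance`, and `neighborhood_density` are all assumptions made for illustration, and the inverse MI weighting described in the abstract is not included.

```python
from collections import Counter
from math import sqrt


def char_ngrams(word, n=2):
    """Count the character n-grams of a word, padded with '#' at both ends.

    The padding symbol and n=2 are illustrative assumptions.
    """
    padded = "#" * (n - 1) + word + "#" * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors (assumed metric)."""
    dot = sum(count * b[feat] for feat, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return 1.0 - dot / (norm_a * norm_b)


def neighborhood_density(word, lexicon, k=3, n=2):
    """Mean distance from `word` to its k nearest neighbors in the lexicon.

    Lower values mean a denser neighborhood. The value of k is an assumption.
    """
    target = char_ngrams(word, n)
    distances = sorted(
        cosine_distance(target, char_ngrams(other, n))
        for other in lexicon
        if other != word
    )
    nearest = distances[:k]
    if not nearest:
        raise ValueError("lexicon must contain at least one other word")
    return sum(nearest) / len(nearest)


# Toy lexicon, invented for this example only.
lexicon = ["cat", "hat", "bat", "cot", "dog", "zebra"]
dense = neighborhood_density("cat", lexicon)
sparse = neighborhood_density("zebra", lexicon)
```

In this toy lexicon, "cat" shares bigrams with "hat", "bat", and "cot", so its mean distance is smaller than that of "zebra", which shares none with any other word. Under the effect described in the abstract, the smaller value would correspond to faster reading in the alphabetic languages studied.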

Anthology ID: 2020.lrec-1.22
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 172–181
Language: English
URL: https://preview.aclanthology.org/ingest-emnlp/2020.lrec-1.22/
Cite (ACL): Stéphan Tulkens, Dominiek Sandra, and Walter Daelemans. 2020. Orthographic Codes and the Neighborhood Effect: Lessons from Information Theory. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 172–181, Marseille, France. European Language Resources Association.
Cite (Informal): Orthographic Codes and the Neighborhood Effect: Lessons from Information Theory (Tulkens et al., LREC 2020)
PDF: https://preview.aclanthology.org/ingest-emnlp/2020.lrec-1.22.pdf