That Looks Hard: Characterizing Linguistic Complexity in Humans and Language Models

Gabriele Sarti; Dominique Brunato; Felice Dell’Orletta

doi:10.18653/v1/2021.cmcl-1.5

Abstract

This paper investigates the relationship between two complementary perspectives in the human assessment of sentence complexity and how they are modeled in a neural language model (NLM). The first perspective takes into account multiple online behavioral metrics obtained from eye-tracking recordings. The second concerns the offline perception of complexity measured by explicit human judgments. Using a broad spectrum of linguistic features modeling lexical, morpho-syntactic, and syntactic properties of sentences, we perform a comprehensive analysis of linguistic phenomena associated with the two complexity viewpoints and report similarities and differences. We then show the effectiveness of linguistic features when explicitly leveraged by a regression model for predicting sentence complexity and compare its results with those obtained by a fine-tuned neural language model. Finally, we probe the NLM’s linguistic competence before and after fine-tuning, highlighting how the linguistic information encoded in its representations changes when the model learns to predict complexity.

Anthology ID: 2021.cmcl-1.5
Volume: Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Month: June
Year: 2021
Address: Online
Editors: Emmanuele Chersoni, Nora Hollenstein, Cassandra Jacobs, Yohei Oseki, Laurent Prévot, Enrico Santus
Venue: CMCL
Publisher: Association for Computational Linguistics
Pages: 48–60
URL: https://preview.aclanthology.org/fix-sig-urls/2021.cmcl-1.5/
DOI: 10.18653/v1/2021.cmcl-1.5
Cite (ACL): Gabriele Sarti, Dominique Brunato, and Felice Dell’Orletta. 2021. That Looks Hard: Characterizing Linguistic Complexity in Humans and Language Models. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 48–60, Online. Association for Computational Linguistics.
Cite (Informal): That Looks Hard: Characterizing Linguistic Complexity in Humans and Language Models (Sarti et al., CMCL 2021)
PDF: https://preview.aclanthology.org/fix-sig-urls/2021.cmcl-1.5.pdf
Code: gsarti/interpreting-complexity
Data: Universal Dependencies
