So Young Lee


2025

Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs?
So Young Lee | Russell Scheinberg | Amber Shore | Ameeta Agrawal
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

This study explores how recent large language models (LLMs) navigate relative clause attachment ambiguity and use world knowledge biases for disambiguation in six typologically diverse languages: English, Chinese, Japanese, Korean, Russian, and Spanish. We describe the process of creating a novel dataset, MultiWho, for fine-grained evaluation of relative clause attachment preferences in ambiguous and unambiguous contexts. Our experiments with three LLMs indicate that, unlike humans, LLMs consistently exhibit a preference for local attachment, displaying limited responsiveness to syntactic variations or language-specific attachment patterns. Although LLMs performed well in unambiguous cases, they rigidly prioritized world knowledge biases, lacking the flexibility of human language processing. These findings highlight the need for more diverse, pragmatically nuanced multilingual training to improve LLMs' handling of complex structures and to support more human-like comprehension.

2024

The effects of distance on NPI illusive effects in BERT
So Young Lee | Mai Ha Vu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Previous studies have examined the syntactic capabilities of large pre-trained language models, such as BERT, by using stimuli from psycholinguistic studies. Studying well-known processing errors, such as NPI illusive effects, can reveal whether a model prioritizes linear or hierarchical information when processing language. Recent experiments have found that BERT is mildly susceptible to Negative Polarity Item (NPI) illusion effects (Shin et al., 2023; Vu and Lee, 2022). We expand on these results by examining the effect of distance on the illusive effect, using and modifying stimuli from Parker and Phillips (2016). We further tease apart whether the model is more affected by hierarchical or linear distance. We find that BERT is highly sensitive to syntactic hierarchical information: added hierarchical layers affected its processing more than added linear distance did.

Multilingual Relative Clause Attachment Ambiguity Resolution in Large Language Models
So Young Lee | Russell Scheinberg | Amber Shore | Ameeta Agrawal
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation

2022

Evaluating Structural Economy Claims in Relative Clause Attachment
Aniello De Santo | So Young Lee
Proceedings of the Society for Computation in Linguistics 2022