Nai Ding

Also published as: 鼐丁

2025

Information Integration in Large Language Models is Gated by Linguistic Structural Markers
Wei Liu | Nai Ding
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Language comprehension relies on integrating information across both local words and broader context. We propose a method to quantify the information integration window of large language models (LLMs) and examine how sentence and clause boundaries constrain this window. Specifically, LLMs are required to predict a target word based on either a local window (local prediction) or the full context (global prediction), and we use Jensen-Shannon (JS) divergence to measure the information loss from relying solely on the local window, termed the local-prediction deficit. Results show that integration windows of both humans and LLMs are strongly modulated by sentence boundaries, and predictions primarily rely on words within the same sentence or clause: The local-prediction deficit follows a power-law decay as the window length increases and drops sharply at the sentence boundary. This boundary effect is primarily attributed to linguistic structural markers, e.g., punctuation, rather than implicit syntactic or semantic cues. Together, these results indicate that LLMs rely on explicit structural cues to guide their information integration strategy.
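
The local-prediction deficit described in this abstract can be illustrated with a minimal sketch. It assumes the local and global predictions are already available as probability distributions over the same vocabulary; the function names and the toy three-word distributions are hypothetical and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    # The mixture m is positive wherever p or q is positive,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def local_prediction_deficit(local_probs, global_probs):
    """Hypothetical helper: JS divergence between the two predictions."""
    return js_divergence(local_probs, global_probs)


# Toy next-word distributions over a three-word vocabulary (assumed values).
short_window = [0.70, 0.20, 0.10]   # prediction from a short local window
long_window = [0.15, 0.20, 0.65]    # prediction from a longer local window
full_context = [0.10, 0.20, 0.70]   # prediction from the full context

print(local_prediction_deficit(short_window, full_context))
print(local_prediction_deficit(long_window, full_context))
```

In this toy setup the longer window yields a distribution closer to the full-context prediction, so its deficit is smaller. How the deficit actually changes with window length is an empirical result reported in the paper, not something this sketch demonstrates.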

2024

Sentence-Space Metrics (SSM) for the Evaluation of Sentence Comprehension
Jieyu Lin | Honghua Chen | Nai Ding
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

It is a fundamental challenge to evaluate whether a model can truly capture the meaning of sentences. Evaluation of whether a model well captures the meaning of individual words, however, can be effectively achieved by analyzing whether the model encodes words in a vector space where semantically similar words form clusters. Inspired by this approach, we propose the Sentence-Space Metrics (SSM) to evaluate model interpretation of sentences, and the sentence space is constructed based on the pairwise entailment relationships between all sentence pairs within a sentence pool. We use three metrics to evaluate a sentence space, i.e., (1) sparsity, (2) clustering of related sentences, and (3) similarity with the sentence space measured from humans. The SSM is applied to evaluate 20 models, including ChatGPT, 18 BERT-family models fine-tuned for the Natural Language Inference (NLI) task, as well as SimCSE, a sentence representation model. The SSM reveals dramatic differences among models: Although all models achieve high accuracy on standard NLI datasets such as MNLI, none of them mirrors the human behavior under the SSM. These results demonstrate that, compared with traditional accuracy measures, the SSM considers pairwise relationships between hundreds of sentences and therefore provides a more fine-grained evaluation of model interpretation of sentences.

2023

Probing the “Creativity” of Large Language Models: Can models produce divergent semantic association?
Honghua Chen | Nai Ding
Findings of the Association for Computational Linguistics: EMNLP 2023

Large language models possess remarkable capacity for processing language, but it remains unclear whether these models can further generate creative content. The present study aims to investigate the creative thinking of large language models through a cognitive perspective. We utilize the divergent association task (DAT), an objective measurement of creativity that asks models to generate unrelated words and calculates the semantic distance between them. We compare the results across different models and decoding strategies. Our findings indicate that: (1) When using the greedy search strategy, GPT-4 outperforms 96% of humans, while GPT-3.5-turbo exceeds the average human level. (2) Stochastic sampling and temperature scaling are effective to obtain higher DAT scores for models except GPT-4, but face a trade-off between creativity and stability. These results imply that advanced large language models have divergent semantic associations, which is a fundamental process underlying creativity.
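
The semantic-distance scoring mentioned in this abstract can be sketched as the mean pairwise cosine distance between word vectors. This is an illustrative assumption about how such a score might be computed, not the paper's exact procedure; the two-dimensional vectors below are made-up placeholders, whereas a real evaluation would look words up in a pretrained embedding model.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_pairwise_distance(vectors):
    """Average cosine distance over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy embeddings for three generated words (assumed values).
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "algebra": [0.0, 1.0],
}

score = mean_pairwise_distance(list(toy_embeddings.values()))
print(score)
```

A set of mutually unrelated words gives a higher average distance than a set of near-synonyms, which is the intuition behind using semantic distance as a measure of divergent association.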

2022

Natural language inference (NLI) is a task to infer the relationship between a premise and a hypothesis (e.g., entailment, neutral, or contradiction), and transformer-based models perform well on current NLI datasets such as MNLI and SNLI. Nevertheless, given the linguistic complexity of the large-scale datasets, it remains controversial whether these models can truly infer the relationship between sentences or they simply guess the answer via shallow heuristics. Here, we introduce a controlled evaluation set called Simple Pair to test the basic sentence inference ability of NLI models using sentences with syntactically simple structures. Three popular transformer-based models, i.e., BERT, RoBERTa, and DeBERTa, are employed. We find that these models fine-tuned on MNLI or SNLI perform very poorly on Simple Pair (< 35.4% accuracy). Further analyses reveal event coreference and compositional binding problems in these models. To improve the model performance, we augment the training set, i.e., MNLI or SNLI, with a few examples constructed based on Simple Pair (1% of the size of the original SNLI/MNLI training sets). Models fine-tuned on the augmented training set maintain high performance on MNLI/SNLI and perform very well on Simple Pair (~100% accuracy). Furthermore, the positive performance of the augmented training models can transfer to more complex examples constructed based on sentences from MNLI and SNLI. Taken together, the current work shows that (1) models achieving high accuracy on mainstream large-scale datasets still lack the capacity to draw accurate inferences on simple sentences, and (2) augmenting mainstream datasets with a small number of target simple sentences can effectively improve model performance.

2021

Using Adversarial Attacks to Reveal the Statistical Bias in Machine Reading Comprehension Models
Jieyu Lin | Jiajie Zou | Nai Ding
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Pre-trained language models have achieved human-level performance on many Machine Reading Comprehension (MRC) tasks, but it remains unclear whether these models truly understand language or answer questions by exploiting statistical biases in datasets. Here, we demonstrate a simple yet effective method to attack MRC models and reveal the statistical biases in these models. We apply the method to the RACE dataset, for which the answer to each MRC question is selected from 4 options. It is found that several pre-trained language models, including BERT, ALBERT, and RoBERTa, show consistent preference to some options, even when these options are irrelevant to the question. When interfered by these irrelevant options, the performance of MRC models can be reduced from human-level performance to the chance-level performance. Human readers, however, are not clearly affected by these irrelevant options. Finally, we propose an augmented training method that can greatly reduce models’ statistical biases.

基于篇章结构攻击的阅读理解任务探究(Analysis of Reading Comprehension Tasks based on passage structure attacks)
Shukai Ma (马树楷) | Jiajie Zou (邹家杰) | Nai Ding (丁鼐)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

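Our experiments show that paragraph order affects human reading comprehension, whereas shuffling the order of paragraphs or sentences has almost no effect on the reading comprehension answers of three artificial neural network models: BERT, ALBERT, and RoBERTa. When word order is shuffled, human reading comprehension falls below that of the three models, but both humans and models still answer above chance level. This indicates that humans are more sensitive to word order than artificial neural networks, yet both humans and models can answer questions when words are scrambled. In sum, humans and artificial neural networks achieve comparable accuracy on reading comprehension questions under normal reading conditions, but they differ in how strongly they depend on passage structure and word order.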

Co-authors

Shukai Ma 1

Ming Xiang 1
