Xinyu Gao
2026
Gated Tree Cross-Attention for Checkpoint-Compatible Syntax Injection in Decoder-Only LLMs
Xinyu Gao | Shaonan Wang | Nai Ding
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Decoder-only large language models achieve strong broad performance but are brittle to minor grammatical perturbations, undermining reliability for downstream reasoning. However, directly injecting explicit syntactic structure into an existing checkpoint can interfere with its pretrained competence. We introduce a checkpoint-compatible gated tree cross-attention (GTCA) branch that reads precomputed constituency chunk memory while leaving the backbone architecture unchanged. Our design uses a token update mask and staged training to control the scope and timing of structural updates. Across benchmarks and transformer backbones, GTCA strengthens syntactic robustness beyond continued-training baselines without compromising multiple-choice QA performance or commonsense reasoning, providing a practical checkpoint-compatible route to more syntax-robust decoder-only LLMs.
2025
MASP: A Multilingual Dataset for Probing Scalar Modifier Understanding in LLMs
Xinyu Gao | Nai Ding | Wei Liu
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
This study aims to test how large language models (LLMs) understand gradable adjectives and whether their understanding compares with humans, under the framework of formal semantics. We introduce a diagnostic dataset, referred to as the Modifier-Adjective Scale Probe (MASP), to evaluate how well LLMs understand a gradable adjective (e.g., long) when the adjective is combined with one modifier (e.g., very long or slightly long, a condition referred to as degree modification) or is further negated (e.g., very not long and not very long, a condition referred to as compositional negation). The dataset consists of over 80,000 natural language inference questions in both Chinese and English. We apply the MASP dataset to test both humans and 11 popular LLMs, including GPT-4o and Gemini-2.0-Flash. The results show that most LLMs can correctly understand whether a modifier boosts (e.g., very) an adjective. However, they fail to understand the modifiers that weaken the degree and the negation forms of modifiers. Furthermore, we parameterize the human and LLM behavior, and find that the judgment patterns of LLMs differ from humans especially in the Chinese tests. These findings suggest that LLMs are still not well aligned with humans in terms of the interpretation of simple adjective phrases, and MASP provides a new approach to quantify the interpretation of adjective phrases in LLMs.
2022
Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective
Ping Yang | Junjie Wang | Ruyi Gan | Xinyu Zhu | Lin Zhang | Ziwei Wu | Xinyu Gao | Jiaxing Zhang | Tetsuya Sakai
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
We propose a new paradigm for zero-shot learners that is format agnostic, i.e., it is compatible with any format and applicable to a list of language tasks, such as text classification, commonsense reasoning, coreference resolution, and sentiment analysis. Zero-shot learning aims to train a model on a given task such that it can address new learning tasks without any additional training. Our approach converts zero-shot learning into multiple-choice tasks, avoiding problems in commonly used large-scale generative models such as FLAN. It not only adds generalization ability to models but also significantly reduces the number of parameters. Our method shares the merits of efficient training and deployment. Our approach shows state-of-the-art performance on several benchmarks and produces satisfactory results on tasks such as natural language inference and text classification. Our model achieves this success with only 235M parameters, which is substantially smaller than state-of-the-art models with billions of parameters. The code and pre-trained models are available at https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/unimc .