Jian Song
2026
RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension
Yelin Chen | Fanjin Zhang | Suping Sun | Yunhe Pang | Yuanchun Wang | Jian Song | XiaoYan Li | Lei Hou | Shu Zhao | Jie Tang | Juanzi Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Understanding research papers remains challenging for foundation models due to specialized scientific discourse and complex figures and tables, yet existing benchmarks offer limited fine-grained evaluation at scale. To address this gap, we introduce RPC-Bench, a large-scale question-answering benchmark built from review–rebuttal exchanges of high-quality computer science papers, containing 15K human-verified QA pairs. We design a fine-grained taxonomy aligned with the scientific research flow to assess models’ ability to understand and answer why, what, and how questions in scholarly contexts. We also define an elaborate LLM–human interaction annotation framework to support large-scale labeling and quality control. Following the LLM-as-a-Judge paradigm, we develop a scalable framework that evaluates models on correctness-completeness and conciseness, with high agreement to human judgment. Experiments reveal that even the strongest models (GPT-5) achieve only 68.2% correctness-completeness, dropping to 37.46% after conciseness adjustment, highlighting substantial gaps in precise academic paper understanding.
2022
DABERT: Dual Attention Enhanced BERT for Semantic Matching
Sirui Wang | Di Liang | Jian Song | Yuntao Li | Wei Wu
Proceedings of the 29th International Conference on Computational Linguistics
Transformer-based pre-trained language models such as BERT have achieved remarkable results in Semantic Sentence Matching. However, existing models still struggle to capture subtle differences: minor noise such as adding, deleting, or modifying words in a sentence may flip the prediction. To alleviate this problem, we propose a novel Dual Attention Enhanced BERT (DABERT) to enhance the ability of BERT to capture fine-grained differences in sentence pairs. DABERT comprises (1) a Dual Attention module, which measures soft word matches by introducing a new dual-channel alignment mechanism to model affinity and difference attention, and (2) an Adaptive Fusion module, which uses attention to learn the aggregation of difference and affinity features and generates a vector describing the matching details of sentence pairs. We conduct extensive experiments on well-studied semantic matching and robustness test datasets, and the experimental results show the effectiveness of our proposed method.
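As a rough illustration of the dual-channel idea in the abstract above, the sketch below computes an affinity channel (dot-product attention over the other sentence) and a difference channel (attention weighted by negative L1 distance, applied to word-vector residuals). The abstract does not give DABERT's formulas, so the function names, the choice of distance, and the residual formulation are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dual_attention(a, b):
    """Hypothetical dual-channel alignment of sentence a against sentence b.

    a, b: lists of word vectors (lists of floats of equal dimension).
    Returns (affinity, difference), one feature vector per word in a.
    """
    dim = len(a[0])
    affinity, difference = [], []
    for u in a:
        # Affinity channel (assumed): dot-product similarity, then a
        # weighted average of the words in b.
        aff_w = softmax([sum(x * y for x, y in zip(u, v)) for v in b])
        affinity.append(
            [sum(w * v[k] for w, v in zip(aff_w, b)) for k in range(dim)]
        )
        # Difference channel (assumed): weight each word in b by negative
        # L1 distance, then average the residuals u - v.
        diff_w = softmax([-sum(abs(x - y) for x, y in zip(u, v)) for v in b])
        difference.append(
            [sum(w * (u[k] - v[k]) for w, v in zip(diff_w, b)) for k in range(dim)]
        )
    return affinity, difference
```

Under these assumptions, a word that has an exact counterpart in the other sentence yields a near-zero difference vector, while an added or changed word leaves a non-zero residual that a downstream fusion step could pick up.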
Improving Semantic Matching through Dependency-Enhanced Pre-trained Model with Adaptive Fusion
Jian Song | Di Liang | Rumei Li | Yuntao Li | Sirui Wang | Minlong Peng | Wei Wu | Yongxin Yu
Findings of the Association for Computational Linguistics: EMNLP 2022
Transformer-based pre-trained models like BERT have achieved great progress on Semantic Sentence Matching. Meanwhile, dependency prior knowledge has also shown general benefits in multiple NLP tasks. However, how to efficiently integrate dependency prior structure into pre-trained models to better model complex semantic matching relations remains an open question. In this paper, we propose the Dependency-Enhanced Adaptive Fusion Attention (DAFA), which explicitly introduces dependency structure into pre-trained models and adaptively fuses it with semantic information. Specifically, (i) DAFA first proposes a structure-sensitive paradigm to construct a dependency matrix for calibrating attention weights, and (ii) it adopts an adaptive fusion module to integrate the obtained dependency information and the original semantic signals. Moreover, DAFA reconstructs the attention calculation flow and provides better interpretability. Applied to BERT, our method achieves state-of-the-art or competitive performance on 10 public datasets, demonstrating the benefits of adaptively fusing dependency structure in semantic matching tasks.
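The two steps named in the abstract above (a dependency matrix that calibrates attention, then a fusion with the original semantic signal) can be sketched roughly as follows. The abstract does not specify how DAFA builds the matrix or fuses the signals, so this is only one plausible reading: the symmetric head-child matrix, the masking approach, and the fixed scalar `gate` (which would be learned in a real model) are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dependency_matrix(heads):
    """Build a symmetric 0/1 matrix from head indices (assumed encoding).

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    Each token is linked to itself, its head, and its children.
    """
    n = len(heads)
    dep = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            dep[child][head] = 1.0
            dep[head][child] = 1.0
    return dep


def calibrated_attention(scores, dep, gate=0.5):
    """Blend plain attention with dependency-restricted attention.

    scores: n x n raw attention scores; dep: n x n dependency matrix.
    gate: hypothetical fusion weight in [0, 1]; 0 ignores the dependencies.
    """
    fused = []
    for s_row, d_row in zip(scores, dep):
        plain = softmax(s_row)
        # Mask out tokens with no dependency link before normalizing.
        structural = softmax(
            [s if d > 0 else -1e9 for s, d in zip(s_row, d_row)]
        )
        fused.append(
            [(1 - gate) * p + gate * q for p, q in zip(plain, structural)]
        )
    return fused
```

For a three-token sentence such as "dogs bark loudly" with `heads = [1, -1, 1]`, this blend shifts attention mass for "dogs" toward itself and "bark" and away from "loudly", while each row still sums to 1.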