Zhixiong Zhang
2026
SudokuFill: A Multi-Agent Progressive Filling Framework for Document-Level Scientific Information Extraction
Yang Li | Yajiao Wang | Yu Zhang | Yuanzhe Zhang | Maodi Hu | Mengting Zhang | Xi Sun | Hua Yue | Zhixiong Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Scientific information extraction (SciIE) is a key bottleneck for turning unstructured papers into computable knowledge bases, yet most existing systems still follow a “local extraction then global assembly” paradigm. This workflow is inherently lossy. Because it extracts fields in isolation, it breaks global correlations and discards high-confidence signals that could otherwise be reused as internal supervision. Systems are therefore forced to repeatedly restart from scratch, especially in long, multimodal scientific documents. In this paper, we propose a different view: SciIE should be solved as a progressive filling problem, similar to solving a Sudoku. Once a field is filled with high confidence, it should act as a constraint that guides the remaining uncertain fields. Based on this idea, we introduce SudokuFill, a multi-agent framework that maintains a Global Filling State and performs priority scheduling. It establishes reliable anchors first, then reuses them as internal supervision for iterative deliberation over harder fields. Evaluated on a specialized document-level adjuvant dataset, our framework achieves a state-of-the-art (SOTA) score of 51.83% on our benchmark. Crucially, SudokuFill enables a 7B model to outperform vanilla GPT-4o, proving that structured architectural reasoning can effectively compensate for parameter scale.
OPINE: A Prior-calibrated Scoring Framework for LLM-based Multi-label Scientific Opinion Classification
Mengting Zhang | Gaofeng Pan | Zhixiong Zhang | Yang Li | Guangyin Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Scientific opinion classification based on discourse functions provides a structured semantic basis for analytical tasks such as gap identification and hypothesis generation. However, this task is uniquely challenged by the multi-label nature of scientific expressions and AIMRaD structural constraints. Existing LLM-based methods typically rely on direct label generation, which obscures decision logic, or treat discourse information as passive context rather than a structural prior. We propose OPINE, a multi-stage framework that reformulates classification as a controllable *scoring-calibration-refinement* pipeline. By decoupling textual evidence from decision logic, OPINE generates independent label-wise affinity scores calibrated by AIMRaD priors. To resolve the multi-label challenge, we introduce a quantile-based decoding rule to naturally capture co-existing roles, alongside a pairwise refinement mechanism to mitigate confusion between similar categories. We contribute a new benchmark of 18 discourse functions across diverse sections. Experimental results show that OPINE generally outperforms strong baselines, reaching F1 scores of 63.20%, 53.68%, and 63.22% under Micro, Macro, and Example settings, respectively. Our analysis reveals that integrating discourse structures as explicit priors is superior to conventional passive context integration, while pairwise refinement successfully mitigates confusion between functionally similar categories. The code and dataset are available at https://github.com/znoodle63/OPINE.
Datasets for Scientific Literature Understanding: A Survey
Yuanzhe Zhang | Xun Zhao | Maodi Hu | Xi Sun | Donghuan Song | Zhixiong Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Empowering machines to understand scientific literature is crucial for accelerating scientific discovery and advancing the AI for Science (AI4S) paradigm. In this paper, we present a comprehensive survey of datasets serving this domain. We propose a systematic taxonomy that organizes resources spanning structural understanding, text understanding, multimodal understanding, and pre-training/instruction fine-tuning. Beyond a structured overview, we discuss the evolution of the field and explain how the emergence of Large Language Models (LLMs) has reshaped the research priorities of dataset construction. By synthesizing existing datasets and identifying critical future directions, this work provides a roadmap for advancing intelligent scientific research systems.
2024
SparkRA: A Retrieval-Augmented Knowledge Service System Based on Spark Large Language Model
Dayong Wu | Jiaqi Li | Baoxin Wang | Honghong Zhao | Siyuan Xue | Yanjie Yang | Zhijun Chang | Rui Zhang | Li Qian | Bo Wang | Shijin Wang | Zhixiong Zhang | Guoping Hu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Large language models (LLMs) have shown remarkable achievements across various language tasks. To enhance the performance of LLMs in scientific literature services, we developed the scientific literature LLM (SciLit-LLM) through pre-training and supervised fine-tuning on scientific literature, building upon the iFLYTEK Spark LLM. Furthermore, we present a knowledge service system Spark Research Assistant (SparkRA) based on our SciLit-LLM. SparkRA is accessible online and provides three primary functions: literature investigation, paper reading, and academic writing. As of July 30, 2024, SparkRA has garnered over 50,000 registered users, with a total usage count exceeding 1.3 million.
2020
Representing and Reconstructing PhySH: Which Embedding Competent?
Xiaoli Chen | Zhixiong Zhang
Proceedings of the 8th International Workshop on Mining Scientific Publications
Recent advances in natural language processing have made embedding representations dominant in computational language processing. Although their value is taken for granted, we actually have limited knowledge of how well these embeddings represent the complex hierarchy of domain-specific scientific knowledge. In this paper, we conduct a comprehensive comparison of how well widely used embeddings capture hierarchical physics knowledge. Our key findings are as follows. (i) Poincaré embeddings do outperform the alternatives when trained on the PhySH taxonomy, but they fail when trained on co-occurrence pairs extracted from raw text. (ii) No algorithm can properly learn hierarchies in the more realistic case of co-occurrence pairs, which contain many noisy relations besides hierarchical ones. (iii) Our statistical analysis of the Poincaré embedding's representation of PhySH shows that successful hierarchical representations share two characteristics: first, upper-level terms have a smaller semantic distance to the root; second, upper-level hypernym-hyponym pairs should be further apart than lower-level hypernym-hyponym pairs.