2025
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
Tirthankar Ghosal | Philipp Mayr | Amanpreet Singh | Aakanksha Naik | Georg Rehm | Dayne Freitag | Dan Li | Sonja Schimmler | Anita De Waard
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
Overview of the Fifth Workshop on Scholarly Document Processing
Tirthankar Ghosal | Philipp Mayr | Anita De Waard | Aakanksha Naik | Amanpreet Singh | Dayne Freitag | Georg Rehm | Sonja Schimmler | Dan Li
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
The workshop on Scholarly Document Processing (SDP) started in 2020 to accelerate research, inform policy, and educate the public on natural language processing for scientific text. The fifth iteration of the workshop, SDP 2025, was held at the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) in Vienna as a hybrid event. The workshop saw a great increase in interest, with 26 submissions, of which 11 were accepted for the research track. The program consisted of a research track, invited talks, and four shared tasks: (1) SciHal25: Hallucination Detection for Scientific Content, (2) SciVQA: Scientific Visual Question Answering, (3) ClimateCheck: Scientific Fact-checking of Social Media Posts on Climate Change, and (4) Software Mention Detection in Scholarly Publications (SOMD 25). In addition to the four shared task overview papers, 18 shared task reports were accepted. The program was geared towards NLP, information extraction, information retrieval, and data mining for scholarly documents, with an emphasis on identifying and providing solutions to open challenges.
Overview of the SciHal25 Shared Task on Hallucination Detection for Scientific Content
Dan Li | Bogdan Palfi | Colin Zhang | Jaiganesh Subramanian | Adrian Raudaschl | Yoshiko Kakita | Anita De Waard | Zubair Afzal | Georgios Tsatsaronis
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
This paper provides an overview of the Hallucination Detection for Scientific Content (SciHal) shared task held in the 2025 ACL Scholarly Document Processing workshop. The task invites participants to detect hallucinated claims in answers to research-oriented questions generated by real-world GenAI-powered research assistants. The task is formulated as a multi-label classification problem: each instance consists of a question, an answer, an extracted claim, and supporting reference abstracts. Participants are asked to label claims under two subtasks: (1) coarse-grained detection with labels Entailment, Contradiction, or Unverifiable; and (2) fine-grained detection with a more detailed taxonomy of eight types. The dataset consists of 500 research-oriented questions collected over one week from a generative assistant tool. These questions were rewritten using GPT-4o and manually reviewed to address potential privacy or commercial concerns. In total, 10,000 reference abstracts were retrieved, and 4,592 claims were extracted from the assistant’s answers. Each claim is annotated with hallucination labels. The dataset is divided into 3,592 training, 500 validation, and 500 test instances. Subtask 1 saw 88 submissions across 10 teams, while Subtask 2 saw 39 submissions across 6 teams, resulting in a total of 5 published technical reports. This paper summarizes the task design, dataset, participation, and key findings.
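The instance format and coarse-grained label scheme described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the organizers' data format: the field names and the per-reference aggregation heuristic are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class SciHalInstance:
    """One hypothetical task instance: a claim extracted from an assistant's
    answer, to be labeled against the retrieved reference abstracts."""
    question: str
    answer: str
    claim: str
    reference_abstracts: list[str]
    labels: list[str] = field(default_factory=list)  # gold labels (multi-label)

def aggregate_coarse_label(per_reference_verdicts: list[str]) -> str:
    """Collapse per-reference NLI-style verdicts into one coarse claim label.
    A simple heuristic (an assumption, not the task's annotation rule):
    any supporting reference -> Entailment; else any contradicting
    reference -> Contradiction; else Unverifiable."""
    if "entailment" in per_reference_verdicts:
        return "Entailment"
    if "contradiction" in per_reference_verdicts:
        return "Contradiction"
    return "Unverifiable"
```

A real baseline would obtain the per-reference verdicts from an entailment model run over each (claim, abstract) pair.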
2024
Scalable Patent Classification with Aggregated Multi-View Ranking
Dan Li | Vikrant Yadav | Zi Long Zhu | Maziar Moradi Fard | Zubair Afzal | George Tsatsaronis
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Automated patent classification typically involves assigning labels to a patent from a taxonomy, using multi-class multi-label classification models. However, classification-based models face challenges in scaling to large numbers of labels, struggle with generalizing to new labels, and fail to effectively utilize the rich information and multiple views of patents and labels. In this work, we propose a multi-view ranking-based method to address these limitations. Our method consists of four ranking-based models that incorporate different views of patents and a meta-model that aggregates and re-ranks the candidate labels given by the four ranking models. We compared our approach against the state-of-the-art baselines on two publicly available patent classification datasets, USPTO-2M and CLEF-IP-2011. We demonstrate that our approach can alleviate the aforementioned limitations and achieve a new state-of-the-art performance by a significant margin.
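The aggregation step in the abstract — candidate labels from several ranking views re-ranked by a meta-model — can be sketched as below. The weighted-sum meta-score is an illustrative stand-in for the paper's learned meta-model.

```python
def aggregate_and_rerank(view_scores: list[dict], weights: list[float], top_k: int = 5) -> list[str]:
    """Combine candidate-label scores from multiple ranking views.

    view_scores: one dict per view mapping label -> relevance score.
    weights: per-view weights (here fixed; the paper learns the combination).
    Returns the top_k labels by combined meta-score."""
    combined: dict[str, float] = {}
    for w, scores in zip(weights, view_scores):
        for label, s in scores.items():
            combined[label] = combined.get(label, 0.0) + w * s
    return sorted(combined, key=combined.get, reverse=True)[:top_k]
```

Because the rankers score labels rather than classify over a fixed output layer, new labels can enter the candidate pool without retraining a classification head.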
2023
Enhancing Extreme Multi-Label Text Classification: Addressing Challenges in Model, Data, and Evaluation
Dan Li | Zi Long Zhu | Janneke van de Loo | Agnes Masip Gomez | Vikrant Yadav | Georgios Tsatsaronis | Zubair Afzal
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Extreme multi-label text classification is a prevalent task in industry, but it frequently encounters machine learning challenges, including model limitations, data scarcity, and time-consuming evaluation. This paper aims to mitigate these issues by introducing novel approaches. Firstly, we propose a label ranking model as an alternative to the conventional SciBERT-based classification model, enabling efficient handling of large-scale labels and accommodating new labels. Secondly, we present an active learning-based pipeline that addresses the data scarcity of new labels during the update of a classification system. Finally, we introduce ChatGPT to assist with model evaluation. Our experiments demonstrate the effectiveness of these techniques in enhancing the extreme multi-label text classification task.
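The selection step of an active-learning pipeline like the one described can be sketched as uncertainty sampling: rank unlabeled documents by prediction entropy and send the most uncertain ones to annotators. The entropy criterion is an assumption for illustration; the paper's pipeline may use a different acquisition function.

```python
import math

def entropy(probabilities: list[float]) -> float:
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

def select_for_annotation(doc_probabilities: list[list[float]], batch_size: int) -> list[int]:
    """Return indices of the batch_size most uncertain documents,
    i.e. those whose predicted label distribution has the highest entropy."""
    ranked = sorted(range(len(doc_probabilities)),
                    key=lambda i: entropy(doc_probabilities[i]),
                    reverse=True)
    return ranked[:batch_size]
```

Annotating these high-uncertainty documents first yields the most informative training data for a newly added label.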
2022
VIRT: Improving Representation-based Text Matching via Virtual Interaction
Dan Li | Yang Yang | Hongyin Tang | Jiahao Liu | Qifan Wang | Jingang Wang | Tong Xu | Wei Wu | Enhong Chen
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Text matching is a fundamental research problem in natural language understanding. Interaction-based approaches treat the text pair as a single sequence and encode it through cross encoders, while representation-based models encode the text pair independently with siamese or dual encoders. Interaction-based models require dense computations and thus are impractical in real-world applications. Representation-based models have become the mainstream paradigm for efficient text matching. However, these models suffer from severe performance degradation due to the lack of interactions between the pair of texts. To remedy this, we propose a Virtual InteRacTion mechanism (VIRT) for improving representation-based text matching while maintaining its efficiency. In particular, we introduce an interactive knowledge distillation module that is only applied during training. It enables deep interaction between texts by effectively transferring knowledge from the interaction-based model. A light interaction strategy is designed to fully leverage the learned interactive knowledge. Experimental results on six text matching benchmarks demonstrate the superior performance of our method over several state-of-the-art representation-based models. We further show that VIRT can be integrated into existing methods as plugins to lift their performance.
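The training-only distillation idea in the abstract can be sketched in a few lines of numpy: the teacher cross-encoder produces a cross-attention map between the two texts, and the student, which encodes the texts independently, is trained so that a "virtual" attention map built from its separate representations matches the teacher's. The plain dot-product attention and MSE objective below are simplifications assumed for illustration, not the paper's exact formulation.

```python
import numpy as np

def cross_attention(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise softmax over a @ b.T: tokens of text A attending to text B.
    a: (len_a, d) token representations; b: (len_b, d)."""
    logits = a @ b.T / np.sqrt(a.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)

def virt_distill_loss(teacher_a, teacher_b, student_a, student_b) -> float:
    """MSE between the teacher's cross-attention map (full interaction) and
    the student's virtual map built from independently encoded texts.
    The interaction is only simulated at training time; at inference the
    student keeps its efficient dual-encoder form."""
    t = cross_attention(teacher_a, teacher_b)
    s = cross_attention(student_a, student_b)
    return float(((t - s) ** 2).mean())
```

Minimizing this loss pushes the dual encoder's independent representations to behave as if the texts had interacted, without paying the cross-encoder cost at inference.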
Unsupervised Dense Retrieval for Scientific Articles
Dan Li | Vikrant Yadav | Zubair Afzal | George Tsatsaronis
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
In this work, we build a dense retrieval based semantic search engine on scientific articles from Elsevier. The major challenge is that there is no labeled data for training and testing. We apply a state-of-the-art unsupervised dense retrieval model called Generative Pseudo Labeling that generates high-quality pseudo training labels. Furthermore, since the articles are unbalanced across different domains, we select passages from multiple domains to form balanced training data. For the evaluation, we create two test sets: one manually annotated and one automatically created from the meta-information of our data. We compare the semantic search engine with the currently deployed lexical search engine on the two test sets. The results of the experiment show that the semantic search engine trained with pseudo training labels can significantly improve search performance.
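The pseudo-labeling signal used by Generative Pseudo Labeling can be sketched as follows: for a generated query, a positive passage, and a mined negative, the pseudo label is the score margin assigned by a cross-encoder teacher, and the dense retriever is trained to reproduce that margin (MarginMSE). The function names are illustrative; the teacher scorer is passed in as a placeholder.

```python
def gpl_pseudo_label(cross_encoder_score, query: str, pos_passage: str, neg_passage: str) -> float:
    """Pseudo label = teacher's score margin between positive and negative.
    cross_encoder_score is any callable (query, passage) -> float."""
    return cross_encoder_score(query, pos_passage) - cross_encoder_score(query, neg_passage)

def margin_mse_loss(student_margins: list[float], teacher_margins: list[float]) -> float:
    """Train the dense retriever so its dot-product score margins match the
    teacher's pseudo-label margins (mean squared error over a batch)."""
    return sum((s - t) ** 2 for s, t in zip(student_margins, teacher_margins)) / len(student_margins)
```

Because the labels come entirely from generated queries and a teacher model, no human-annotated relevance judgments are needed, which is the point of the unsupervised setup described above.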