2024
Overview of the Fourth Workshop on Scholarly Document Processing
Tirthankar Ghosal | Amanpreet Singh | Anita De Waard | Philipp Mayr | Aakanksha Naik | Orion Weller | Yoonjoo Lee | Zejiang Shen | Yanxia Qin
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)
The workshop on Scholarly Document Processing (SDP) started in 2020 to accelerate research, inform policy and educate the public on natural language processing for scientific text. The fourth iteration of the workshop, SDP24, was held at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL24) as a hybrid event. The SDP workshop saw a great increase in interest, with 57 submissions, of which 28 were accepted. The program consisted of a research track, four invited talks and two shared tasks: 1) DAGPap24: Detecting automatically generated scientific papers and 2) Context24: Multimodal Evidence and Grounding Context Identification for Scientific Claims. The program was geared towards NLP, information extraction, information retrieval, and data mining for scholarly documents, with an emphasis on identifying and providing solutions to open challenges.
Overview of the DagPap24 Shared Task on Detecting Automatically Generated Scientific Paper
Savvas Chamezopoulos | Drahomira Herrmannova | Anita De Waard | Domenic Rosati | Yury Kashnitsky
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)
This paper provides an overview of the 2024 ACL Scholarly Document Processing workshop shared task on the detection of automatically generated scientific papers. Our previous task focused on the binary classification of whether scientific passages were machine-generated or not. However, one likely use case for text generation technology in scientific writing is to intersperse human-written text with passages of machine-generated text. We therefore frame the detection problem as a multiclass span classification task: given an excerpt of text, label token spans in the text as human-written or machine-generated. We shared a dataset containing excerpts from human-written papers as well as artificially generated content collected by Elsevier publishing and editorial teams. As a test set, the participants were provided with a corpus of openly accessible human-written as well as generated papers from the same scientific domains of documents. The shared task saw 457 submissions across 28 participating teams and resulted in three published technical reports. We discuss our findings from the shared task in this overview paper.
2022
Proceedings of the Third Workshop on Scholarly Document Processing
Arman Cohan | Guy Feigenblat | Dayne Freitag | Tirthankar Ghosal | Drahomira Herrmannova | Petr Knoth | Kyle Lo | Philipp Mayr | Michal Shmueli-Scheuer | Anita de Waard | Lucy Lu Wang
Proceedings of the Third Workshop on Scholarly Document Processing
Overview of the Third Workshop on Scholarly Document Processing
Arman Cohan | Guy Feigenblat | Dayne Freitag | Tirthankar Ghosal | Drahomira Herrmannova | Petr Knoth | Kyle Lo | Philipp Mayr | Michal Shmueli-Scheuer | Anita de Waard | Lucy Lu Wang
Proceedings of the Third Workshop on Scholarly Document Processing
With the ever-increasing pace of research and high volume of scholarly communication, scholars face a daunting task. Not only must they keep up with the growing literature in their own and related fields, scholars increasingly also need to rebut pseudo-science and disinformation. These needs have motivated an increasing focus on computational methods for enhancing search, summarization, and analysis of scholarly documents. However, the various strands of research on scholarly document processing remain fragmented. To reach out to the broader NLP and AI/ML community, pool distributed efforts in this area, and enable shared access to published research, we held the 3rd Workshop on Scholarly Document Processing (SDP) at COLING as a hybrid event (https://sdproc.org/2022/). The SDP workshop consisted of a research track, three invited talks and five Shared Tasks: 1) MSLR22: Multi-Document Summarization for Literature Reviews, 2) DAGPap22: Detecting automatically generated scientific papers, 3) SV-Ident 2022: Survey Variable Identification in Social Science Publications, 4) SKGG: Scholarly Knowledge Graph Generation, 5) MuP 2022: Multi Perspective Scientific Document Summarization. The program was geared towards NLP, information retrieval, and data mining for scholarly documents, with an emphasis on identifying and providing solutions to open challenges.
Overview of the DAGPap22 Shared Task on Detecting Automatically Generated Scientific Papers
Yury Kashnitsky | Drahomira Herrmannova | Anita de Waard | George Tsatsaronis | Catriona Fennell | Cyril Labbe
Proceedings of the Third Workshop on Scholarly Document Processing
This paper provides an overview of the DAGPap22 shared task on the detection of automatically generated scientific papers at the Scholarly Document Processing workshop co-located with COLING. We frame the detection problem as a binary classification task: given an excerpt of text, label it as either human-written or machine-generated. We shared a dataset containing excerpts from human-written papers as well as artificially generated content and suspicious documents collected by Elsevier publishing and editorial teams. As a test set, the participants were provided with a 5x larger corpus of openly accessible human-written as well as generated papers from the same scientific domains of documents. The shared task saw 180 submissions across 14 participating teams and resulted in two published technical reports. We discuss our findings from the shared task in this overview paper.
2021
Proceedings of the Second Workshop on Scholarly Document Processing
Iz Beltagy | Arman Cohan | Guy Feigenblat | Dayne Freitag | Tirthankar Ghosal | Keith Hall | Drahomira Herrmannova | Petr Knoth | Kyle Lo | Philipp Mayr | Robert M. Patton | Michal Shmueli-Scheuer | Anita de Waard | Kuansan Wang | Lucy Lu Wang
Proceedings of the Second Workshop on Scholarly Document Processing
Argument Mining for Scholarly Document Processing: Taking Stock and Looking Ahead
Khalid Al Khatib | Tirthankar Ghosal | Yufang Hou | Anita de Waard | Dayne Freitag
Proceedings of the Second Workshop on Scholarly Document Processing
Argument mining targets structures in natural language related to interpretation and persuasion, which are central to scientific communication. Most scholarly discourse involves interpreting experimental evidence and attempting to persuade other scientists to adopt the same conclusions. While various argument mining studies have addressed student essays and news articles, those that target scientific discourse are still scarce. This paper surveys existing work in argument mining of scholarly discourse, and provides an overview of current models, data, tasks, and applications. We identify a number of key challenges confronting argument mining in the scientific domain, and suggest some possible solutions and future directions.
Overview of the Second Workshop on Scholarly Document Processing
Iz Beltagy | Arman Cohan | Guy Feigenblat | Dayne Freitag | Tirthankar Ghosal | Keith Hall | Drahomira Herrmannova | Petr Knoth | Kyle Lo | Philipp Mayr | Robert Patton | Michal Shmueli-Scheuer | Anita de Waard | Kuansan Wang | Lucy Lu Wang
Proceedings of the Second Workshop on Scholarly Document Processing
With the ever-increasing pace of research and high volume of scholarly communication, scholars face a daunting task. Not only must they keep up with the growing literature in their own and related fields, scholars increasingly also need to rebut pseudo-science and disinformation. These needs have motivated an increasing focus on computational methods for enhancing search, summarization, and analysis of scholarly documents. However, the various strands of research on scholarly document processing remain fragmented. To reach out to the broader NLP and AI/ML community, pool distributed efforts in this area, and enable shared access to published research, we held the 2nd Workshop on Scholarly Document Processing (SDP) at NAACL 2021 as a virtual event (https://sdproc.org/2021/). The SDP workshop consisted of a research track, three invited talks, and three Shared Tasks (LongSumm 2021, SCIVER, and 3C). The program was geared towards the application of NLP, information retrieval, and data mining for scholarly documents, with an emphasis on identifying and providing solutions to open challenges.
2020
Proceedings of the First Workshop on Scholarly Document Processing
Muthu Kumar Chandrasekaran | Anita de Waard | Guy Feigenblat | Dayne Freitag | Tirthankar Ghosal | Eduard Hovy | Petr Knoth | David Konopnicki | Philipp Mayr | Robert M. Patton | Michal Shmueli-Scheuer
Proceedings of the First Workshop on Scholarly Document Processing
Overview of the First Workshop on Scholarly Document Processing (SDP)
Muthu Kumar Chandrasekaran | Guy Feigenblat | Dayne Freitag | Tirthankar Ghosal | Eduard Hovy | Philipp Mayr | Michal Shmueli-Scheuer | Anita de Waard
Proceedings of the First Workshop on Scholarly Document Processing
In addition to keeping up with the growing literature in their own and related fields, scholars increasingly also need to rebut pseudo-science and disinformation. To address these challenges, computational work on enhancing search, summarization, and analysis of scholarly documents has flourished. However, the various strands of research on scholarly document processing remain fragmented. To reach out to the broader NLP and AI/ML community, pool distributed efforts and enable shared access to published research, we held the 1st Workshop on Scholarly Document Processing at EMNLP 2020 as a virtual event. The SDP workshop consisted of a research track (including a poster session), two invited talks and three Shared Tasks (CL-SciSumm, LaySumm and LongSumm), geared towards easier access to scientific methods and results.
Website: https://ornlcda.github.io/SDProc
Overview and Insights from the Shared Tasks at Scholarly Document Processing 2020: CL-SciSumm, LaySumm and LongSumm
Muthu Kumar Chandrasekaran | Guy Feigenblat | Eduard Hovy | Abhilasha Ravichander | Michal Shmueli-Scheuer | Anita de Waard
Proceedings of the First Workshop on Scholarly Document Processing
We present the results of three Shared Tasks held at the Scholarly Document Processing Workshop at EMNLP 2020: CL-SciSumm, LaySumm and LongSumm. We report on each of the tasks, which received 18 submissions in total, with some submissions addressing two or three of the tasks. In summary, the quality and quantity of the submissions show that there is ample interest in scholarly document summarization, and the state of the art in this domain is at a midway point between being an impossible task and one that is fully resolved.
2012
Identifying Claimed Knowledge Updates in Biomedical Research Articles
Ágnes Sándor | Anita de Waard
Proceedings of the Workshop on Detecting Structure in Scholarly Discourse
A three-way perspective on scientific discourse annotation for knowledge extraction
Maria Liakata | Paul Thompson | Anita de Waard | Raheel Nawaz | Henk Pander Maat | Sophia Ananiadou
Proceedings of the Workshop on Detecting Structure in Scholarly Discourse
Epistemic Modality and Knowledge Attribution in Scientific Discourse: A Taxonomy of Types and Overview of Features
Anita de Waard | Henk Pander Maat
Proceedings of the Workshop on Detecting Structure in Scholarly Discourse
2009
Identifying the Epistemic Value of Discourse Segments in Biology Texts (project abstract)
Anita de Waard | Paul Buitelaar | Thomas Eigner
Proceedings of the Eight International Conference on Computational Semantics