Petr Knoth


2021

Proceedings of the Second Workshop on Scholarly Document Processing
Iz Beltagy | Arman Cohan | Guy Feigenblat | Dayne Freitag | Tirthankar Ghosal | Keith Hall | Drahomira Herrmannova | Petr Knoth | Kyle Lo | Philipp Mayr | Robert M. Patton | Michal Shmueli-Scheuer | Anita de Waard | Kuansan Wang | Lucy Lu Wang
Proceedings of the Second Workshop on Scholarly Document Processing

Overview of the 2021 SDP 3C Citation Context Classification Shared Task
Suchetha N. Kunnath | David Pride | Drahomira Herrmannova | Petr Knoth
Proceedings of the Second Workshop on Scholarly Document Processing

This paper provides an overview of the 2021 3C Citation Context Classification shared task. The second edition of the shared task was organised as part of the 2nd Workshop on Scholarly Document Processing (SDP 2021). The task is composed of two subtasks: classifying citations by their purpose (Subtask A) and by their influence (Subtask B). As in the previous year, both subtasks were hosted on Kaggle and used a portion of the new ACT dataset. A total of 22 teams participated in Subtask A, and 19 teams competed in Subtask B. All participating systems were ranked by their macro F-score. The highest scores reported were 0.26973 for Subtask A and 0.60025 for Subtask B.
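The macro F-score used for the ranking is the unweighted mean of per-class F1 scores, so a rare citation class counts as much as a frequent one. The following is a minimal Python sketch of that metric, assuming that a class which is never predicted (or never occurs) contributes an F1 of 0, as scikit-learn's `f1_score(..., average="macro", zero_division=0)` does; whether the Kaggle leaderboard used exactly this convention is an assumption. The function name `macro_f1` and the example labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores.

    Assumption: a class with no predictions (or no true instances) gets
    precision (or recall) 0, so its F1 is 0, mirroring zero_division=0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        # Score every class seen in either the gold or the predicted labels.
        labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    f1_scores = []
    for c in labels:
        precision = true_pos[c] / pred_count[c] if pred_count[c] else 0.0
        recall = true_pos[c] / true_count[c] if true_count[c] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels for illustration only.
gold = ["background", "uses", "background", "compares", "background"]
pred = ["background", "background", "background", "compares", "uses"]
print(round(macro_f1(gold, pred), 4))  # 0.5556
```

Here the per-class F1 scores are 2/3 (background), 1 (compares) and 0 (uses), so the macro F-score is 5/9 even though 3 of 5 predictions are correct.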

Overview of the Second Workshop on Scholarly Document Processing
Iz Beltagy | Arman Cohan | Guy Feigenblat | Dayne Freitag | Tirthankar Ghosal | Keith Hall | Drahomira Herrmannova | Petr Knoth | Kyle Lo | Philipp Mayr | Robert Patton | Michal Shmueli-Scheuer | Anita de Waard | Kuansan Wang | Lucy Wang
Proceedings of the Second Workshop on Scholarly Document Processing

With the ever-increasing pace of research and high volume of scholarly communication, scholars face a daunting task. Not only must they keep up with the growing literature in their own and related fields, scholars increasingly also need to rebut pseudo-science and disinformation. These needs have motivated an increasing focus on computational methods for enhancing search, summarization, and analysis of scholarly documents. However, the various strands of research on scholarly document processing remain fragmented. To reach out to the broader NLP and AI/ML community, pool distributed efforts in this area, and enable shared access to published research, we held the 2nd Workshop on Scholarly Document Processing (SDP) at NAACL 2021 as a virtual event (https://sdproc.org/2021/). The SDP workshop consisted of a research track, three invited talks, and three Shared Tasks (LongSumm 2021, SCIVER, and 3C). The program was geared towards the application of NLP, information retrieval, and data mining for scholarly documents, with an emphasis on identifying and providing solutions to open challenges.

2020

Proceedings of the 8th International Workshop on Mining Scientific Publications
Petr Knoth | Christopher Stahl | Bikash Gyawali | David Pride | Suchetha N. Kunnath | Drahomira Herrmannova
Proceedings of the 8th International Workshop on Mining Scientific Publications

Overview of the 2020 WOSP 3C Citation Context Classification Task
Suchetha Nambanoor Kunnath | David Pride | Bikash Gyawali | Petr Knoth
Proceedings of the 8th International Workshop on Mining Scientific Publications

The 3C Citation Context Classification task is the first shared task addressing citation context classification. Its two subtasks, A and B, involve classifying citations according to their purpose and influence, respectively. Both subtasks use a portion of the new ACT dataset, developed by researchers at The Open University, UK. The tasks were hosted on Kaggle, and the participating systems were evaluated using the macro F-score. Three teams participated in Subtask A and four teams in Subtask B. The best-performing systems obtained overall scores of 0.2056 for Subtask A and 0.5556 for Subtask B, outperforming the simple majority-class baseline models, which scored 0.11489 and 0.32249, respectively. In this paper, we report the shared task specification, the dataset used, a short description of the participating systems and the final results obtained by the teams under the evaluation criteria. The shared task was organised as part of the 8th International Workshop on Mining Scientific Publications (WOSP 2020).
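The low baseline scores follow from how the macro F-score treats a majority-class baseline. As a worked sketch, assume the baseline assigns a single class c to every instance, that K classes are scored, that p > 0 is the fraction of evaluated instances whose true label is c, and that a class which is never predicted receives an F1 of 0 (the abstract states neither the class distribution nor this convention):

```latex
\begin{aligned}
P_c &= p, \qquad R_c = 1, \qquad F_c = \frac{2 P_c R_c}{P_c + R_c} = \frac{2p}{1+p},\\
F_j &= 0 \quad \text{for all } j \neq c \ \text{(class } j \text{ is never predicted)},\\
\text{Macro-}F &= \frac{1}{K}\sum_{k=1}^{K} F_k = \frac{2p}{K(1+p)} \le \frac{1}{K}.
\end{aligned}
```

Since 2p/(1+p) ≤ 1 whenever 0 < p ≤ 1, such a baseline scores at most 1/K, for example at most 0.5 on a two-class task.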

Proceedings of the First Workshop on Scholarly Document Processing
Muthu Kumar Chandrasekaran | Anita de Waard | Guy Feigenblat | Dayne Freitag | Tirthankar Ghosal | Eduard Hovy | Petr Knoth | David Konopnicki | Philipp Mayr | Robert M. Patton | Michal Shmueli-Scheuer
Proceedings of the First Workshop on Scholarly Document Processing

Deduplication of Scholarly Documents using Locality Sensitive Hashing and Word Embeddings
Bikash Gyawali | Lucas Anastasiou | Petr Knoth
Proceedings of the 12th Language Resources and Evaluation Conference

Deduplication is the task of identifying near and exact duplicate data items in a collection. In this paper, we present a novel method for deduplicating scholarly documents. We develop a hybrid model that uses the structural similarity (locality sensitive hashing) and the meaning representation (word embeddings) of document texts to identify (near) duplicates. Our collection is a subset of multidisciplinary scholarly documents aggregated from research repositories. We identify several issues that cause data inaccuracies in such collections and motivate the need for deduplication. In the absence of an existing dataset suitable for studying the deduplication of scholarly documents, we create a ground truth dataset of 100K scholarly documents and conduct a series of experiments to empirically establish optimal values for the parameters of our deduplication method. Experimental evaluation shows that our method achieves a macro F1-score of 0.90. We deploy our method in production as a publicly accessible web API service that deduplicates scholarly documents in real time.
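The abstract does not detail the locality sensitive hashing component, so the following is only an illustrative Python sketch of one common LSH scheme for near-duplicate text, MinHash over word 3-gram shingles with banding, and not the authors' implementation. The shingle size, the number of hash functions (`num_perm`), the number of bands and all function names are hypothetical choices. It covers only candidate generation; the word-embedding similarity that the hybrid model also uses is not shown.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Set of word k-grams after lowercasing and whitespace tokenisation."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(token, seed):
    """Deterministic 64-bit hash; `seed` selects one of the hash functions."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_perm=64):
    """One minimum hash value per seeded hash function."""
    return [min(_seeded_hash(s, seed) for s in shingle_set) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of agreeing MinHash positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(docs, num_perm=64, bands=16):
    """Return (candidate_pairs, signatures) for a dict {doc_id: text}.

    Two documents become a candidate pair if every row of at least one
    band of their signatures is identical.
    """
    if num_perm % bands:
        raise ValueError("num_perm must be divisible by bands")
    rows = num_perm // bands
    buckets = defaultdict(set)
    signatures = {}
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text), num_perm)
        signatures[doc_id] = sig
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(doc_id)
    candidates = set()
    for ids in buckets.values():
        candidates.update(combinations(sorted(ids), 2))
    return candidates, signatures


docs = {
    "a": "Deduplication of scholarly documents using locality sensitive hashing",
    "b": "deduplication of  Scholarly Documents using locality sensitive hashing",
    "c": "citation context classification shared task hosted on kaggle",
}
pairs, sigs = lsh_candidate_pairs(docs)
print(("a", "b") in pairs)  # True: identical shingles after normalisation
```

With b bands of r = num_perm // bands rows, a pair whose shingle sets have Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b, so more rows per band make candidate selection stricter. Candidate pairs would then need a finer check, such as `estimated_jaccard` or an embedding-based similarity.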

2018

Analyzing Citation-Distance Networks for Evaluating Publication Impact
Drahomira Herrmannova | Petr Knoth | Robert Patton
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2011

Using Explicit Semantic Analysis for Cross-Lingual Link Discovery
Petr Knoth | Lukas Zilka | Zdenek Zdrahal
Proceedings of the Fifth International Workshop On Cross Lingual Information Access

2010

Automatic generation of inter-passage links based on semantic similarity
Petr Knoth | Jakub Novotny | Zdenek Zdrahal
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)