A Multi-Layered Annotated Corpus of Scientific Papers

Beatriz Fisas, Francesco Ronzano, Horacio Saggion


Abstract
Scientific literature records the research process with a standardized structure and provides the clues to track the progress in a scientific field. Understanding its internal structure and content is of paramount importance for natural language processing (NLP) technologies. To meet this requirement, we have developed a multi-layered annotated corpus of scientific papers in the domain of Computer Graphics. Sentences are annotated with respect to their role in the argumentative structure of the discourse. The purpose of each citation is specified. Special features of the scientific discourse such as advantages and disadvantages are identified. In addition, a grade is allocated to each sentence according to its relevance for being included in a summary. To the best of our knowledge, this complex, multi-layered collection of annotations and metadata characterizing a set of research papers had never been grouped together before in one corpus and therefore constitutes a newer, richer resource with respect to those currently available in the field.
Anthology ID:
L16-1492
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3081–3088
URL:
https://aclanthology.org/L16-1492
Cite (ACL):
Beatriz Fisas, Francesco Ronzano, and Horacio Saggion. 2016. A Multi-Layered Annotated Corpus of Scientific Papers. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3081–3088, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Multi-Layered Annotated Corpus of Scientific Papers (Fisas et al., LREC 2016)
PDF:
https://preview.aclanthology.org/update-css-js/L16-1492.pdf
Data
DRI Corpus