Predicting the presence of inline citations in academic text using binary classification
Peter Vajdecka, Elena Callegari, Desara Xhura, Atli Ásmundsson
Abstract
Properly citing sources is a crucial component of any good-quality academic paper. The goal of this study was to determine what kind of accuracy we could reach in predicting whether or not a sentence should contain an inline citation using a simple binary classification model. To that end, we fine-tuned SciBERT on both an imbalanced and a balanced dataset containing sentences with and without inline citations. We achieved an overall accuracy of over 0.92, suggesting that language patterns alone could be used to predict where inline citations should appear with some degree of accuracy.
- Anthology ID:
- 2023.nodalida-1.72
- Volume:
- Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
- Month:
- May
- Year:
- 2023
- Address:
- Tórshavn, Faroe Islands
- Editors:
- Tanel Alumäe, Mark Fishel
- Venue:
- NoDaLiDa
- Publisher:
- University of Tartu Library
- Pages:
- 717–722
- URL:
- https://aclanthology.org/2023.nodalida-1.72
- Cite (ACL):
- Peter Vajdecka, Elena Callegari, Desara Xhura, and Atli Ásmundsson. 2023. Predicting the presence of inline citations in academic text using binary classification. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 717–722, Tórshavn, Faroe Islands. University of Tartu Library.
- Cite (Informal):
- Predicting the presence of inline citations in academic text using binary classification (Vajdecka et al., NoDaLiDa 2023)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/2023.nodalida-1.72.pdf