Desara Xhura


2023

Predicting the presence of inline citations in academic text using binary classification
Peter Vajdecka | Elena Callegari | Desara Xhura | Atli Ásmundsson
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Properly citing sources is a crucial component of any good-quality academic paper. The goal of this study was to determine what kind of accuracy we could reach in predicting whether or not a sentence should contain an inline citation using a simple binary classification model. To that end, we fine-tuned SciBERT on both an imbalanced and a balanced dataset containing sentences with and without inline citations. We achieved an overall accuracy of over 0.92, suggesting that language patterns alone could be used to predict where inline citations should appear with some degree of accuracy.
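The abstract contrasts an imbalanced and a balanced dataset. A minimal sketch, on made-up toy data, of why that distinction matters when reading an accuracy figure: a trivial classifier that always predicts "no citation" already scores high on an imbalanced set, but only 0.5 on a balanced one. This is not the authors' pipeline; the function names `undersample` and `accuracy`, the 10% citation rate, and the use of random undersampling are all assumptions for illustration. The paper does not state here how its balanced set was built.

```python
import random
from collections import Counter


def undersample(examples, seed=0):
    """Return a class-balanced copy by downsampling each class
    to the size of the rarest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    n = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


def accuracy(gold, predicted):
    """Fraction of positions where the prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy data (assumed): label 1 = sentence has an inline citation, 0 = it does not.
# One sentence in ten is labelled 1, so the set is heavily imbalanced.
examples = [(f"sentence {i}", 1 if i % 10 == 0 else 0) for i in range(100)]
balanced = undersample(examples)

# Baseline that always predicts the majority class (0, "no citation").
majority_imbalanced = accuracy([y for _, y in examples], [0] * len(examples))
majority_balanced = accuracy([y for _, y in balanced], [0] * len(balanced))

print(Counter(y for _, y in balanced))
print(majority_imbalanced, majority_balanced)
```

On the balanced copy the majority baseline drops to chance level, so an accuracy reported on balanced data is easier to interpret as evidence that the model has learned something about the sentences themselves.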

2022

A corpus for Automatic Article Analysis
Elena Callegari | Desara Xhura
Proceedings of the 5th International Conference on Computational Linguistics in Bulgaria (CLIB 2022)

We describe the structure and creation of the SageWrite corpus. This is a manually annotated corpus created to support automatic language generation and automatic quality assessment of academic articles. The corpus currently contains annotations for 100 excerpts taken from various scientific articles. For each of these excerpts, the corpus contains (i) a draft version of the excerpt and (ii) annotations that reflect the stylistic and linguistic merits of the excerpt, such as whether or not the text is clearly structured. The SageWrite corpus is the first corpus for the fine-tuning of text-generation algorithms that specifically addresses academic writing.