Giovanni Cassani


2024

BigNLI: Native Language Identification with Big Bird Embeddings
Sergey Kramp | Giovanni Cassani | Chris Emmery
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Native Language Identification (NLI) intends to classify an author’s native language based on their writing in another language. Historically, the task has heavily relied on time-consuming linguistic feature engineering, and NLI transformer models have thus far failed to offer effective, practical alternatives. The current work shows input size is a limiting factor, and that classifiers trained using Big Bird embeddings outperform linguistic feature engineering models (for which we reproduce previous work) by a large margin on the Reddit-L2 dataset. Additionally, we provide further insight into input length dependencies, show consistent out-of-sample (Europe subreddit) and out-of-domain (TOEFL-11) performance, and qualitatively analyze the embedding space. Given the effectiveness and computational efficiency of this method, we believe it offers a promising avenue for future NLI work.

2015

Towards a Model of Prediction-based Syntactic Category Acquisition: First Steps with Word Embeddings
Robert Grimm | Giovanni Cassani | Walter Daelemans | Steven Gillis
Proceedings of the Sixth Workshop on Cognitive Aspects of Computational Language Learning

Which distributional cues help the most? Unsupervised contexts selection for lexical category acquisition
Giovanni Cassani | Robert Grimm | Walter Daelemans | Steven Gillis
Proceedings of the Sixth Workshop on Cognitive Aspects of Computational Language Learning