The Hebrew Essay Corpus

Chen Gafni, Anat Prior, Shuly Wintner


Abstract
We present the Hebrew Essay Corpus: an annotated corpus of Hebrew-language argumentative essays authored by prospective higher-education students. The corpus includes essays by native speakers, written as part of the psychometric exam used to assess their future success in academic studies, and essays by non-native speakers with three different native languages, written as part of a language aptitude test. The corpus is uniformly encoded and stored. The non-native essays were annotated with target hypotheses whose main goal is to make the texts amenable to automatic processing (morphological and syntactic analysis). The corpus is available for academic purposes upon request. We describe the corpus and the error correction and annotation schemes used in its analysis. In addition to introducing this new resource, we discuss the challenges of identifying and analyzing non-native language use in general, and propose several ways of dealing with these challenges.
Anthology ID:
2022.lrec-1.598
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5580–5586
URL:
https://aclanthology.org/2022.lrec-1.598
Cite (ACL):
Chen Gafni, Anat Prior, and Shuly Wintner. 2022. The Hebrew Essay Corpus. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5580–5586, Marseille, France. European Language Resources Association.
Cite (Informal):
The Hebrew Essay Corpus (Gafni et al., LREC 2022)
PDF:
https://preview.aclanthology.org/remove-xml-comments/2022.lrec-1.598.pdf