Interannotator Agreement for Lexico-Semantic Annotation of a Corpus

Elżbieta Hajnicz


Abstract
This paper examines the procedure for lexico-semantic annotation of the Basic Corpus of Polish Metaphors, which is the first step toward annotating the metaphoric expressions occurring in it. The procedure involves correcting the morphosyntactic annotation of the part of the corpus that was annotated automatically at the morphosyntactic level. The main procedure concerns the annotation of adjectives, adverbs, nouns and verbs (including gerunds and participles), as well as abbreviations of words belonging to these classes. It comprises three steps: deciding whether a particular occurrence of a word is asemantic (e.g. anaphoric or strictly grammatical); deciding whether we are dealing with a multi-word expression; and handling reciprocal usages of the się marker and pluralia tantum, which may involve annotating a single token with two lexical units (having two different lemmas). We propose an interannotator agreement statistic suited to this procedure. Finally, we discuss the preliminary results of annotating a fragment of the corpus.
Anthology ID:
2020.lrec-1.227
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
1842–1848
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.227
Cite (ACL):
Elżbieta Hajnicz. 2020. Interannotator Agreement for Lexico-Semantic Annotation of a Corpus. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1842–1848, Marseille, France. European Language Resources Association.
Cite (Informal):
Interannotator Agreement for Lexico-Semantic Annotation of a Corpus (Hajnicz, LREC 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.lrec-1.227.pdf