CoBaLD Annotation: The Enrichment of the Enhanced Universal Dependencies with the Semantical Pattern

Maria Andreevna Petrova, Alexandra M. Ivoylova, Anastasia Tishchenkova


Abstract
This paper presents an annotation format for morphological, syntactic, and especially semantic markup. The format combines Enhanced UD morphosyntax with the Compreno semantic pattern, enriching the UD annotation with word meanings and labels for the semantic relations between words. To adapt the Compreno semantics for this purpose, we reduced the number of semantic fields denoting lexical meanings by using hyperonym fields. We also used a generalized variant of the semantic relations, because the original roles have rather narrow meanings, which makes them too numerous. Creating such a format also requires converting Compreno morphosyntax to UD, which in turn requires resolving the asymmetry between the two models. The asymmetry concerns tokenization, lemmatization, POS tagging, sets of grammatical features, and dependency heads. To overcome this problem, we created a Compreno-to-UD converter. As an application, the work presents a 150,000-token corpus of English news annotated according to the standard.
Anthology ID:
2024.lrec-main.304
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
3422–3432
URL:
https://aclanthology.org/2024.lrec-main.304
Cite (ACL):
Maria Andreevna Petrova, Alexandra M. Ivoylova, and Anastasia Tishchenkova. 2024. CoBaLD Annotation: The Enrichment of the Enhanced Universal Dependencies with the Semantical Pattern. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3422–3432, Torino, Italia. ELRA and ICCL.
Cite (Informal):
CoBaLD Annotation: The Enrichment of the Enhanced Universal Dependencies with the Semantical Pattern (Petrova et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/add_acl24_videos/2024.lrec-main.304.pdf