CoBaLD Annotation: The Enrichment of the Enhanced Universal Dependencies with the Semantical Pattern
Maria Andreevna Petrova, Alexandra M. Ivoylova, Anastasia Tishchenkova
Abstract
This paper presents an annotation format for morphological, syntactic, and especially semantic markup. The format combines the Enhanced UD morphosyntax and the Compreno semantic pattern, enriching the UD annotation with word meanings and labels for semantic relations between words. To adapt the Compreno semantics for the current purpose, we reduced the number of semantic fields denoting lexical meanings by using hyperonym fields. Moreover, we used a generalized variant of the semantic relations, because the original roles have rather narrow meanings, which makes them too numerous. Creating such a format also requires Compreno-to-UD morphosyntax conversion, which in turn requires resolving the asymmetry between the two models. The asymmetry concerns tokenization, lemmatization, POS-tagging, sets of grammatical features and dependency heads. To overcome this problem, the Compreno-to-UD converter was created. As an application, the work presents a 150,000-token corpus of English news annotated according to the standard.
- Anthology ID: 2024.lrec-main.304
- Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month: May
- Year: 2024
- Address: Torino, Italia
- Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues: LREC | COLING
- Publisher: ELRA and ICCL
- Pages: 3422–3432
- URL: https://aclanthology.org/2024.lrec-main.304
- Cite (ACL): Maria Andreevna Petrova, Alexandra M. Ivoylova, and Anastasia Tishchenkova. 2024. CoBaLD Annotation: The Enrichment of the Enhanced Universal Dependencies with the Semantical Pattern. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3422–3432, Torino, Italia. ELRA and ICCL.
- Cite (Informal): CoBaLD Annotation: The Enrichment of the Enhanced Universal Dependencies with the Semantical Pattern (Petrova et al., LREC-COLING 2024)
- PDF: https://preview.aclanthology.org/add_acl24_videos/2024.lrec-main.304.pdf