Annotation of lexical bundles with discourse functions in a Spanish academic corpus

Eleonora Guzzi, Margarita Alonso-Ramos, Marcos Garcia, Marcos García Salido


Abstract
This paper describes the annotation of 996 lexical bundles (LBs) assigned to 39 different discourse functions in a Spanish academic corpus. The purpose of the annotation is to obtain a new Spanish gold-standard corpus of 1,800,000 words that is useful both for training and evaluating computational models capable of automatically identifying LBs for each context in new corpora and for linguistic analysis of the role of LBs in academic discourse. The annotation process revealed that the correspondence between LBs and discourse functions is not one-to-one (biunivocal) and that the degree of ambiguity is high, so the linguists’ contribution has been essential for improving the automatic assignment of tags.
Anthology ID:
2023.mwe-1.14
Volume:
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Archna Bhatia, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Shiva Taslimipoor
Venue:
MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
99–105
URL:
https://aclanthology.org/2023.mwe-1.14
DOI:
10.18653/v1/2023.mwe-1.14
Cite (ACL):
Eleonora Guzzi, Margarita Alonso-Ramos, Marcos Garcia, and Marcos García Salido. 2023. Annotation of lexical bundles with discourse functions in a Spanish academic corpus. In Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023), pages 99–105, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Annotation of lexical bundles with discourse functions in a Spanish academic corpus (Guzzi et al., MWE 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.mwe-1.14.pdf
Video:
https://preview.aclanthology.org/emnlp-22-attachments/2023.mwe-1.14.mp4