Setting Up Bilingual Comparable Corpora with Non-Contemporary Languages

Helena Bermudez Sabel, Francesca Dell’Oro, Cyrielle Montrichard, Corinne Rossari


Abstract
This paper presents the project “Les corpora latins et français: une fabrique pour l’accès à la représentation des connaissances” (Latin and French Corpora: A Factory for Accessing Knowledge Representation), which studies modality in both Latin and French by means of multi-genre, diachronic comparable corpora. Setting up such corpora raises a number of conceptual challenges, in particular how to compare two asynchronous textual productions that belong to different cultural frameworks. In this paper we outline the rationale for designing comparable corpora to explore our research questions, and then focus on some of the issues that arise when comparing different diachronic spans of Latin and French. We also explain how these issues were dealt with, thus providing some grounds upon which other projects could build their methodology.
Anthology ID:
2022.bucc-1.8
Volume:
Proceedings of the BUCC Workshop within LREC 2022
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Reinhard Rapp, Pierre Zweigenbaum, Serge Sharoff
Venue:
BUCC
Publisher:
European Language Resources Association
Pages:
56–60
URL:
https://aclanthology.org/2022.bucc-1.8
Cite (ACL):
Helena Bermudez Sabel, Francesca Dell’Oro, Cyrielle Montrichard, and Corinne Rossari. 2022. Setting Up Bilingual Comparable Corpora with Non-Contemporary Languages. In Proceedings of the BUCC Workshop within LREC 2022, pages 56–60, Marseille, France. European Language Resources Association.
Cite (Informal):
Setting Up Bilingual Comparable Corpora with Non-Contemporary Languages (Bermudez Sabel et al., BUCC 2022)
PDF:
https://preview.aclanthology.org/naacl24-info/2022.bucc-1.8.pdf