CorpusNÓS: A massive Galician corpus for training large language models

Iria de-Dios-Flores, Silvia Paniagua Suárez, Cristina Carbajal Pérez, Daniel Bardanca Outeiriño, Marcos Garcia, Pablo Gamallo


Anthology ID:
2024.propor-1.66
Volume:
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1
Month:
March
Year:
2024
Address:
Santiago de Compostela, Galicia/Spain
Editors:
Pablo Gamallo, Daniela Claro, António Teixeira, Livy Real, Marcos Garcia, Hugo Gonçalo Oliveira, Raquel Amaro
Venue:
PROPOR
Publisher:
Association for Computational Linguistics
Pages:
593–599
URL:
https://aclanthology.org/2024.propor-1.66
Cite (ACL):
Iria de-Dios-Flores, Silvia Paniagua Suárez, Cristina Carbajal Pérez, Daniel Bardanca Outeiriño, Marcos Garcia, and Pablo Gamallo. 2024. CorpusNÓS: A massive Galician corpus for training large language models. In Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1, pages 593–599, Santiago de Compostela, Galicia/Spain. Association for Computational Linguistics.
Cite (Informal):
CorpusNÓS: A massive Galician corpus for training large language models (de-Dios-Flores et al., PROPOR 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.propor-1.66.pdf