Gender, Speech, and Representation in the Galician Parliament: An Analysis Based on the ParlaMint-ES-GA Dataset
Adina I. Vladu
Elisa Fernández Rei
Carmen Magariños
Noelia García Díaz
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024
This paper employs the ParlaMint-ES-GA dataset to scrutinize the intersection of gender, speech, and representation within the Parliament of Galicia, an autonomous region located in north-western Spain. The research questions center on the dynamics of women’s participation in parliamentary proceedings. Looking beyond numerical parity, we explore whether increased female presence in the parliament correlates with equitable access to the floor. Analyzing parliamentary proceedings from 2015 to 2022, our quantitative study investigates the relationship between the legislative body’s composition, the number of speeches by Members of Parliament (MPs), and references made by MPs in their speeches. The findings reveal nuances in gender representation and participation, challenging assumptions about proportional access to parliamentary discourse.
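The comparison between seat composition and access to the floor can be illustrated with a minimal sketch. It is not the authors' method: the function name `floor_access_ratio`, the gender labels, and the input shapes are hypothetical, and the abstract does not state which measure the study uses. The sketch assumes one simple measure, the share of speeches delivered by a group divided by that group's share of seats, so a value below 1 means the group speaks less often than its seat share would suggest.

```python
from collections import Counter


def floor_access_ratio(seats_by_gender, speech_genders):
    """Return speech share divided by seat share for each gender label.

    seats_by_gender: dict mapping a label (e.g. "F", "M") to a seat count.
    speech_genders: iterable with one label per speech delivered.
    Both inputs are hypothetical shapes, not the ParlaMint-ES-GA schema.
    """
    total_seats = sum(seats_by_gender.values())
    speech_counts = Counter(speech_genders)
    total_speeches = sum(speech_counts.values())
    if total_seats == 0 or total_speeches == 0:
        raise ValueError("need at least one seat and one speech")

    ratios = {}
    for label, seats in seats_by_gender.items():
        if seats == 0:
            continue  # a group without seats has no defined ratio
        seat_share = seats / total_seats
        speech_share = speech_counts[label] / total_speeches
        ratios[label] = speech_share / seat_share
    return ratios
```

Calling the function with seat counts per group and a list of speaker labels drawn from a corpus export would give one ratio per group; any real analysis would also need to control for role (for example, ministers or group spokespersons), which this sketch ignores.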
Nós-TTS: a Web User Interface for Galician Text-to-Speech
Carmen Magariños
Alp Öktem
Antonio Moscoso Sánchez
Marta Vázquez Abuín
Noelia García Díaz
Adina Ioana Vladu
Elisa Fernández Rei
María Baqueiro Vidal
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 2
The Nós Project: Opening routes for the Galician language in the field of language technologies
Iria de-Dios-Flores
Carmen Magariños
Adina Ioana Vladu
John E. Ortega
José Ramom Pichel
Marcos García
Pablo Gamallo
Elisa Fernández Rei
Alberto Bugarín-Diz
Manuel González González
Senén Barro
Xosé Luis Regueira
Proceedings of the Workshop Towards Digital Language Equality within the 13th Language Resources and Evaluation Conference
The development of language technologies (LTs) such as machine translation, text analytics, and dialogue systems is essential in the current digital society, culture and economy. These LTs, widely supported in languages in high demand worldwide, such as English, are also necessary for smaller and less economically powerful languages, as they are a driving force in the democratization of the communities that use them due to their great social and cultural impact. As an example, dialogue systems allow us to communicate with machines in our own language; machine translation increases access to contents in different languages, thus facilitating intercultural relations; and text-to-speech and speech-to-text systems broaden different categories of users’ access to technology. In the case of Galician (co-official language, together with Spanish, in the autonomous region of Galicia, located in northwestern Spain), incorporating the language into state-of-the-art AI applications can not only significantly favor its prestige (a decisive factor in language normalization), but also guarantee citizens’ language rights, reduce social inequality, and narrow the digital divide. This is the main motivation behind the Nós Project (Proxecto Nós), which aims to make a significant contribution to the development of LTs in Galician (currently considered a low-resource language) by providing openly licensed resources, tools, and demonstrators in the area of intelligent technologies.
Introducing the SEA_AP: an Enhanced Tool for Automatic Prosodic Analysis
Marta Martínez
Rocío Varela
Carmen García Mateo
Elisa Fernández Rei
Adela Martínez Calvo
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
The SEA_AP (Segmentador e Etiquetador Automático para Análise Prosódica, Automatic Segmentation and Labelling for Prosodic Analysis) toolkit is an application that performs audio segmentation and labelling to create a TextGrid file, which is then used to launch a prosodic analysis in Praat. In this paper, we describe the improved functionality of the tool achieved by adding a dialectometric analysis module using R scripts. The dialectometric analysis computes correlations among F0 curves and obtains prosodic distances among the different variables of interest (location, speaker, structure, etc.). The dialectometric analysis requires large databases in order to be adequately computed, and automatic segmentation and labelling can create them through a procedure less costly than the manual alternative. Thus, the integration of these tools into the SEA_AP makes it possible to propose a distribution of geoprosodic areas by means of a quantitative method, which complements the traditional dialectological point of view. The current version of the SEA_AP toolkit is capable of analysing Galician, Spanish and Brazilian Portuguese data, and hence the distances between several prosodic linguistic varieties can be measured at present.
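The idea of deriving prosodic distances from correlations among F0 curves can be sketched as follows. This is an illustration only: the toolkit's module is written in R, and the abstract does not give its formula. The sketch assumes that each curve has already been sampled at the same number of points and that distance is defined as one minus the Pearson correlation; the names `pearson`, `prosodic_distance`, and `distance_matrix` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences of F0 values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("curves must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant curve")
    return cov / math.sqrt(var_x * var_y)


def prosodic_distance(f0_a, f0_b):
    """Assumed distance: 0 for perfectly correlated curves, 2 for inverted ones."""
    return 1.0 - pearson(f0_a, f0_b)


def distance_matrix(curves):
    """Pairwise distances for a dict mapping a label (e.g. a location) to a curve."""
    labels = sorted(curves)
    return {
        (a, b): prosodic_distance(curves[a], curves[b])
        for a in labels
        for b in labels
    }
```

A matrix of this kind, computed over many locations, is the sort of input that clustering or multidimensional scaling could use to suggest geoprosodic areas; those later steps are outside this sketch.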
Enhanced CORILGA: Introducing the Automatic Phonetic Alignment Tool for Continuous Speech
Roberto Seara
Marta Martinez
Rocío Varela
Carmen García Mateo
Elisa Fernandez Rei
Xosé Luis Regueira
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
The “Corpus Oral Informatizado da Lingua Galega (CORILGA)” project aims at building a corpus of oral language for Galician, primarily designed to study linguistic variation and change. This project is currently under development and it is periodically enriched with new contributions. The long-term goal is that all the speech recordings will be enriched with phonetic, syllabic, morphosyntactic, lexical and sentence ELAN-compliant annotations. A way to speed up the process of annotation is to use automatic speech-recognition-based tools tailored to the application. Therefore, the CORILGA repository has been enhanced with an automatic alignment tool, available to the administrator of the repository, that aligns speech with an orthographic transcription. If no transcription, or only a partial one, is available, a speech recognizer for Galician is used to generate word and phonetic segmentations. These recognized outputs may contain errors that will have to be manually corrected by the administrator. To assist with this task, the tool also provides an ELAN tier with the confidence measure of each recognized word. In this paper, after the description of the main facts of the CORILGA corpus, the speech alignment and recognition tools are described. Both have been developed using the Kaldi toolkit.
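How a per-word confidence measure can guide manual correction is shown in the minimal sketch below. It does not reproduce the CORILGA tool, ELAN's file format, or any Kaldi API: the `RecognizedWord` record, the `words_to_review` function, and the 0.8 threshold are all hypothetical, and the sketch assumes confidences lie between 0 and 1.

```python
from typing import NamedTuple


class RecognizedWord(NamedTuple):
    """Hypothetical record for one recognized word and its time span."""
    text: str
    start: float       # seconds
    end: float         # seconds
    confidence: float  # assumed to lie in [0, 1]


def words_to_review(words, threshold=0.8):
    """Return words below the threshold, least confident first."""
    flagged = [w for w in words if w.confidence < threshold]
    return sorted(flagged, key=lambda w: w.confidence)
```

Sorting by ascending confidence lets an annotator start with the words most likely to be wrong; the right threshold would depend on the recognizer and would have to be tuned on real data.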
CORILGA: a Galician Multilevel Annotated Speech Corpus for Linguistic Analysis
Carmen García-Mateo
Antonio Cardenal
Xosé Luis Regueira
Elisa Fernández Rei
Marta Martinez
Roberto Seara
Rocío Varela
Noemí Basanta
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper describes CORILGA (Corpus Oral Informatizado da Lingua Galega). CORILGA is a large high-quality corpus of spoken Galician from the 1960s to the present day, including both formal and informal spoken language from both standard and non-standard varieties, and across different generations and social levels. The corpus will be available to the research community upon completion. Galician is one of the EU languages that needs further research before highly effective language technology solutions can be implemented. A software repository for speech resources in Galician is also described. The repository includes a structured database, a graphical interface and processing tools. The use of a database enables simple and fast searches based on a number of different criteria. The web-based user interface facilitates users' access to the different materials. Last but not least, a set of transcription-based modules for automatic speech recognition has been developed, thus facilitating the orthographic labelling of the recordings.