Gerardo Sierra

Also published as: Gerardo Sierra-Martínez


Sociolinguistic Corpus of WhatsApp Chats in Spanish among College Students
Alejandro Dorantes | Gerardo Sierra | Tlauhlia Yamín Donohue Pérez | Gemma Bel-Enguix | Mónica Jasso Rosales
Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media

This work presents the Sociolinguistic Corpus of WhatsApp Chats in Spanish among College Students, a corpus of raw data for general use. Its purpose is to offer data for the study of language and interactions via Instant Messaging (IM) among undergraduate students. Our paper gives an overview of both the corpus’s content and its demographic metadata. Furthermore, it presents the current research being conducted with it, namely on parenthetical expressions, orality traits, and code-switching. This work also includes a brief outline of similar corpora and recent studies in the field of IM.

Challenges of language technologies for the indigenous languages of the Americas
Manuel Mager | Ximena Gutierrez-Vasques | Gerardo Sierra | Ivan Meza-Ruiz
Proceedings of the 27th International Conference on Computational Linguistics

Indigenous languages of the American continent are highly diverse. However, they have received little attention from the technological perspective. In this paper, we review the research, the digital resources and the available NLP systems that focus on these languages. We present the main challenges and research questions that arise when distant languages and low-resource scenarios are faced. We would like to encourage NLP research in linguistically rich and diverse areas like the Americas.


Applying the Rhetorical Structure Theory in Alzheimer patients’ speech
Anayeli Paulino | Gerardo Sierra
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms


Axolotl: a Web Accessible Parallel Corpus for Spanish-Nahuatl
Ximena Gutierrez-Vasques | Gerardo Sierra | Isaac Hernandez Pompa
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes the Axolotl project, which comprises a Spanish-Nahuatl parallel corpus and its search interface. Spanish and Nahuatl are distant languages spoken in the same country. We describe several problems that arose when compiling this corpus, many of them due to the scarcity of digital resources: most of our sources were non-digital books, we faced errors when digitizing the sources, and there were difficulties in the sentence alignment process, to mention just a few. The documents of the parallel corpus are not homogeneous: they were extracted from different sources and show dialectal, diachronic, and orthographic variation. Additionally, we present a web search interface for querying the whole parallel corpus; the system can retrieve the parallel fragments that contain a word or phrase searched by a user in either language. To our knowledge, this is the first publicly available digital Spanish-Nahuatl parallel corpus. We think that this resource can be useful for developing language technologies and linguistic studies for this language pair.

Detection of Alzheimer’s disease based on automatic analysis of common objects descriptions
Laura Hernández-Domínguez | Edgar García-Cano | Sylvie Ratté | Gerardo Sierra-Martínez
Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning


Using Wikipedia to Validate the Terminology found in a Corpus of Basic Textbooks
Jorge Vivaldi | Luis Adrián Cabrera-Diego | Gerardo Sierra | María Pozzi
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

A scientific vocabulary is a set of terms that designate scientific concepts. This set of lexical units can be used in several applications, ranging from the development of terminological dictionaries and machine translation systems to the development of lexical databases and beyond. Even though automatic term recognition systems have existed since the 1980s, this process is still mainly done by hand, since manual work generally yields more accurate results, although it takes more time and costs more. Some of the reasons for this are the fairly low precision and recall obtained, the domain dependence of existing tools, and the lack of the semantic knowledge needed to validate these results. In this paper we present a method that uses Wikipedia as a semantic knowledge resource to validate term candidates from a set of scientific textbooks used in the last three years of high school for mathematics, health education and ecology. The proposed method may be applied to any domain or language (assuming there is minimal coverage by Wikipedia).


On the Development of the RST Spanish Treebank
Iria da Cunha | Juan-Manuel Torres-Moreno | Gerardo Sierra
Proceedings of the 5th Linguistic Annotation Workshop

The RST Spanish Treebank On-line Interface
Iria da Cunha | Juan-Manuel Torres-Moreno | Gerardo Sierra | Luis-Adrián Cabrera-Diego | Brenda-Gabriela Castro-Rolón | Juan-Miguel Rolland Bartilotti
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011


Recognition and extraction of definitional contexts in Spanish for sketching a lexical network
César Aguilar | Olga Acosta | Gerardo Sierra
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas


Proceedings of the 1st Workshop on Definition Extraction
Gerardo Sierra | María Pozzi | Juan-Manuel Torres
Proceedings of the 1st Workshop on Definition Extraction

A Formal Scope on the Relations Between Definitions and Verbal Predications
César Aguilar | Gerardo Sierra
Proceedings of the 1st Workshop on Definition Extraction

Description and Evaluation of a Pattern Based Approach for Definition Extraction
Rodrigo Alarcón | Gerardo Sierra | Carme Bach
Proceedings of the 1st Workshop on Definition Extraction


Natural Language Searching in Onomasiological Dictionaries
Gerardo Sierra
Coling 2008: Proceedings of the Workshop on Cognitive Aspects of the Lexicon (COGALEX 2008)


Extracting semantic clusters from the alignment of definitions
Gerardo Sierra | John McNaught
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

Extraction of Semantic Clusters for Terminological Information Retrieval from MRDs
Gerardo Sierra | John McNaught
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)