Carmen Dayrell


2016

Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages
Scott Piao | Paul Rayson | Dawn Archer | Francesca Bianchi | Carmen Dayrell | Mahmoud El-Haj | Ricardo-María Jiménez | Dawn Knight | Michal Křen | Laura Löfberg | Rao Muhammad Adeel Nawab | Jawad Shafi | Phoey Lee Teh | Olga Mudraya
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The last two decades have seen the development of various semantic lexical resources such as WordNet (Miller, 1995) and the USAS semantic lexicon (Rayson et al., 2004), which have played an important role in the areas of natural language processing and corpus-based studies. Recently, increasing efforts have been devoted to extending the semantic frameworks of existing lexical knowledge resources to cover more languages, such as EuroWordNet and Global WordNet. In this paper, we report on the construction of large-scale multilingual semantic lexicons for twelve languages, which employ the unified Lancaster semantic taxonomy and provide a multilingual lexical knowledge base for the automatic UCREL semantic annotation system (USAS). Our work contributes towards the goal of constructing larger-scale and higher-quality multilingual semantic lexical resources and developing corpus annotation tools based on them. Lexical coverage is an important factor concerning the quality of the lexicons and the performance of the corpus annotation tools, and in this experiment we focus on evaluating the lexical coverage achieved by the multilingual lexicons and semantic annotation tools based on them. Our evaluation shows that some semantic lexicons such as those for Finnish and Italian have achieved lexical coverage of over 90% while others need further expansion.
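
As a rough illustration of the lexical coverage measure mentioned in the abstract above, the sketch below computes the share of corpus tokens that appear in a lexicon. The function name `lexical_coverage`, the toy lexicon, and the token-level, case-insensitive matching are all assumptions made for illustration; the abstract does not specify how tokens, types, or multiword expressions are counted.

```python
def lexical_coverage(tokens, lexicon):
    """Return the fraction of tokens whose lowercased form is in the lexicon.

    Assumption: coverage is counted per token (not per type) and matching
    is case-insensitive. The paper's exact procedure may differ.
    """
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token.lower() in lexicon)
    return covered / len(tokens)


# Hypothetical toy data, not taken from the USAS lexicons.
lexicon = {"the", "cat", "sat", "on", "mat"}
tokens = ["The", "cat", "sat", "on", "the", "blue", "mat", "quietly"]

coverage = lexical_coverage(tokens, lexicon)
print(f"Lexical coverage: {coverage:.1%}")
```

Here six of the eight tokens are found in the toy lexicon, so the sketch reports 75.0%.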

2015

Development of the Multilingual Semantic Annotation System
Scott Piao | Francesca Bianchi | Carmen Dayrell | Angela D’Egidio | Paul Rayson
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2013

Approaches for Helping Brazilian Students Improve their Scientific Writings
Ethel Schuster | Rick Lizotte | Sandra M. Aluísio | Carmen Dayrell
Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology

2012

Rhetorical Move Detection in English Abstracts: Multi-label Sentence Classifiers and their Annotated Corpora
Carmen Dayrell | Arnaldo Candido Jr. | Gabriel Lima | Danilo Machado Jr. | Ann Copestake | Valéria Feltrim | Stella Tagnin | Sandra Aluisio
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The relevance of automatically identifying rhetorical moves in scientific texts has been widely acknowledged in the literature. This study focuses on abstracts of standard research papers written in English and aims to tackle a fundamental limitation of current machine-learning classifiers: they are mono-labeled, that is, a sentence can be assigned only a single label. However, such an approach does not adequately reflect actual language use, since a move can be realized by a clause, a sentence, or even several sentences. Here, we present MAZEA (Multi-label Argumentative Zoning for English Abstracts), a multi-label classifier which automatically identifies rhetorical moves in abstracts but allows a given sentence to be assigned as many labels as appropriate. We have resorted to various other NLP tools and used two large training corpora: (i) one corpus consists of 645 abstracts from physical sciences and engineering (PE) and (ii) the other corpus is made up of 690 abstracts from life and health sciences (LH). This paper presents our preliminary results, discusses the various challenges involved in multi-label tagging, and works towards satisfactory solutions. We also make our two training corpora publicly available so that they may serve as a benchmark for this new task.
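
The contrast between mono-label and multi-label assignment described in the abstract above can be sketched as follows. This is not MAZEA's actual method, which the abstract does not detail: the per-label scores, the move names, the 0.5 threshold, and both function names are hypothetical and chosen only to show how one sentence can receive several labels.

```python
def assign_single_move(scores):
    """Mono-label assignment: keep only the highest-scoring label."""
    return max(scores, key=scores.get)


def assign_moves(scores, threshold=0.5):
    """Multi-label assignment: keep every label scoring at or above the threshold.

    Assumption: each label has an independent score in [0, 1]; the
    threshold value is illustrative, not taken from the paper.
    """
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical scores for one sentence that states both a purpose and a method.
scores = {"background": 0.15, "purpose": 0.72, "method": 0.64, "result": 0.08}

single = assign_single_move(scores)
multiple = assign_moves(scores)
print(single)
print(multiple)
```

With these made-up scores, the mono-label function returns only `purpose`, while the multi-label function returns both `method` and `purpose`.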