2021
An Experiment on Implicitly Crowdsourcing Expert Knowledge about Romanian Synonyms from Language Learners
Lionel Nicolas | Lavinia Nicoleta Aparaschivei | Verena Lyding | Christos Rodosthenous | Federico Sangati | Alexander König | Corina Forascu
Proceedings of the 10th Workshop on NLP for Computer Assisted Language Learning
2020
Creating Expert Knowledge by Relying on Language Learners: a Generic Approach for Mass-Producing Language Resources by Combining Implicit Crowdsourcing and Language Learning
Lionel Nicolas | Verena Lyding | Claudia Borg | Corina Forascu | Karën Fort | Katerina Zdravkova | Iztok Kosem | Jaka Čibej | Špela Arhar Holdt | Alice Millour | Alexander König | Christos Rodosthenous | Federico Sangati | Umair ul Hassan | Anisia Katinskaia | Anabela Barreiro | Lavinia Aparaschivei | Yaakov HaCohen-Kerner
Proceedings of the Twelfth Language Resources and Evaluation Conference
We introduce in this paper a generic approach to combine implicit crowdsourcing and language learning in order to mass-produce language resources (LRs) for any language for which a crowd of language learners can be involved. We present the approach by explaining its core paradigm that consists in pairing specific types of LRs with specific exercises, by detailing both its strengths and challenges, and by discussing how much these challenges have been addressed at present. Accordingly, we also report on on-going proof-of-concept efforts aiming at developing the first prototypical implementation of the approach in order to correct and extend an LR called ConceptNet based on the input crowdsourced from language learners. We then present an international network called the European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect) that provides the context to accelerate the implementation of this generic approach. Finally, we exemplify how it can be used in several language learning scenarios to produce a multitude of NLP resources and how it can therefore alleviate the long-standing NLP issue of the lack of LRs.
2016
Proceedings of the 8th Global WordNet Conference (GWC)
Christiane Fellbaum | Piek Vossen | Verginica Barbu Mititelu | Corina Forascu
Proceedings of the 8th Global WordNet Conference (GWC)
2013
Multi-document multilingual summarization corpus preparation, Part 1: Arabic, English, Greek, Chinese, Romanian
Lei Li | Corina Forascu | Mahmoud El-Haj | George Giannakopoulos
Proceedings of the MultiLing 2013 Workshop on Multilingual Multi-document Summarization
2012
Evaluating Machine Reading Systems through Comprehension Tests
Anselmo Peñas | Eduard Hovy | Pamela Forner | Álvaro Rodrigo | Richard Sutcliffe | Corina Forascu | Caroline Sporleder
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper describes a methodology for testing and evaluating the performance of Machine Reading systems through Question Answering and Reading Comprehension Tests. The methodology is being used in QA4MRE (QA for Machine Reading Evaluation), one of the labs of CLEF. The task was to answer a series of multiple choice tests, each based on a single document. This allows complex questions to be asked but makes evaluation simple and completely automatic. The evaluation architecture is completely multilingual: test documents, questions, and their answers are identical in all the supported languages. Background text collections are comparable collections harvested from the web for a set of predefined topics. Each test received an evaluation score between 0 and 1 using c@1. This measure encourages systems to reduce the number of incorrect answers while maintaining the number of correct ones by leaving some questions unanswered. 12 groups participated in the task, submitting 62 runs in 3 different languages (German, English, and Romanian). All runs were monolingual; no team attempted a cross-language task. We report here the conclusions and lessons learned after the first campaign in 2011.
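The c@1 measure named in the abstract above can be illustrated with a short sketch. The abstract does not state the formula; the version below follows the commonly cited definition, c@1 = (n_R + n_U · n_R / n) / n, where n_R is the number of correct answers, n_U the number of unanswered questions, and n the total number of questions. The function and parameter names are illustrative assumptions, not taken from the paper.

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """Sketch of the c@1 score, assuming the commonly cited definition.

    Unanswered questions earn partial credit equal to the system's
    accuracy over all questions, so abstaining beats answering wrongly.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_correct < 0 or n_unanswered < 0 or n_correct + n_unanswered > n_total:
        raise ValueError("counts must be non-negative and sum to at most n_total")
    accuracy = n_correct / n_total  # share of all questions answered correctly
    return (n_correct + n_unanswered * accuracy) / n_total


# Same number of correct answers; the second system leaves two questions
# unanswered instead of answering them incorrectly.
score_all_answered = c_at_1(n_correct=5, n_unanswered=0, n_total=10)
score_with_abstention = c_at_1(n_correct=5, n_unanswered=2, n_total=10)
```

Under this definition, a system that answers every question receives plain accuracy, while one that replaces wrong answers with abstentions scores higher, which matches the behaviour the abstract describes.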
Romanian TimeBank: An Annotated Parallel Corpus for Temporal Information
Corina Forăscu | Dan Tufiş
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The paper describes the main steps for the construction, annotation and validation of the Romanian version of the TimeBank corpus. Starting from the English TimeBank corpus, the reference annotated corpus in the temporal domain, we translated all 183 English news texts into Romanian and mapped the English annotations onto Romanian, with a success rate of 96.53%. Based on ISO-Time, the emerging standard for representing temporal information, which includes many of the previous annotation schemes, we evaluated the automatic transfer onto Romanian and, when necessary, corrected the Romanian annotations, so that in the end we obtained a 99.18% transfer rate for the TimeML annotations. In very few cases, due to language peculiarities, some original annotations could not be transferred. For the portability of the temporal annotation standard to Romanian, we suggested some additions to the ISO-Time standard, concerning especially the EVENT tag, based on linguistic evidence, Romanian grammar, and the localisations of TimeML to other Romance languages. Future improvements to the Ro-TimeBank will take into consideration all temporal expressions, signals and events in texts, even those whose temporal anchoring is not very clear.
2010
GikiCLEF: Crosscultural Issues in Multilingual Information Access
Diana Santos | Luís Miguel Cabral | Corina Forascu | Pamela Forner | Fredric Gey | Katrin Lamm | Thomas Mandl | Petya Osenova | Anselmo Peñas | Álvaro Rodrigo | Julia Schulz | Yvonne Skalban | Erik Tjong Kim Sang
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
In this paper we describe GikiCLEF, the first evaluation contest that, to our knowledge, was specifically designed to expose and investigate cultural and linguistic issues involved in structured multimedia collections and searching, and which was organized under the scope of CLEF 2009. GikiCLEF evaluated systems that answered questions hard for both humans and machines, in ten different Wikipedia collections, namely Bulgarian, Dutch, English, German, Italian, Norwegian (Bokmål and Nynorsk), Portuguese, Romanian, and Spanish. After a short historical introduction, we present the task, together with its motivation, and discuss how the topics were chosen. Then we provide another description from the point of view of the participants. Before disclosing their results, we introduce the SIGA management system, explaining the several tasks which were carried out behind the scenes. We then quantify the GIRA resource, offered to the community for training and further evaluating systems with the help of the 50 topics gathered and the solutions identified. We end the paper with a critical discussion of what was learned, advancing possible ways to reuse the data.
2009
Proceedings of the Workshop on Events in Emerging Text Types
Constantin Orasan | Laura Hasler | Corina Forăscu
Proceedings of the Workshop on Events in Emerging Text Types
2008
GMT to +2 or how can TimeML be used in Romanian
Corina Forăscu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
The paper describes the construction and usage of the Romanian version of the TimeBank corpus. The success rate of 96.53% for the automatic import of the temporal annotation from English to Romanian shows that automatic transfer is a worthwhile enterprise if temporality is to be studied in a language other than the one for which TimeML, the annotation standard used, was developed. A preliminary study identifies the main situations that occurred during the automatic transfer, as well as temporal elements not (yet) marked in the English corpus.
How to Evaluate and Raise the Quality in a Collaborative Lexicographic Approach
Dan Cristea | Corina Forăscu | Marius Răschip | Michael Zock
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper focuses on different aspects of the collaborative work used to create the electronic version of a paper dictionary, edited and printed by the Romanian Academy during the last century. In order to ensure accuracy in a reasonable amount of time, collaborative proofreading of the scanned material through an online interface has been initiated. The paper details the activities and the heuristics used to maximize accuracy and to evaluate the work of anonymous contributors with diverse backgrounds. Observing the behaviour of the enterprise over a period of 6 months allows us to estimate the feasibility of the approach until the end of the project.
2006
Temporality in relation with discourse structure
Corina Forăscu | Ionuț Cristian Pistol | Dan Cristea
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Discovering temporal relations between events and times is often difficult, time-consuming and expensive. In this paper a corpus study is performed to derive a strong relation between discourse structure, as revealed by Veins theory, and the temporal links between entities, as addressed in the TimeML annotation standard. The data interpretation helps us gain insight into how Veins theory can improve the manual and even (semi-)automatic detection of temporal relations.