2014
JUST.ASK, a QA system that learns to answer new questions from previous interactions
Sérgio Curto | Ana C. Mendes | Pedro Curto | Luísa Coheur | Ângela Costa
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
We present JUST.ASK, a freely available Question Answering system. Its architecture is composed of the usual Question Processing, Passage Retrieval and Answer Extraction components. Several details on the information generated and manipulated by each of these components are also provided to the user when interacting with the demonstration. Since JUST.ASK also learns to answer new questions based on users' feedback, the user is invited to identify the correct answers. These will then be used to retrieve answers to future questions.
2012
An English-Portuguese parallel corpus of questions: translation guidelines and application in SMT
Ângela Costa | Tiago Luís | Joana Ribeiro | Ana Cristina Mendes | Luísa Coheur
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The task of Statistical Machine Translation depends on large amounts of training data. Despite the availability of several parallel corpora, these are typically composed of declarative sentences, which may not be appropriate when the goal is to translate other types of sentences, e.g., interrogatives. There have been efforts to create corpora of questions, especially in the context of the evaluation of Question-Answering systems. One of those corpora is the UIUC dataset, composed of nearly 6,000 questions and widely used in the task of Question Classification. In this work, we make available the Portuguese version of the UIUC dataset, which we manually translated, as well as the translation guidelines. We show the impact of this corpus on the performance of a state-of-the-art SMT system when translating questions. Finally, we present a taxonomy of translation errors, according to which we analyze the output of the automatic translation before and after using the corpus as training data.
Extending a wordnet framework for simplicity and scalability
Pedro Fialho | Sérgio Curto | Ana Cristina Mendes | Luísa Coheur
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The WordNet knowledge model is currently implemented in multiple software frameworks that provide procedural access to language instances of it. These frameworks tend to focus on structural and design aspects of the model, and thus expose low-level interfaces for linguistic knowledge retrieval. Typically, the only high-level feature directly accessible is word lookup, while traversal of semantic relations leads to verbose and complex combinations of data structures, pointers and indexes that are irrelevant in an NLP context. We describe an extension to the JWNL framework that hides the technical requirements of accessing WordNet features behind an essentially word/sense-based API that applies terminology from the official online interface. This high-level API is applied to the original English version of WordNet and to an SQL-based Portuguese lexicon, translated into a WordNet-based representation usable by JWNL.
2011
Exploring linguistically-rich patterns for question generation
Sérgio Curto | Ana Cristina Mendes | Luísa Coheur
Proceedings of the UCNLG+Eval: Language Generation and Evaluation Workshop
2010
Named Entity Recognition in Questions: Towards a Golden Collection
Ana Cristina Mendes | Luísa Coheur | Paula Vaz Lobo
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Named Entity Recognition (NER) plays a relevant role in several Natural Language Processing tasks. Question-Answering (QA) is one such task, since answers are frequently named entities that agree with the semantic category expected by a given question. In this context, the recognition of named entities is usually applied to free-text data. NER in natural language questions can also aid QA and, thus, should not be disregarded. Nevertheless, it has not yet been given the necessary importance. In this paper, we approach the identification and classification of named entities in natural language questions. We hypothesize that NER results can benefit from the inclusion of previously labeled questions in the training corpus. We present a broad study addressing that hypothesis, focusing on the balance to be achieved between the amount of free text and questions in order to build a suitable training corpus. This work also contributes a set of nearly 5,500 questions annotated with their named entities, freely available for research purposes.
2008
Reengineering a Domain-Independent Framework for Spoken Dialogue Systems
Filipe M. Martins | Ana Mendes | Mácio Freitas Viveiros | Joana Paulo Pardal | Pedro Arez | Nuno J. Mamede | João Paulo Neto
Software Engineering, Testing, and Quality Assurance for Natural Language Processing