Hugo Zaragoza

2016

The OnForumS corpus from the Shared Task on Online Forum Summarisation at MultiLing 2015
Mijail Kabadjov | Udo Kruschwitz | Massimo Poesio | Josef Steinberger | Jorge Valderrama | Hugo Zaragoza
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we present the OnForumS corpus, developed for the shared task of the same name on Online Forum Summarisation (OnForumS at MultiLing’15). The corpus consists of a set of news articles with associated readers’ comments from The Guardian (English) and La Repubblica (Italian). It comes with four levels of annotation: argument structure, comment-article linking, sentiment and coreference. The first three were produced through crowdsourcing, whereas the last was produced by an experienced annotator using a mature annotation scheme. Given its annotation breadth, we believe the corpus will prove a useful resource for stimulating and furthering research in Argumentation Mining, Summarisation, Sentiment Analysis, Coreference and the interlinks among them.

2011

Learning to Rank Answers to Non-Factoid Questions from Web Collections
Mihai Surdeanu | Massimiliano Ciaramita | Hugo Zaragoza
Computational Linguistics, Volume 37, Issue 2 - June 2011

2010

Active Learning for Building a Corpus of Questions for Parsing
Jordi Atserias | Giuseppe Attardi | Maria Simi | Hugo Zaragoza
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes how we built a dependency Treebank for questions. The questions for the Treebank were drawn from the TREC 10 QA task and from Yahoo! Answers. One use of the corpus is to train a dependency parser that achieves good accuracy on questions without hurting its overall accuracy. We also explore active learning techniques to determine a suitable size for a corpus of questions, so as to achieve adequate accuracy while minimizing the annotation effort.

2009

Company-Oriented Extractive Summarization of Financial News
Katja Filippova | Mihai Surdeanu | Massimiliano Ciaramita | Hugo Zaragoza
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2008

Learning to Rank Answers on Large Online QA Collections
Mihai Surdeanu | Massimiliano Ciaramita | Hugo Zaragoza
Proceedings of ACL-08: HLT

Semantically Annotated Snapshot of the English Wikipedia
Jordi Atserias | Hugo Zaragoza | Massimiliano Ciaramita | Giuseppe Attardi
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes SW1, the first version of a semantically annotated snapshot of the English Wikipedia. In recent years Wikipedia has become a valuable resource for both the Natural Language Processing (NLP) and the Information Retrieval (IR) communities. Although NLP technology for processing Wikipedia already exists, not all researchers and developers have the computational resources to process such a volume of information. Moreover, using different versions of Wikipedia, processed in different ways, can make results difficult to compare. The aim of this work is to give researchers in both the NLP and IR communities easy access to syntactic and semantic annotations by building a reference corpus that homogenizes experiments and makes results comparable. These resources, a semantically annotated corpus and an entity containment graph derived from it, are licensed under the GNU Free Documentation License and available from http://www.yr-bcn.es/semanticWikipedia