Construction of Text Summarization Corpus for the Credibility of Information on the Web

Masahiro Nakano, Hideyuki Shibuki, Rintaro Miyazaki, Madoka Ishioroshi, Koichi Kaneko, Tatsunori Mori


Abstract
Recently, the credibility of information on the Web has become an important issue. In addition to telling about content of source documents, indicating how to interpret the content, especially showing interpretation of the relation between statements appeared to contradict each other, is important for helping a user judge the credibility of information. In this paper, we will describe the purpose and the way in the construction of a text summarization corpus. Our purpose in the construction of the corpus includes the following three points; to collect Web documents relevant to several query sentences, to prepare gold standard data to evaluate smaller sub-processes in the extraction process and the summary generation process, to investigate the summaries made by human summarizers. The constructed corpus contains six query sentences, 24 manually-constructed summaries, and 24 collections of source Web documents. We also investigated how the descriptions of interpretation, which help a user judge the credibility of other descriptions in the summary, appear in the corpus. As a result, we confirmed that showing interpretation on conflicts is important for helping a user judge the credibility of information.
Anthology ID:
L10-1083
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/135_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Masahiro Nakano, Hideyuki Shibuki, Rintaro Miyazaki, Madoka Ishioroshi, Koichi Kaneko, and Tatsunori Mori. 2010. Construction of Text Summarization Corpus for the Credibility of Information on the Web. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Construction of Text Summarization Corpus for the Credibility of Information on the Web (Nakano et al., LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/135_Paper.pdf