Cross-genre Document Retrieval: Matching between Conversational and Formal Writings

Tomasz Jurczyk, Jinho D. Choi


Abstract
This paper addresses a cross-genre document retrieval task, where the queries are in formal writing and the target documents are in conversational writing. In this task, a query is a sentence extracted from either a summary or a plot of an episode in a TV show, and the target document consists of transcripts from the corresponding episode. To establish a strong baseline, we employ the current state-of-the-art search engine to perform document retrieval on the dataset collected for this work. We then introduce a structure reranking approach that improves the initial ranking by utilizing syntactic and semantic structures generated by NLP tools. Our evaluation shows an improvement of more than 4% when the structure reranking is applied, which is very promising.
Anthology ID:
W17-5407
Volume:
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Emily Bender, Hal Daumé III, Allyson Ettinger, Sudha Rao
Venue:
WS
Publisher:
Association for Computational Linguistics
Pages:
48–53
URL:
https://aclanthology.org/W17-5407
DOI:
10.18653/v1/W17-5407
Cite (ACL):
Tomasz Jurczyk and Jinho D. Choi. 2017. Cross-genre Document Retrieval: Matching between Conversational and Formal Writings. In Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems, pages 48–53, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Cross-genre Document Retrieval: Matching between Conversational and Formal Writings (Jurczyk & Choi, 2017)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/W17-5407.pdf