Abstract
This paper tackles a cross-genre document retrieval task, where the queries are in formal writing and the target documents are in conversational writing. In this task, a query is a sentence extracted from either a summary or a plot of an episode in a TV show, and the target document consists of transcripts from the corresponding episode. To establish a strong baseline, we employ the current state-of-the-art search engine to perform document retrieval on the dataset collected for this work. We then introduce a structure reranking approach to improve the initial ranking by utilizing syntactic and semantic structures generated by NLP tools. Our evaluation shows an improvement of more than 4% when the structure reranking is applied, which is very promising.
- Anthology ID:
- W17-5407
- Volume:
- Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems
- Month:
- September
- Year:
- 2017
- Address:
- Copenhagen, Denmark
- Editors:
- Emily Bender, Hal Daumé III, Allyson Ettinger, Sudha Rao
- Venue:
- WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 48–53
- URL:
- https://aclanthology.org/W17-5407
- DOI:
- 10.18653/v1/W17-5407
- Cite (ACL):
- Tomasz Jurczyk and Jinho D. Choi. 2017. Cross-genre Document Retrieval: Matching between Conversational and Formal Writings. In Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems, pages 48–53, Copenhagen, Denmark. Association for Computational Linguistics.
- Cite (Informal):
- Cross-genre Document Retrieval: Matching between Conversational and Formal Writings (Jurczyk & Choi, 2017)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/W17-5407.pdf