Abstract
Supervised training of neural models for duplicate question detection in community Question Answering (CQA) requires large amounts of labeled question pairs, which can be costly to obtain. To minimize this cost, recent work has often used alternative methods, e.g., adversarial domain adaptation. In this work, we propose two novel methods, weak supervision using the title and body of a question and the automatic generation of duplicate questions, and show that both can achieve improved performance even though they do not require any labeled data. We provide a comparison of popular training strategies and show that our proposed approaches are more effective in many cases because they can utilize larger amounts of data from the CQA forums. Finally, we show that weak supervision with question title and body information is also an effective method to train CQA answer selection models without direct answer supervision.
- Anthology ID:
- D19-1171
- Volume:
- Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Editors:
- Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
- Venues:
- EMNLP | IJCNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1607–1617
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/D19-1171/
- DOI:
- 10.18653/v1/D19-1171
- Cite (ACL):
- Andreas Rücklé, Nafise Sadat Moosavi, and Iryna Gurevych. 2019. Neural Duplicate Question Detection without Labeled Training Data. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1607–1617, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- Neural Duplicate Question Detection without Labeled Training Data (Rücklé et al., EMNLP-IJCNLP 2019)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/D19-1171.pdf
- Code:
- UKPLab/emnlp2019-duplicate_question_detection
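
The weak supervision idea described in the abstract, using a question's title and body as a training pair in place of a labeled duplicate, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_title_body_pairs`, the `title` and `body` dictionary keys, and the use of randomly sampled bodies from other questions as negatives are all hypothetical choices made for this example.

```python
import random
from typing import Dict, List, Tuple


def build_title_body_pairs(
    questions: List[Dict[str, str]],
    num_negatives: int = 1,
    seed: int = 0,
) -> List[Tuple[str, str, int]]:
    """Build (title, body, label) training triples without labeled duplicates.

    Assumption: each question is a dict with "title" and "body" keys.
    A title paired with its own body is a positive (label 1); a title paired
    with the body of a randomly chosen other question is a negative (label 0).
    """
    rng = random.Random(seed)
    examples: List[Tuple[str, str, int]] = []
    for i, question in enumerate(questions):
        # Positive pair: the title and body of the same question.
        examples.append((question["title"], question["body"], 1))
        # Negative pairs: the same title with bodies of other questions.
        other_indices = [j for j in range(len(questions)) if j != i]
        k = min(num_negatives, len(other_indices))
        for j in rng.sample(other_indices, k):
            examples.append((question["title"], questions[j]["body"], 0))
    return examples


if __name__ == "__main__":
    forum_questions = [
        {"title": "How do I reverse a list?", "body": "I have a Python list and want it in reverse order."},
        {"title": "What is a segfault?", "body": "My C program crashes with a segmentation fault."},
        {"title": "How to undo a git commit?", "body": "I committed the wrong files and want to revert."},
    ]
    for title, body, label in build_title_body_pairs(forum_questions):
        print(label, "|", title, "|", body)
```

Because the pairs come directly from unlabeled forum posts, a construction like this can draw on every question in a forum, which is in line with the abstract's point that the proposed approaches can utilize larger amounts of data than labeled duplicate pairs provide.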