SPORTSINTERVIEW: A Large-Scale Sports Interview Benchmark for Entity-centric Dialogues

Hanfei Sun, Ziyuan Cao, Diyi Yang


Abstract
We propose SPORTSINTERVIEW, a novel knowledge-grounded dialogue (interview) dataset set in the domain of sports interviews. The dataset grounds its dialogues in two types of external knowledge sources and is rich in content, containing about 150K interview sessions and 34K distinct interviewees. Compared to existing knowledge-grounded dialogue datasets, our interview dataset is larger, comprises natural dialogues revolving around real-world sports matches, and has more than one dimension of external knowledge linking. We performed several experiments on SPORTSINTERVIEW and found that models such as BART, fine-tuned on our dataset, learn substantial relevant domain knowledge and generate meaningful sentences (questions or responses). However, their performance still falls far short of human performance (measured against the gold sentences in the dataset), which encourages future research using SPORTSINTERVIEW.
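For illustration only (not from the paper): a minimal sketch of the kind of BART fine-tuning the abstract describes, using Hugging Face transformers. The checkpoint name, input format, and the source/target pair below are assumptions, not actual SPORTSINTERVIEW content.

from transformers import BartTokenizerFast, BartForConditionalGeneration

# Hypothetical checkpoint; the paper does not specify which BART variant.
tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Invented example pair: grounding knowledge plus the interviewer question
# as source, the gold interviewee response as target.
source = "knowledge: <match summary> </s> question: How did the final set go?"
target = "It was a tough set, but I stayed focused on my serve."

inputs = tokenizer(source, return_tensors="pt", truncation=True)
labels = tokenizer(text_target=target, return_tensors="pt", truncation=True).input_ids

loss = model(**inputs, labels=labels).loss  # cross-entropy over target tokens
loss.backward()  # a real training loop would follow with an optimizer step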
Anthology ID:
2022.lrec-1.626
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5821–5828
URL:
https://aclanthology.org/2022.lrec-1.626
Cite (ACL):
Hanfei Sun, Ziyuan Cao, and Diyi Yang. 2022. SPORTSINTERVIEW: A Large-Scale Sports Interview Benchmark for Entity-centric Dialogues. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5821–5828, Marseille, France. European Language Resources Association.
Cite (Informal):
SPORTSINTERVIEW: A Large-Scale Sports Interview Benchmark for Entity-centric Dialogues (Sun et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.626.pdf
Data
CMU DoG, Wizard of Wikipedia