Abstract
In the pursuit of natural language understanding, there has been a long-standing interest in tracking state changes throughout narratives. Impressive progress has been made in modeling the state of transaction-centric dialogues and procedural texts. However, this problem has been less intensively studied in the realm of general discourse, where ground-truth descriptions of states may be loosely defined and state changes are less densely distributed over utterances. This paper proposes to turn to simplified, fully observable systems that show some of these properties: sports events. We curated 2,263 soccer matches, each including time-stamped natural-language commentary accompanied by discrete events such as a team scoring goals, substituting players, or being penalized with cards. We propose a new task formulation where, given paragraphs of commentary of a game at different timestamps, the system is asked to recognize the occurrence of in-game events. This domain allows for rich descriptions of state while avoiding the complexities of many other real-world settings. As an initial point of performance measurement, we include two baseline methods, one framing the task as sentence classification with temporal dependence and one based on a current state-of-the-art generative model, and demonstrate that even sophisticated existing methods struggle on the state tracking task when the definition of state broadens or non-event chatter becomes prevalent.
- Anthology ID:
- 2021.naacl-main.342
- Volume:
- Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Editors:
- Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4325–4333
- URL:
- https://aclanthology.org/2021.naacl-main.342
- DOI:
- 10.18653/v1/2021.naacl-main.342
- Cite (ACL):
- Ruochen Zhang and Carsten Eickhoff. 2021. SOCCER: An Information-Sparse Discourse State Tracking Collection in the Sports Commentary Domain. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4325–4333, Online. Association for Computational Linguistics.
- Cite (Informal):
- SOCCER: An Information-Sparse Discourse State Tracking Collection in the Sports Commentary Domain (Zhang & Eickhoff, NAACL 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2021.naacl-main.342.pdf
- Data
- MultiWOZ, Open PI
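As a rough illustration of the task formulation described in the abstract (recognizing in-game events from timestamped commentary, with the accumulated events forming the game state), here is a minimal Python sketch. It is not the authors' data format or evaluation code. The label space (two teams; goal, substitution, yellow card, red card), the `Segment` class, and the `cumulative_state` and `micro_f1` helpers are hypothetical and chosen only for illustration. The empty-label segment mimics the "non-event chatter" that makes the collection information-sparse.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical label space: the abstract names goals, player switches and
# cards; splitting cards into yellow/red and using two teams are assumptions.
EVENT_TYPES = ("goal", "substitution", "yellow_card", "red_card")
TEAMS = ("home", "away")


@dataclass
class Segment:
    """One timestamped paragraph of commentary with its (hypothetical) gold labels."""
    minute: int
    commentary: str
    events: set = field(default_factory=set)  # set of (team, event_type) pairs


def cumulative_state(segments):
    """Accumulate event counts per (team, event_type) up to each timestamp."""
    counts = Counter()
    states = []
    for seg in sorted(segments, key=lambda s: s.minute):
        counts.update(seg.events)  # each labeled event adds 1 to its count
        states.append((seg.minute, dict(counts)))  # snapshot of the state so far
    return states


def micro_f1(gold, pred):
    """Micro-averaged F1 over per-timestamp sets of (team, event_type) labels."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / sum(len(p) for p in pred)
    recall = tp / sum(len(g) for g in gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, invented example (not drawn from the SOCCER data).
    game = [
        Segment(12, "Great finish! The home side take the lead.", {("home", "goal")}),
        Segment(30, "Both sides trade possession in midfield.", set()),  # chatter
        Segment(44, "A late challenge earns a booking.", {("away", "yellow_card")}),
    ]
    print(cumulative_state(game)[-1])
    # A predictor that finds the goal but misses the card: P = 1.0, R = 0.5.
    print(micro_f1([s.events for s in game], [{("home", "goal")}, set(), set()]))
```

Scoring only the events that actually occur (rather than accuracy over every timestamp) keeps the many no-event segments from inflating the metric, which is one way to reflect the sparsity the abstract emphasizes.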