Abstract
Narrative modelling is an area of active research, motivated by the recognition of narratives as drivers of societal decision making. These research efforts conceptualize narratives as connected entity chains, and modelling typically focuses on identifying entities and their connections within a text. An emerging approach to narrative modelling uses semantic role labeling (SRL) to extract Entity-Verb-Entity (E-V-Es) tuples from a text, followed by dimensionality reduction that shrinks the space of entities and the space of connections separately. This process sacrifices the semantic richness of narratives and discards much contextual information along the way. Here, we propose an alternative narrative extraction approach, CANarEx, which incorporates a pipeline of common contextual constructs: co-reference resolution, micro-narrative generation, and clustering of these narratives through sentence embeddings. We evaluate our approach by testing the recovery of “narrative time-series clusters”, mimicking a desirable text-as-data task. The evaluation framework leverages synthetic data generated by a GPT-3 model that is trained on a large dataset of news articles to generate similar sentences. The synthetic data maps to three topics in the news dataset. We then generate narrative time-series document cluster representations by mapping the synthetic data to three distinct signals synthetically injected into the testing corpus. Evaluation results demonstrate the superior ability of CANarEx to recover narrative time-series, with reduced MSE and improved precision/recall relative to existing methods. The validity of these results is further reinforced through ablation studies and qualitative analysis.

- Anthology ID:
- 2022.findings-emnlp.260
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2022
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3551–3564
- URL:
- https://aclanthology.org/2022.findings-emnlp.260
- DOI:
- 10.18653/v1/2022.findings-emnlp.260
- Cite (ACL):
- Nandini Anantharama, Simon Angus, and Lachlan O’Neill. 2022. CANarEx: Contextually Aware Narrative Extraction for Semantically Rich Text-as-data Applications. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 3551–3564, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- CANarEx: Contextually Aware Narrative Extraction for Semantically Rich Text-as-data Applications (Anantharama et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2022.findings-emnlp.260.pdf
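The evaluation idea described in the abstract (recovering injected narrative time-series and scoring the recovery with MSE and precision/recall) can be illustrated with a minimal standard-library Python sketch. It assumes that some upstream pipeline has already assigned each document a time period and a narrative cluster label. The helper names (`cluster_time_series`, `mse`, `precision_recall`), the choice to measure each period by the share of documents in the cluster, and the toy data are all hypothetical and are not taken from the CANarEx paper or its code.

```python
from collections import Counter


def cluster_time_series(doc_periods, doc_clusters, cluster_id, n_periods):
    """Hypothetical helper: share of documents in each period assigned to cluster_id."""
    totals = Counter(doc_periods)
    hits = Counter(p for p, c in zip(doc_periods, doc_clusters) if c == cluster_id)
    # Periods with no documents contribute a share of 0.0.
    return [hits[t] / totals[t] if totals[t] else 0.0 for t in range(n_periods)]


def mse(predicted, target):
    """Mean squared error between a recovered series and an injected signal."""
    if len(predicted) != len(target):
        raise ValueError("series must have equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, target)) / len(target)


def precision_recall(retrieved, relevant):
    """Precision/recall of documents assigned to a cluster vs. documents carrying the injected narrative."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data (hypothetical): six documents over three periods.
doc_periods = [0, 0, 1, 1, 2, 2]
doc_clusters = [1, 0, 1, 1, 0, 0]
injected_signal = [0.5, 0.5, 0.0]   # assumed share of injected documents per period
injected_docs = [0, 2, 4]           # assumed indices of documents with the injected narrative

series = cluster_time_series(doc_periods, doc_clusters, cluster_id=1, n_periods=3)
retrieved_docs = [i for i, c in enumerate(doc_clusters) if c == 1]

print(series)                                     # [0.5, 1.0, 0.0]
print(mse(series, injected_signal))               # 0.25 / 3
print(precision_recall(retrieved_docs, injected_docs))
```

Lower MSE means the cluster's prevalence over time tracks the injected signal more closely. Precision and recall instead check whether the documents grouped into the cluster are the ones that actually carry the injected narrative, so the two views can disagree.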