Abstract
Wikipedia is widely used to train models for various tasks including semantic association, text generation, and translation. These tasks typically involve aligning and using text from multiple language editions, with the assumption that all versions of an article present the same content. But this assumption may not hold. We introduce a methodology for approximating the extent to which narratives of conflict may diverge in this scenario, focusing on articles about World War I and II battles written by Wikipedia’s communities of editors across four language editions. For simplicity, our unit of analysis representing each language community’s perspective is based on national entities and their subject-object-relation context, identified using named entity recognition and open-domain information extraction. Using a vector representation of these tuples, we evaluate how similarly different language editions portray these entities, in terms of both how and how often they are mentioned in articles. Our results indicate that (1) language editions tend to reference associated countries more and (2) how much one language edition’s depiction overlaps with all others varies.
- Anthology ID:
- 2022.latechclfl-1.12
- Volume:
- Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Editors:
- Stefania Degaetano, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
- Venue:
- LaTeCHCLfL
- SIG:
- SIGHUM
- Publisher:
- International Conference on Computational Linguistics
- Pages:
- 94–104
- URL:
- https://aclanthology.org/2022.latechclfl-1.12
- Cite (ACL):
- Ana Smith and Lillian Lee. 2022. War and Pieces: Comparing Perspectives About World War I and II Across Wikipedia Language Communities. In Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 94–104, Gyeongju, Republic of Korea. International Conference on Computational Linguistics.
- Cite (Informal):
- War and Pieces: Comparing Perspectives About World War I and II Across Wikipedia Language Communities (Smith & Lee, LaTeCHCLfL 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2022.latechclfl-1.12.pdf
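
The abstract describes comparing how often national entities are mentioned across language editions by way of a vector representation. As a rough illustration only, the sketch below builds one count vector per edition and compares editions with cosine similarity. It is not the authors' pipeline: the edition labels, the entity counts, and the helper names `cosine_similarity` and `mean_overlap` are all hypothetical, and the paper's actual representation (subject-object-relation tuples from named entity recognition and open-domain information extraction) is richer than plain counts.

```python
import math

# Hypothetical, made-up mention counts of national entities per language edition.
# Edition labels and numbers are placeholders, not data from the paper.
mention_counts = {
    "edition_a": {"Germany": 12, "France": 7, "United Kingdom": 9},
    "edition_b": {"Germany": 15, "France": 5, "United Kingdom": 4},
    "edition_c": {"Germany": 10, "France": 14, "United Kingdom": 3},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    keys = sorted(set(a) | set(b))
    va = [a.get(k, 0) for k in keys]
    vb = [b.get(k, 0) for k in keys]
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an edition with no mentions overlaps with nothing
    return dot / (norm_a * norm_b)


def mean_overlap(edition, counts):
    """Average similarity of one edition's vector to every other edition's."""
    others = [name for name in counts if name != edition]
    total = sum(cosine_similarity(counts[edition], counts[o]) for o in others)
    return total / len(others)


if __name__ == "__main__":
    for name in mention_counts:
        print(name, round(mean_overlap(name, mention_counts), 3))
```

Averaging each edition's similarity to all the others gives a single per-edition number, which loosely mirrors the abstract's second finding that the overlap of one edition's depiction with all others varies.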