InfoSync: Information Synchronization across Multilingual Semi-structured Tables
Siddharth Khincha, Chelsi Jain, Vivek Gupta, Tushar Kataria, Shuo Zhang
Abstract
Information synchronization of semi-structured data across languages is challenging. For example, Wikipedia tables in one language need to be synchronized with their counterparts in other languages. To address this problem, we introduce a new dataset, InfoSync, and a two-step method for tabular synchronization. InfoSync contains 100K entity-centric tables (Wikipedia Infoboxes) across 14 languages, of which a subset (~3.5K pairs) is manually annotated. The proposed method includes 1) Information Alignment, which maps rows across tables, and 2) Information Update, which updates missing or outdated information in the aligned multilingual tables. When evaluated on InfoSync, information alignment achieves an F1 score of 87.91 (en <-> non-en). To evaluate information update, we perform human-assisted Wikipedia edits on Infoboxes for 532 table pairs. Our approach obtains an acceptance rate of 77.28% on Wikipedia, showing the effectiveness of the proposed method.
- Anthology ID: 2023.findings-acl.159
- Volume: Findings of the Association for Computational Linguistics: ACL 2023
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 2536–2559
- URL: https://aclanthology.org/2023.findings-acl.159
- DOI: 10.18653/v1/2023.findings-acl.159
- Cite (ACL): Siddharth Khincha, Chelsi Jain, Vivek Gupta, Tushar Kataria, and Shuo Zhang. 2023. InfoSync: Information Synchronization across Multilingual Semi-structured Tables. In Findings of the Association for Computational Linguistics: ACL 2023, pages 2536–2559, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): InfoSync: Information Synchronization across Multilingual Semi-structured Tables (Khincha et al., Findings 2023)
- PDF: https://preview.aclanthology.org/landing_page/2023.findings-acl.159.pdf
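To make the two-step pipeline from the abstract concrete, here is a toy sketch of alignment followed by update on a pair of infobox-like tables, plus the standard F1 score over aligned row pairs. It is a minimal illustration under loud assumptions, not the authors' method: the tables are plain key-to-value dicts, `key_map` is a hypothetical hand-written key translation, alignment is exact key matching, and update only copies rows that are missing (it does not translate values or detect outdated ones). All function names (`align_rows`, `update_table`, `alignment_f1`) and the example rows are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical types: an infobox is modeled as key -> value, and an alignment
# is a set of (source_key, target_key) pairs.
Table = Dict[str, str]
Pair = Tuple[str, str]


def align_rows(src: Table, tgt: Table, key_map: Dict[str, str]) -> Set[Pair]:
    """Toy Information Alignment: pair a source row with a target row when the
    hypothetical key_map translates the source key into an existing target key.
    Exact key matching is an assumption, not the paper's alignment method."""
    return {(k, key_map[k]) for k in src if key_map.get(k) in tgt}


def update_table(src: Table, tgt: Table, key_map: Dict[str, str],
                 aligned: Set[Pair]) -> Table:
    """Toy Information Update: copy source rows with no aligned counterpart into
    a copy of the target table under the translated key. Values are copied
    untranslated, and outdated values are not detected here."""
    updated = dict(tgt)
    aligned_src = {s for s, _ in aligned}
    for k, v in src.items():
        if k not in aligned_src and k in key_map:
            updated[key_map[k]] = v
    return updated


def alignment_f1(pred: Set[Pair], gold: Set[Pair]) -> float:
    """Standard F1 of predicted vs. gold aligned row pairs."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented toy rows for an English and a German infobox.
    en = {"born": "1879", "died": "1955", "spouse": "Mileva Marić"}
    de = {"geboren": "1879", "gestorben": "1955"}
    key_map = {"born": "geboren", "died": "gestorben", "spouse": "Ehepartner"}

    aligned = align_rows(en, de, key_map)
    print(sorted(aligned))  # [('born', 'geboren'), ('died', 'gestorben')]
    print(update_table(en, de, key_map, aligned))
    print(alignment_f1(aligned, {("born", "geboren"), ("died", "gestorben")}))  # 1.0
```

Here the unaligned English `spouse` row is added to the German table as `Ehepartner`, which mirrors the "fill in missing information" half of the update step; a realistic system would also need multilingual key and value matching, since infobox keys rarely translate one-to-one.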