Abstract
We introduce a theoretical analysis of crosslingual transfer in probabilistic topic models. By formulating posterior inference through Gibbs sampling as a process of language transfer, we propose a new measure that quantifies the loss of knowledge across languages during this process. This measure enables us to derive a PAC-Bayesian bound that elucidates the factors affecting model quality, both during training and in downstream applications. We provide experimental validation of the analysis on a diverse set of five languages, and discuss best practices for data collection and model design based on our analysis.

- Anthology ID:
- N19-1158
- Volume:
- Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, Minnesota
- Editors:
- Jill Burstein, Christy Doran, Thamar Solorio
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1551–1565
- URL:
- https://aclanthology.org/N19-1158
- DOI:
- 10.18653/v1/N19-1158
- Cite (ACL):
- Shudong Hao and Michael J. Paul. 2019. Analyzing Bayesian Crosslingual Transfer in Topic Models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1551–1565, Minneapolis, Minnesota. Association for Computational Linguistics.
- Cite (Informal):
- Analyzing Bayesian Crosslingual Transfer in Topic Models (Hao & Paul, NAACL 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/N19-1158.pdf