Analyzing Bayesian Crosslingual Transfer in Topic Models

Shudong Hao, Michael J. Paul


Abstract
We introduce a theoretical analysis of crosslingual transfer in probabilistic topic models. By formulating posterior inference through Gibbs sampling as a process of language transfer, we propose a new measure that quantifies the loss of knowledge across languages during this process. This measure enables us to derive a PAC-Bayesian bound that elucidates the factors affecting model quality, both during training and in downstream applications. We provide experimental validation of the analysis on a diverse set of five languages, and discuss best practices for data collection and model design based on our analysis.
Anthology ID:
N19-1158
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Jill Burstein, Christy Doran, Thamar Solorio
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1551–1565
URL:
https://aclanthology.org/N19-1158
DOI:
10.18653/v1/N19-1158
Cite (ACL):
Shudong Hao and Michael J. Paul. 2019. Analyzing Bayesian Crosslingual Transfer in Topic Models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1551–1565, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Analyzing Bayesian Crosslingual Transfer in Topic Models (Hao & Paul, NAACL 2019)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/N19-1158.pdf
Supplementary:
N19-1158.Supplementary.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-1/N19-1158.mp4