2023
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities
Zhe Zhao | Yudong Li | Cheng Hou | Jing Zhao | Rong Tian | Weijie Liu | Yiren Chen | Ningyuan Sun | Haoyan Liu | Weiquan Mao | Han Guo | Weigang Gou | Taiqiang Wu | Tao Zhu | Wenhang Shi | Chen Chen | Shan Huang | Sihong Chen | Liqun Liu | Feifei Li | Xiaoshuai Chen | Xingwu Sun | Zhanhui Kang | Xiaoyong Du | Linlin Shen | Kimmo Yan
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Recently, the success of pre-training in the text domain has been fully extended to vision, audio, and cross-modal scenarios. Pre-training models of different modalities show a rising trend of homogeneity in their model structures, which opens up the opportunity to implement them within a uniform framework. In this paper, we present TencentPretrain, a toolkit that supports pre-training models of different modalities. The core feature of TencentPretrain is its modular design. The toolkit uniformly divides pre-training models into five components: embedding, encoder, target embedding, decoder, and target. Since almost all common modules are provided within each component, users can pick the desired modules from different components to assemble a complete pre-training model. This modular design enables users to efficiently reproduce existing pre-training models or build brand-new ones. We test the toolkit on text, vision, and audio benchmarks and show that it matches the performance of the original implementations.
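To make the five-component decomposition concrete, below is a minimal sketch of how such a modular model might be assembled in PyTorch. The class and argument names are illustrative assumptions for this sketch, not TencentPretrain's actual API.

```python
import torch.nn as nn

class PretrainModel(nn.Module):
    """Compose a pre-training model from five pluggable components
    (names here are hypothetical, not TencentPretrain's real API)."""

    def __init__(self, embedding, encoder, tgt_embedding=None,
                 decoder=None, target=None):
        super().__init__()
        self.embedding = embedding          # maps raw input (text / image / audio) to vectors
        self.encoder = encoder              # e.g. a Transformer encoder
        self.tgt_embedding = tgt_embedding  # only needed by encoder-decoder models
        self.decoder = decoder              # only needed by encoder-decoder models
        self.target = target                # pre-training objective, e.g. masked LM

    def forward(self, src, tgt, seg):
        hidden = self.encoder(self.embedding(src, seg), seg)
        if self.decoder is not None:        # encoder-decoder models, e.g. T5-style
            hidden = self.decoder(hidden, self.tgt_embedding(tgt, seg), seg)
        return self.target(hidden, tgt)     # returns the loss of the chosen objective
```

Under this decomposition, reproducing a BERT-style model amounts to choosing a text embedding, a Transformer encoder, no decoder, and a masked-LM target, while a vision or audio model swaps only the embedding (and, if needed, the target) module.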