Multi-Source Text Classification for Multilingual Sentence Encoder with Machine Translation
Reon Kajikawa, Keiichiro Yamada, Tomoyuki Kajiwara, Takashi Ninomiya
Abstract
Pre-trained multilingual sentence encoders are a promising way for developers of natural language processing applications to avoid the cost of training a separate model for each language. However, because the training corpora for such multilingual sentence encoders contain only a small amount of text in languages other than English, the encoders perform worse on non-English languages. To improve the performance of pre-trained multilingual sentence encoders for non-English languages, we propose a method that machine-translates a source sentence into English and then inputs the translation together with the source sentence in a multi-source manner. Experimental results on Japanese sentiment analysis and topic classification tasks demonstrated the effectiveness of the proposed method.
- Anthology ID:
- 2024.naacl-srw.24
- Volume:
- Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Yang (Trista) Cao, Isabel Papadimitriou, Anaelia Ovalle
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 226–232
- URL:
- https://aclanthology.org/2024.naacl-srw.24
- Cite (ACL):
- Reon Kajikawa, Keiichiro Yamada, Tomoyuki Kajiwara, and Takashi Ninomiya. 2024. Multi-Source Text Classification for Multilingual Sentence Encoder with Machine Translation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 226–232, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Multi-Source Text Classification for Multilingual Sentence Encoder with Machine Translation (Kajikawa et al., NAACL 2024)
- PDF:
- https://preview.aclanthology.org/ingestion-checklist/2024.naacl-srw.24.pdf
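The abstract describes the multi-source idea only at a high level. Below is a minimal Python sketch of one possible setup. It assumes that the source sentence and its English machine translation are each encoded separately, and that the two sentence embeddings are concatenated before a linear classifier. The paper may use a different fusion strategy. All names here (`toy_encode`, `toy_translate`, `TOY_MT`, `multi_source_features`, `linear_classify`) are hypothetical. The encoder and translator are deterministic toy stand-ins for a real pre-trained multilingual sentence encoder and a real machine translation system, not the models used by the authors.

```python
import math
from typing import Callable, Dict, List, Sequence

DIM = 8  # toy embedding size; real sentence encoders use hundreds of dimensions


def toy_encode(text: str) -> List[float]:
    """Hypothetical stand-in for a multilingual sentence encoder.

    Buckets characters by code point into DIM slots, then L2-normalises.
    """
    vec = [0.0] * DIM
    for ch in text:
        vec[ord(ch) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]


# Hypothetical stand-in for a Japanese-to-English machine translation system.
TOY_MT: Dict[str, str] = {"この映画は最高だった": "This movie was the best"}


def toy_translate(text: str) -> str:
    """Look up a canned translation; fall back to the input unchanged."""
    return TOY_MT.get(text, text)


def multi_source_features(
    source: str,
    encode: Callable[[str], List[float]],
    translate: Callable[[str], str],
) -> List[float]:
    """Encode the source and its English translation, then concatenate
    the two embeddings into one feature vector (assumed fusion strategy)."""
    return encode(source) + encode(translate(source))


def linear_classify(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> int:
    """Return the index of the highest-scoring label under a linear layer."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    feats = multi_source_features("この映画は最高だった", toy_encode, toy_translate)
    print(len(feats))  # 16: source embedding (8) + translation embedding (8)
```

Concatenation keeps the source-language view and the English view as separate features. A trained classifier can therefore learn how much to rely on each view. Averaging the two embeddings, or feeding the source and its translation to the encoder as a single paired input, are other plausible choices.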