FineWeb-zhtw: Scalable Curation of Traditional Chinese Text Data from the Web
Cheng-Wen Lin, Wan-Hsuan Hsieh, Kai-Xin Guan, Chan-Jan Hsu, Chia-Chen Kuo, Chuan-Lin Lai, Chung-Wei Chung, Ming-Jen Wang, Da-Shan Shiu
- Anthology ID: 2024.rocling-1.16
- Volume: Proceedings of the 36th Conference on Computational Linguistics and Speech Processing (ROCLING 2024)
- Month: November
- Year: 2024
- Address: Taipei City, Taiwan
- Editors: Shu-Chuan Tseng, Yu Tsao, Hen-Hsen Huang, Yao-Chung Fan, Chia-Hui Chang
- Venue: ROCLING
- Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
- Pages: 129–136
- URL: https://preview.aclanthology.org/issues-pwc/2024.rocling-1.16/
- Cite (ACL): Cheng-Wen Lin, Wan-Hsuan Hsieh, Kai-Xin Guan, Chan-Jan Hsu, Chia-Chen Kuo, Chuan-Lin Lai, Chung-Wei Chung, Ming-Jen Wang, and Da-Shan Shiu. 2024. FineWeb-zhtw: Scalable Curation of Traditional Chinese Text Data from the Web. In Proceedings of the 36th Conference on Computational Linguistics and Speech Processing (ROCLING 2024), pages 129–136, Taipei City, Taiwan. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
- Cite (Informal): FineWeb-zhtw: Scalable Curation of Traditional Chinese Text Data from the Web (Lin et al., ROCLING 2024)
- PDF: https://preview.aclanthology.org/issues-pwc/2024.rocling-1.16.pdf