A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization
Shohei Higashiyama, Masao Utiyama, Taro Watanabe, Eiichiro Sumita
Abstract
Lexical normalization, in addition to word segmentation and part-of-speech tagging, is a fundamental task for Japanese user-generated text processing. In this paper, we propose a text editing model to solve the three tasks jointly and methods of pseudo-labeled data generation to overcome the problem of data deficiency. Our experiments showed that the proposed model achieved better normalization performance when trained on more diverse pseudo-labeled data.
- Anthology ID:
- 2021.wnut-1.9
- Volume:
- Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)
- Month:
- November
- Year:
- 2021
- Address:
- Online
- Editors:
- Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
- Venue:
- WNUT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 67–80
- URL:
- https://aclanthology.org/2021.wnut-1.9
- DOI:
- 10.18653/v1/2021.wnut-1.9
- Cite (ACL):
- Shohei Higashiyama, Masao Utiyama, Taro Watanabe, and Eiichiro Sumita. 2021. A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization. In Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021), pages 67–80, Online. Association for Computational Linguistics.
- Cite (Informal):
- A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization (Higashiyama et al., WNUT 2021)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2021.wnut-1.9.pdf