Abstract
Chinese word segmentation is a fundamental task in natural language processing. However, previous work on segmenting medical text has simply applied general-domain segmentation methods directly, whereas the terminology-heavy nature of medical text requires a segmenter to provide different segmentation granularities for medical terminology and for the non-terminological text within medical documents. This paper proposes a dual-encoder Chinese word segmentation model for medical text, which uses an auxiliary encoder to provide coarse-grained representations for medical terminology. The model separates medical terms that call for coarse-grained segmentation from text that calls for general-domain granularity, improving segmentation of medical terminology while largely preventing its coarse granularity from interfering with the segmentation of general text in medical documents.
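The paper's PDF (linked below) gives the actual architecture; as a reading aid, here is a minimal, hypothetical PyTorch sketch of the dual-encoder idea the abstract describes: a main encoder yields fine-grained general-domain features, an auxiliary encoder yields coarse-grained features for medical terminology, and a gate restricted by a term-span mask keeps the auxiliary signal from leaking into general text. All names here (`DualEncoderSegmenter`, `term_mask`, the BiLSTM encoders, the gating scheme) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a dual-encoder segmentation tagger (not the paper's code).
import torch
import torch.nn as nn

class DualEncoderSegmenter(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=256, num_tags=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Main encoder: fine-grained, general-domain representation.
        self.main_enc = nn.LSTM(emb_dim, hidden, batch_first=True,
                                bidirectional=True)
        # Auxiliary encoder: coarse-grained representation for medical terms.
        self.aux_enc = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        # Per-character gate: how much auxiliary signal to mix in.
        self.gate = nn.Linear(4 * hidden, 1)
        self.classifier = nn.Linear(2 * hidden, num_tags)  # B/M/E/S tags

    def forward(self, char_ids, term_mask):
        # char_ids: (batch, seq) character ids.
        # term_mask: (batch, seq) in {0, 1}, 1 marking characters inside
        # medical-terminology spans (e.g., from a lexicon or term recognizer).
        x = self.embed(char_ids)
        h_main, _ = self.main_enc(x)
        h_aux, _ = self.aux_enc(x)
        g = torch.sigmoid(self.gate(torch.cat([h_main, h_aux], dim=-1)))
        # Zero the gate outside term spans so the coarse-grained features
        # cannot disturb segmentation of general text.
        g = g * term_mask.unsqueeze(-1).float()
        h = (1 - g) * h_main + g * h_aux
        return self.classifier(h)  # (batch, seq, num_tags) logits
```

In this sketch, characters outside the marked term spans are tagged purely from the main encoder, which mirrors the abstract's claim that the coarse granularity is confined to medical terminology.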
- Anthology ID:
- 2021.ccl-1.8
- Volume:
- Proceedings of the 20th Chinese National Conference on Computational Linguistics
- Month:
- August
- Year:
- 2021
- Address:
- Huhhot, China
- Editors:
- Sheng Li (李生), Maosong Sun (孙茂松), Yang Liu (刘洋), Hua Wu (吴华), Kang Liu (刘康), Wanxiang Che (车万翔), Shizhu He (何世柱), Gaoqi Rao (饶高琦)
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 76–85
- Language:
- Chinese
- URL:
- https://preview.aclanthology.org/add_missing_videos/2021.ccl-1.8/
- Cite (ACL):
- Yuan Zong and Baobao Chang. 2021. 基于双编码器的医学文本中文分词(Chinese word segmentation of medical text based on dual-encoder). In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 76–85, Huhhot, China. Chinese Information Processing Society of China.
- Cite (Informal):
- 基于双编码器的医学文本中文分词(Chinese word segmentation of medical text based on dual-encoder) (Zong & Chang, CCL 2021)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2021.ccl-1.8.pdf