Abstract
Parsing Chinese critically depends on correct word segmentation, since incorrect segmentation inevitably causes incorrect parses. We investigate a pipeline approach to segmentation and parsing, using word lattices as parser input. We compare CRF-based and lexicon-based approaches to word segmentation. Our results show that the lattice parser is capable of selecting the correct segmentation from thousands of options, thus drastically reducing the number of unparsed sentences. Lexicon-based parsing models have better coverage than the CRF-based approach, but their larger number of options is more difficult to handle. We reach our best result by using a lexicon derived from the n-best CRF analyses, combined with highly probable words.
- Anthology ID: R17-1043
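To make the lattice idea concrete, here is a minimal sketch (not the authors' code) of how a word lattice over a Chinese sentence can be built from a lexicon, so that a parser may choose among all licensed segmentations rather than committing to one; the lexicon, example sentence, and maximum word length are illustrative assumptions.

```python
# Minimal sketch: build a word lattice from a lexicon.
# Edges are (start, end, word) over character positions; single-character
# fallback edges keep the lattice connected for out-of-lexicon characters.
from typing import List, Set, Tuple

def build_lattice(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Tuple[int, int, str]]:
    edges = []
    n = len(sentence)
    for start in range(n):
        # Fallback edge for the single character at this position.
        edges.append((start, start + 1, sentence[start]))
        # Longer edges for every lexicon word starting here.
        for end in range(start + 2, min(start + max_len, n) + 1):
            word = sentence[start:end]
            if word in lexicon:
                edges.append((start, end, word))
    return edges

if __name__ == "__main__":
    lexicon = {"北京", "大学", "北京大学", "学生"}  # hypothetical lexicon
    for edge in build_lattice("北京大学生", lexicon):
        print(edge)
```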
- Volume: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
- Month: September
- Year: 2017
- Address: Varna, Bulgaria
- Editors: Ruslan Mitkov, Galia Angelova
- Venue: RANLP
- Publisher: INCOMA Ltd.
- Pages: 316–324
- URL: https://doi.org/10.26615/978-954-452-049-6_043
- DOI: 10.26615/978-954-452-049-6_043
- Cite (ACL): Hai Hu, Daniel Dakota, and Sandra Kübler. 2017. Non-Deterministic Segmentation for Chinese Lattice Parsing. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 316–324, Varna, Bulgaria. INCOMA Ltd.
- Cite (Informal): Non-Deterministic Segmentation for Chinese Lattice Parsing (Hu et al., RANLP 2017)
- PDF: https://doi.org/10.26615/978-954-452-049-6_043