Non-Deterministic Segmentation for Chinese Lattice Parsing

Hai Hu, Daniel Dakota, Sandra Kübler


Abstract
Parsing Chinese critically depends on correct word segmentation, since incorrect segmentation inevitably causes incorrect parses. We investigate a pipeline approach to segmentation and parsing that uses word lattices as parser input. We compare CRF-based and lexicon-based approaches to word segmentation. Our results show that the lattice parser is capable of selecting the correct segmentation from thousands of options, thus drastically reducing the number of unparsed sentences. Lexicon-based parsing models have better coverage than the CRF-based approach, but their many options are more difficult to handle. We reach our best result by using a lexicon built from the n-best CRF analyses, combined with highly probable words.
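The pipeline described in the abstract, in which segmentation proposes many candidate words and the parser receives them as a word lattice, can be illustrated with a small sketch. This is not the authors' implementation: the function names `lexicon_from_nbest`, `build_lattice`, and `count_paths`, the toy 2-best list, the extra word 大桥 (standing in for "highly probable words"), the 4-character word limit, and the single-character fallback edges are all assumptions made for illustration. The sketch collects a lexicon from n-best segmentations, adds one lattice edge per matching word, and counts how many segmentations the lattice encodes.

```python
from typing import Iterable, List, Set, Tuple

# A lattice edge spans character positions [start, end) and carries one candidate word.
Edge = Tuple[int, int, str]


def lexicon_from_nbest(nbest: Iterable[List[str]], extra: Iterable[str] = ()) -> Set[str]:
    """Collect every word from any of the n-best segmentations, plus extra words
    (a hypothetical stand-in for the 'highly probable words')."""
    lexicon = {word for segmentation in nbest for word in segmentation}
    lexicon.update(extra)
    return lexicon


def build_lattice(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Edge]:
    """Add an edge for every lexicon word found in the sentence (up to max_len
    characters). Single characters are always added so that at least one complete
    path exists; this fallback is an assumption of the sketch."""
    edges: List[Edge] = []
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if end - start == 1 or word in lexicon:
                edges.append((start, end, word))
    return edges


def count_paths(n: int, edges: List[Edge]) -> int:
    """Count the segmentations encoded by the lattice, i.e. paths from node 0 to node n.
    Edges are processed in order of their start node, so each node's count is
    complete before it is propagated forward."""
    paths = [0] * (n + 1)
    paths[0] = 1
    for start, end, _ in sorted(edges):
        paths[end] += paths[start]
    return paths[n]


# Hypothetical 2-best CRF output for the classic ambiguous phrase 南京市长江大桥.
nbest = [["南京市", "长江大桥"], ["南京", "市长", "江大桥"]]
lexicon = lexicon_from_nbest(nbest, extra=["大桥"])
sentence = "南京市长江大桥"
lattice = build_lattice(sentence, lexicon)
print(len(lattice), count_paths(len(sentence), lattice))  # 13 18
```

Even these seven characters yield 18 segmentations in the toy lattice, including both the correct 南京市 / 长江大桥 ("Nanjing City / Yangtze River Bridge") and the misleading 南京 / 市长 / 江大桥 ("Nanjing / mayor / Jiang Daqiao"). With a larger lexicon and longer sentences the count grows quickly, which is the kind of choice among many options that the lattice parser has to make.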
Anthology ID: R17-1043
Volume: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Month: September
Year: 2017
Address: Varna, Bulgaria
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 316–324
URL: https://doi.org/10.26615/978-954-452-049-6_043
DOI: 10.26615/978-954-452-049-6_043
Cite (ACL): Hai Hu, Daniel Dakota, and Sandra Kübler. 2017. Non-Deterministic Segmentation for Chinese Lattice Parsing. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 316–324, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal): Non-Deterministic Segmentation for Chinese Lattice Parsing (Hu et al., RANLP 2017)
PDF: https://doi.org/10.26615/978-954-452-049-6_043