Compound noun decomposition using a Markov model

Jongwoo Lee, Byoung-Tak Zhang, Yung Taek Kim


Abstract
A statistical method for compound noun decomposition is presented. Previous studies of this problem showed that statistical information is helpful. However, that information was not applied systematically: performance depended heavily on the particular algorithm, and some algorithms consisted of many separate steps. In our work, statistical information is collected from a manually decomposed compound noun corpus to build a Markov model for composition. Two Markov chains representing the statistical information are assumed to be independent: one for the sequence of participants' lengths and another for the sequence of participants' features. Besides the Markov assumptions, a least-participants preference assumption is also used. These two assumptions allow the decomposition algorithm to take the form of a conditional dynamic programming procedure, so that computation is efficient and systematic. On test data of size 5027, the method achieved a precision of 98.4%.
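The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon, the feature labels, the probability tables, the smoothing floor, and the function name `decompose` are all hypothetical, and the "least participants preference" is interpreted here as "prefer the decomposition with the fewest parts, then the highest Markov probability". The two chains are treated as independent by adding their negative log-probabilities.

```python
import math

START = "<s>"   # hypothetical start symbol for both chains
FLOOR = 1e-6    # assumed smoothing value for unseen transitions

def neg_log(table, prev, cur):
    """Negative log-probability of a transition, with a floor for unseen pairs."""
    return -math.log(table.get((prev, cur), FLOOR))

def decompose(word, lexicon, length_chain, feature_chain):
    """Split `word` into lexicon entries.

    Candidates are ranked by (number of parts, negative log-probability),
    where the probability is the product of a length chain and a feature chain.
    Returns a list of parts, or None if no decomposition exists.
    """
    n = len(word)
    # best[(start, end)] = (num_parts, cost, backpointer) for the best
    # decomposition of word[:end] whose last part is word[start:end].
    best = {}
    for end in range(1, n + 1):
        for start in range(end):
            part = word[start:end]
            if part not in lexicon:
                continue
            feat, length = lexicon[part], end - start
            if start == 0:
                cost = (neg_log(length_chain, START, length)
                        + neg_log(feature_chain, START, feat))
                best[(start, end)] = (1, cost, None)
                continue
            candidates = []
            for (p_start, p_end), (count, p_cost, _) in best.items():
                if p_end != start:
                    continue
                prev = word[p_start:p_end]
                step = (neg_log(length_chain, p_end - p_start, length)
                        + neg_log(feature_chain, lexicon[prev], feat))
                candidates.append((count + 1, p_cost + step, (p_start, p_end)))
            if candidates:
                # Fewest parts first, then lowest cost.
                best[(start, end)] = min(candidates, key=lambda c: (c[0], c[1]))
    finals = [(v[0], v[1], k) for k, v in best.items() if k[1] == n]
    if not finals:
        return None
    state = min(finals)[2]
    parts = []
    while state is not None:
        parts.append(word[state[0]:state[1]])
        state = best[state][2]
    return parts[::-1]

# Toy data (entirely made up for illustration).
toy_lexicon = {"data": "n", "base": "n", "database": "n", "system": "n",
               "a": "n", "bcd": "n", "abc": "n", "d": "n", "ab": "n", "cd": "n"}
toy_lengths = {(START, 2): 0.5, (2, 2): 0.6, (START, 1): 0.3,
               (1, 3): 0.2, (START, 3): 0.2, (3, 1): 0.3}
toy_features = {(START, "n"): 1.0, ("n", "n"): 1.0}

print(decompose("databasesystem", toy_lexicon, toy_lengths, toy_features))
print(decompose("abcd", toy_lexicon, toy_lengths, toy_features))
```

In this toy setup the first call prefers the two-part split over the three-part split purely because of the part count, while the second call has three competing two-part splits and the length chain decides among them. Conditioning each step on the previous part is what keeps the search a dynamic program over (start, end) states rather than an enumeration of all splits.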
Anthology ID:
1999.mtsummit-1.63
Volume:
Proceedings of Machine Translation Summit VII
Month:
September 13-17
Year:
1999
Address:
Singapore, Singapore
Venue:
MTSummit
Pages:
427–431
URL:
https://aclanthology.org/1999.mtsummit-1.63
Cite (ACL):
Jongwoo Lee, Byoung-Tak Zhang, and Yung Taek Kim. 1999. Compound noun decomposition using a Markov model. In Proceedings of Machine Translation Summit VII, pages 427–431, Singapore, Singapore.
Cite (Informal):
Compound noun decomposition using a Markov model (Lee et al., MTSummit 1999)
PDF:
https://preview.aclanthology.org/update-css-js/1999.mtsummit-1.63.pdf