An empirical study of maximum entropy approach for part-of-speech tagging of Vietnamese texts
Phuong Le-Hong, Azim Roussanaly, Thi Minh Huyen Nguyen, Mathias Rossignol
Abstract
This paper presents an empirical study on the application of the maximum entropy approach to part-of-speech tagging of Vietnamese, a language whose special characteristics largely distinguish it from occidental languages. Our best tagger explores and incorporates useful knowledge sources for tagging Vietnamese text and achieves an overall accuracy of 93.40% and an unknown-word accuracy of 80.69% on a test set of the Vietnamese treebank. Our tagger significantly outperforms the tagger currently used for building the Vietnamese treebank, and as far as we are aware, this is the best tagging result ever published for the Vietnamese language.
- Anthology ID:
- 2010.jeptalnrecital-long.36
- Volume:
- Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
- Month:
- July
- Year:
- 2010
- Address:
- Montréal, Canada
- Venue:
- JEP/TALN/RECITAL
- Publisher:
- ATALA
- Pages:
- 351–362
- URL:
- https://aclanthology.org/2010.jeptalnrecital-long.36
- Cite (ACL):
- Phuong Le-Hong, Azim Roussanaly, Thi Minh Huyen Nguyen, and Mathias Rossignol. 2010. An empirical study of maximum entropy approach for part-of-speech tagging of Vietnamese texts. In Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, pages 351–362, Montréal, Canada. ATALA.
- Cite (Informal):
- An empirical study of maximum entropy approach for part-of-speech tagging of Vietnamese texts (Le-Hong et al., JEP/TALN/RECITAL 2010)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2010.jeptalnrecital-long.36.pdf