Abstract
We construct a discriminative, syntactic language model (LM) by using a latent support vector machine (SVM) to train an unlexicalized parser to judge sentences. That is, the parser is optimized so that correct sentences receive high-scoring trees, while incorrect sentences do not. Because of this alternative objective, the parser can be trained with only a part-of-speech dictionary and binary-labeled sentences. We follow the paradigm of discriminative language modeling with pseudo-negative examples (Okanohara and Tsujii, 2007), and demonstrate significant improvements in distinguishing real sentences from pseudo-negatives. We also investigate the related task of separating machine-translation (MT) outputs from reference translations, again showing large improvements. Finally, we test our LM in MT reranking, and investigate the language-modeling parser in the context of unsupervised parsing.
- Anthology ID:
- 2008.amta-papers.4
- Volume:
- Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
- Month:
- October 21-25
- Year:
- 2008
- Address:
- Waikiki, USA
- Venue:
- AMTA
- Publisher:
- Association for Machine Translation in the Americas
- Pages:
- 65–74
- URL:
- https://aclanthology.org/2008.amta-papers.4
- Cite (ACL):
- Colin Cherry and Chris Quirk. 2008. Discriminative, Syntactic Language Modeling through Latent SVMs. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers, pages 65–74, Waikiki, USA. Association for Machine Translation in the Americas.
- Cite (Informal):
- Discriminative, Syntactic Language Modeling through Latent SVMs (Cherry & Quirk, AMTA 2008)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2008.amta-papers.4.pdf
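The training objective summarized in the abstract can be illustrated with a generic latent SVM. In this setup a sentence's score is the score of its best parse tree, and a hinge loss pushes real sentences above a margin and pseudo-negatives below it. The following is a minimal sketch under stated assumptions, not the authors' implementation. Each sentence is assumed to come with a small, explicit list of candidate trees, each already reduced to a feature vector. In the paper's setting, the best tree would instead come from the parser itself. All names (`best_tree`, `sentence_score`, `train_latent_svm`) and hyperparameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
# One training example: candidate trees (as feature vectors) and a label,
# +1 for a real sentence, -1 for a pseudo-negative.
Example = Tuple[List[Vector], int]


def dot(w: Sequence[float], phi: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, phi))


def best_tree(w: Sequence[float], trees: List[Vector]) -> Tuple[float, Vector]:
    """Return (score, features) of the highest-scoring candidate tree.

    Assumption: enumerating candidates stands in for a Viterbi/CKY
    search over parses under the current weights.
    """
    scores = [dot(w, phi) for phi in trees]
    i = max(range(len(trees)), key=scores.__getitem__)
    return scores[i], trees[i]


def sentence_score(w: Sequence[float], trees: List[Vector]) -> float:
    # A sentence is only as good as its best latent tree.
    return best_tree(w, trees)[0]


def train_latent_svm(data: List[Example], dim: int, epochs: int = 50,
                     lr: float = 0.1, lam: float = 0.01, seed: int = 0) -> Vector:
    """Stochastic subgradient descent on the regularized latent hinge loss
    lam/2 * ||w||^2 + sum_i max(0, 1 - y_i * max_t w . phi(x_i, t))."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            trees, y = data[idx]
            # Latent step: fix the best tree under the current weights.
            score, phi = best_tree(w, trees)
            # L2 regularizer shrinks the weights on every step.
            w = [(1.0 - lr * lam) * wi for wi in w]
            if y * score < 1.0:
                # Margin violated: move toward (y=+1) or away from (y=-1)
                # the features of the best tree.
                w = [wi + lr * y * pi for wi, pi in zip(w, phi)]
    return w


if __name__ == "__main__":
    # Toy data: feature 0 marks a "well-formed" structure, feature 1 a
    # "malformed" one. The real sentence admits a well-formed tree and
    # the pseudo-negative does not.
    data: List[Example] = [
        ([[1.0, 0.0], [0.0, 1.0]], +1),
        ([[0.0, 1.0], [0.0, 1.0]], -1),
    ]
    w = train_latent_svm(data, dim=2)
    for trees, y in data:
        print(y, round(sentence_score(w, trees), 3))
```

Note the asymmetry in this sketch. A real sentence needs only one good tree to score well, while a pseudo-negative is penalized through its best tree, so every candidate parse for it is pushed down.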