Discriminative, Syntactic Language Modeling through Latent SVMs

Colin Cherry; Chris Quirk

Abstract

We construct a discriminative, syntactic language model (LM) by using a latent support vector machine (SVM) to train an unlexicalized parser to judge sentences. That is, the parser is optimized so that correct sentences receive high-scoring trees, while incorrect sentences do not. Because of this alternative objective, the parser can be trained with only a part-of-speech dictionary and binary-labeled sentences. We follow the paradigm of discriminative language modeling with pseudo-negative examples (Okanohara and Tsujii, 2007), and demonstrate significant improvements in distinguishing real sentences from pseudo-negatives. We also investigate the related task of separating machine-translation (MT) outputs from reference translations, again showing large improvements. Finally, we test our LM in MT reranking, and investigate the language-modeling parser in the context of unsupervised parsing.

Anthology ID: 2008.amta-papers.4
Volume: Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
Month: October 21-25
Year: 2008
Address: Waikiki, USA
Venue: AMTA
Publisher: Association for Machine Translation in the Americas
Pages: 65–74
URL: https://aclanthology.org/2008.amta-papers.4
Cite (ACL): Colin Cherry and Chris Quirk. 2008. Discriminative, Syntactic Language Modeling through Latent SVMs. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers, pages 65–74, Waikiki, USA. Association for Machine Translation in the Americas.
Cite (Informal): Discriminative, Syntactic Language Modeling through Latent SVMs (Cherry & Quirk, AMTA 2008)
PDF: https://preview.aclanthology.org/ingestion-script-update/2008.amta-papers.4.pdf
