Abstract
Parallel sentence representations are important for bilingual and cross-lingual tasks in natural language processing. In this paper, we explore a bilingual autoencoder approach to model parallel sentences. We extract sentence-level global descriptors (e.g. min, max) from word embeddings, and construct two monolingual autoencoders over these descriptors on the source and target language. In order to tightly connect the two autoencoders with bilingual correspondences, we force them to share the same decoding parameters and minimize a corpus-level semantic distance between the two languages. Being optimized towards a joint objective function of reconstruction and semantic errors, our bilingual antoencoder is able to learn continuous-valued latent representations for parallel sentences. Experiments on both intrinsic and extrinsic evaluations on statistical machine translation tasks show that our autoencoder achieves substantial improvements over the baselines.- Anthology ID:
- C16-1240
- Volume:
- Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Editors:
- Yuji Matsumoto, Rashmi Prasad
- Venue:
- COLING
- SIG:
- Publisher:
- The COLING 2016 Organizing Committee
- Note:
- Pages:
- 2548–2558
- Language:
- URL:
- https://aclanthology.org/C16-1240
- DOI:
- Cite (ACL):
- Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan, and Min Zhang. 2016. Bilingual Autoencoders with Global Descriptors for Modeling Parallel Sentences. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2548–2558, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- Bilingual Autoencoders with Global Descriptors for Modeling Parallel Sentences (Zhang et al., COLING 2016)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/C16-1240.pdf