Abstract
Distributed representations of words which map each word to a continuous vector have proven useful in capturing important linguistic information not only in a single language but also across different languages. Current unsupervised adversarial approaches show that it is possible to build a mapping matrix that aligns two sets of monolingual word embeddings without high quality parallel data, such as a dictionary or a sentence-aligned corpus. However, without an additional step of refinement, the preliminary mapping learnt by these methods is unsatisfactory, leading to poor performance for typologically distant languages. In this paper, we propose a weakly-supervised adversarial training method to overcome this limitation, based on the intuition that mapping across languages is better done at the concept level than at the word level. We propose a concept-based adversarial training method which improves the performance of previous unsupervised adversarial methods for most languages, and especially for typologically distant language pairs.
- Anthology ID:
- D19-1450
- Volume:
- Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Editors:
- Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
- Venues:
- EMNLP | IJCNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4419–4430
- URL:
- https://aclanthology.org/D19-1450
- DOI:
- 10.18653/v1/D19-1450
- Cite (ACL):
- Haozhou Wang, James Henderson, and Paola Merlo. 2019. Weakly-Supervised Concept-based Adversarial Learning for Cross-lingual Word Embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4419–4430, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- Weakly-Supervised Concept-based Adversarial Learning for Cross-lingual Word Embeddings (Wang et al., EMNLP-IJCNLP 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/D19-1450.pdf