Abstract
This paper describes the zNLP system for the BUCC 2017 shared task. Our system identifies parallel sentence pairs in Chinese-English comparable corpora in three steps: it translates Chinese sentences into English word by word, uses the search engine Solr to select near-parallel candidate sentences, and then applies an SVM classifier to identify the true parallel sentences among those candidates. It obtains an F1-score of 45% on the test set and 32% on the training set.
- Anthology ID:
- W17-2510
- Volume:
- Proceedings of the 10th Workshop on Building and Using Comparable Corpora
- Month:
- August
- Year:
- 2017
- Address:
- Vancouver, Canada
- Venue:
- BUCC
- Publisher:
- Association for Computational Linguistics
- Pages:
- 51–55
- URL:
- https://aclanthology.org/W17-2510
- DOI:
- 10.18653/v1/W17-2510
- Cite (ACL):
- Zheng Zhang and Pierre Zweigenbaum. 2017. zNLP: Identifying Parallel Sentences in Chinese-English Comparable Corpora. In Proceedings of the 10th Workshop on Building and Using Comparable Corpora, pages 51–55, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal):
- zNLP: Identifying Parallel Sentences in Chinese-English Comparable Corpora (Zhang & Zweigenbaum, BUCC 2017)
- PDF:
- https://preview.aclanthology.org/author-url/W17-2510.pdf
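
As a rough illustration of the three steps named in the abstract (word-by-word translation, candidate retrieval, classification), the toy Python sketch below may help. It is not the authors' implementation: the lexicon is a hypothetical handful of entries, a simple token-overlap ranking stands in for the Solr search, and a fixed score threshold stands in for the SVM classifier. All function names and parameters are assumptions made for this example.

```python
# Toy sketch of a translate -> retrieve -> classify pipeline.
# Everything here is a hypothetical stand-in: the lexicon, the overlap-based
# retrieval (in place of Solr) and the threshold rule (in place of an SVM).

# Hypothetical Chinese-English lexicon; a real system would need a large one.
LEXICON = {
    "我": "i", "喜欢": "like", "猫": "cats",
    "他": "he", "吃": "eats", "苹果": "apples",
}


def translate_word_by_word(zh_tokens):
    """Map each pre-segmented Chinese token to its lexicon entry; skip unknown tokens."""
    return [LEXICON[t] for t in zh_tokens if t in LEXICON]


def overlap_score(query_tokens, candidate_tokens):
    """Jaccard overlap between the two token sets (0.0 if either is empty)."""
    q, c = set(query_tokens), set(candidate_tokens)
    if not q or not c:
        return 0.0
    return len(q & c) / len(q | c)


def retrieve(query_tokens, en_sentences, top_k=2):
    """Return up to top_k (score, sentence) pairs with non-zero overlap, best first."""
    scored = [(overlap_score(query_tokens, s.lower().split()), s) for s in en_sentences]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def is_parallel(score, threshold=0.5):
    """Stand-in for the classifier: accept a candidate whose score reaches the threshold."""
    return score >= threshold


def find_parallel_pairs(zh_corpus, en_sentences, threshold=0.5):
    """Run the three steps for every Chinese sentence and collect accepted pairs."""
    pairs = []
    for zh_tokens in zh_corpus:
        query = translate_word_by_word(zh_tokens)
        for score, en_sentence in retrieve(query, en_sentences):
            if is_parallel(score, threshold):
                pairs.append(("".join(zh_tokens), en_sentence))
    return pairs


zh_corpus = [["我", "喜欢", "猫"], ["他", "吃", "苹果"]]
en_sentences = ["I like cats", "He eats apples every day", "The weather is nice"]
pairs = find_parallel_pairs(zh_corpus, en_sentences)
print(pairs)
```

The retrieval step keeps only a few candidates per source sentence so that the final decision is made on a short list rather than on every possible pair, which is the role the abstract assigns to the search engine.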