Abstract
This article presents the STACCw system for the BUCC 2017 shared task on parallel sentence extraction from comparable corpora. The original STACC approach, based on set-theoretic operations over bags of words, had been previously shown to be efficient and portable across domains and alignment scenarios. We describe an extension of this approach with a new weighting scheme and show that it provides significant improvements on the datasets provided for the shared task.
- Anthology ID:
- W17-2508
- Volume:
- Proceedings of the 10th Workshop on Building and Using Comparable Corpora
- Month:
- August
- Year:
- 2017
- Address:
- Vancouver, Canada
- Editors:
- Serge Sharoff, Pierre Zweigenbaum, Reinhard Rapp
- Venue:
- BUCC
- Publisher:
- Association for Computational Linguistics
- Pages:
- 41–45
- URL:
- https://aclanthology.org/W17-2508
- DOI:
- 10.18653/v1/W17-2508
- Cite (ACL):
- Andoni Azpeitia, Thierry Etchegoyhen, and Eva Martínez Garcia. 2017. Weighted Set-Theoretic Alignment of Comparable Sentences. In Proceedings of the 10th Workshop on Building and Using Comparable Corpora, pages 41–45, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Weighted Set-Theoretic Alignment of Comparable Sentences (Azpeitia et al., BUCC 2017)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/W17-2508.pdf