Weighted Set-Theoretic Alignment of Comparable Sentences

Andoni Azpeitia, Thierry Etchegoyhen, Eva Martínez Garcia



Abstract
This article presents the STACCw system for the BUCC 2017 shared task on parallel sentence extraction from comparable corpora. The original STACC approach, based on set-theoretic operations over bags of words, had previously been shown to be efficient and portable across domains and alignment scenarios. We describe an extension of this approach with a new weighting scheme and show that it provides significant improvements on the datasets provided for the shared task.
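To make "set-theoretic operations over bags of words" concrete, here is a minimal sketch of a generic bidirectional set-overlap score for a candidate sentence pair. This is not the published STACC or STACCw formula. It assumes hypothetical word-to-translations lexicons (`src2tgt`, `tgt2src`), naive whitespace tokenization, and a placeholder per-word `weight` function that defaults to 1.0. The weighting scheme actually used by STACCw is not given in this abstract, so the `weight` hook only marks where such a scheme could plug in.

```python
from typing import Callable, Dict, Set

# Hypothetical lexicon type: maps a word to its set of candidate translations.
Lexicon = Dict[str, Set[str]]


def bag(sentence: str) -> Set[str]:
    """Lowercased set of whitespace-separated tokens (naive tokenization)."""
    return set(sentence.lower().split())


def translations(words: Set[str], lexicon: Lexicon) -> Set[str]:
    """Union of the candidate translations of every word found in the lexicon."""
    out: Set[str] = set()
    for w in words:
        out |= lexicon.get(w, set())
    return out


def weighted_overlap(covered: Set[str], words: Set[str],
                     weight: Callable[[str], float]) -> float:
    """Weighted share of `words` that appears in `covered` (0.0 for an empty bag)."""
    total = sum(weight(w) for w in words)
    return sum(weight(w) for w in covered) / total if total else 0.0


def set_score(src: str, tgt: str, src2tgt: Lexicon, tgt2src: Lexicon,
              weight: Callable[[str], float] = lambda w: 1.0) -> float:
    """Average of forward and backward translation coverage (illustrative only)."""
    s, t = bag(src), bag(tgt)
    # Target words reached by translating the source bag, and vice versa.
    fwd = weighted_overlap(translations(s, src2tgt) & t, t, weight)
    bwd = weighted_overlap(translations(t, tgt2src) & s, s, weight)
    return (fwd + bwd) / 2
```

With a toy English-French lexicon, "the cat sleeps" against "le chat dort" scores 1.0, while against "le chien dort" it scores 2/3, because "chien" has no matching translation in either direction. Plain set intersections like these are cheap to compute, which is in the spirit of the efficiency claim above.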
Anthology ID:
W17-2508
Volume:
Proceedings of the 10th Workshop on Building and Using Comparable Corpora
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Serge Sharoff, Pierre Zweigenbaum, Reinhard Rapp
Venue:
BUCC
Publisher:
Association for Computational Linguistics
Pages:
41–45
URL:
https://aclanthology.org/W17-2508
DOI:
10.18653/v1/W17-2508
Cite (ACL):
Andoni Azpeitia, Thierry Etchegoyhen, and Eva Martínez Garcia. 2017. Weighted Set-Theoretic Alignment of Comparable Sentences. In Proceedings of the 10th Workshop on Building and Using Comparable Corpora, pages 41–45, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Weighted Set-Theoretic Alignment of Comparable Sentences (Azpeitia et al., BUCC 2017)
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/W17-2508.pdf