Sentence Alignment using Unfolding Recursive Autoencoders

Jeenu Grover, Pabitra Mitra


Abstract
In this paper, we propose a novel two-step algorithm for sentence alignment in monolingual corpora using unfolding recursive autoencoders (RAE). First, we use unfolding RAEs to learn feature vectors for the phrases in the syntactic tree of each sentence. To compare two sentences, we use a similarity matrix whose dimensions are proportional to the lengths of the two sentences. Because sentence lengths differ, the size of this matrix varies, so a dynamic pooling layer is used to map it to a matrix of fixed dimensions. The resulting matrix is used to calculate the similarity score between the two sentences. The second step of the algorithm captures the contexts in which the sentences occur in the document by using a dynamic programming algorithm for global alignment.
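The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names dynamic_pool and global_align, the choice of min pooling, the gap penalty value, and the Needleman-Wunsch style recurrence are all assumptions made for illustration, since the abstract does not specify them. The RAE phrase vectors are taken as given and do not appear here.

```python
def dynamic_pool(sim, size):
    """Map a variable-size matrix to a fixed size x size matrix.

    Assumption: each output cell is the minimum of its input region,
    which suits distance-like entries. The abstract does not say which
    pooling function the authors use.
    """
    rows, cols = len(sim), len(sim[0])

    def bounds(n, k):
        # Split range(n) into `size` contiguous chunks. When n < size,
        # some indices are reused so that no chunk is empty.
        start = (k * n) // size
        end = max(start + 1, ((k + 1) * n) // size)
        return start, end

    pooled = []
    for a in range(size):
        r0, r1 = bounds(rows, a)
        row = []
        for b in range(size):
            c0, c1 = bounds(cols, b)
            row.append(min(sim[i][j]
                           for i in range(r0, r1)
                           for j in range(c0, c1)))
        pooled.append(row)
    return pooled


def global_align(score, gap=-0.5):
    """Globally align two sentence sequences (Needleman-Wunsch style).

    score[i][j] is the similarity of sentence i in the first document
    and sentence j in the second. `gap` is a hypothetical penalty for
    leaving a sentence unaligned. Returns the aligned index pairs.
    """
    n, m = len(score), len(score[0])
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score[i - 1][j - 1],
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover the pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    # A 3 x 3 phrase-level matrix pooled down to 2 x 2.
    print(dynamic_pool([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 2))
    # Two sentences in one document, three in the other.
    print(global_align([[0.9, 0.1, 0.0], [0.2, 0.1, 0.8]]))
```

In this sketch the pooled matrix would feed a classifier or scoring function that yields one similarity value per sentence pair, and those values would populate the score matrix passed to global_align. Aligning globally, rather than picking the best match for each sentence independently, is what lets the order of sentences in the document influence the result.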
Anthology ID:
W17-2503
Volume:
Proceedings of the 10th Workshop on Building and Using Comparable Corpora
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Serge Sharoff, Pierre Zweigenbaum, Reinhard Rapp
Venue:
BUCC
Publisher:
Association for Computational Linguistics
Pages:
16–20
URL:
https://aclanthology.org/W17-2503
DOI:
10.18653/v1/W17-2503
Cite (ACL):
Jeenu Grover and Pabitra Mitra. 2017. Sentence Alignment using Unfolding Recursive Autoencoders. In Proceedings of the 10th Workshop on Building and Using Comparable Corpora, pages 16–20, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Sentence Alignment using Unfolding Recursive Autoencoders (Grover & Mitra, BUCC 2017)
PDF:
https://preview.aclanthology.org/ingest-bitext-workshop/W17-2503.pdf