Abstract
In this paper, we propose a novel two-step algorithm for sentence alignment in monolingual corpora using unfolding recursive autoencoders (RAE). First, we use unfolding RAEs to learn feature vectors for the phrases in the syntactic tree of each sentence. To compare two sentences, we build a similarity matrix whose dimensions are proportional to the sizes of the two sentences. Because these dimensions vary with sentence length, a dynamic pooling layer maps the matrix to one of fixed dimension, which is then used to calculate the similarity score between the two sentences. The second step of the algorithm captures the contexts in which the sentences occur in the document by using a dynamic programming algorithm for global alignment.
- Anthology ID:
- W17-2503
- Volume:
- Proceedings of the 10th Workshop on Building and Using Comparable Corpora
- Month:
- August
- Year:
- 2017
- Address:
- Vancouver, Canada
- Editors:
- Serge Sharoff, Pierre Zweigenbaum, Reinhard Rapp
- Venue:
- BUCC
- Publisher:
- Association for Computational Linguistics
- Pages:
- 16–20
- URL:
- https://aclanthology.org/W17-2503
- DOI:
- 10.18653/v1/W17-2503
- Cite (ACL):
- Jeenu Grover and Pabitra Mitra. 2017. Sentence Alignment using Unfolding Recursive Autoencoders. In Proceedings of the 10th Workshop on Building and Using Comparable Corpora, pages 16–20, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Sentence Alignment using Unfolding Recursive Autoencoders (Grover & Mitra, BUCC 2017)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/W17-2503.pdf
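
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dynamic_pool` and `global_align`, the use of max as the pooling operator, the near-equal chunking scheme, and the gap penalty of -0.5 are all assumptions made for illustration, and the abstract does not specify any of them. The first function maps a similarity matrix of arbitrary size to a fixed grid; the second runs a Needleman-Wunsch style global alignment over a matrix of sentence-pair scores.

```python
from typing import Callable, Iterable, List, Tuple

Matrix = List[List[float]]


def _chunks(length: int, k: int) -> List[Tuple[int, int]]:
    # Split range(length) into k contiguous, near-equal chunks.
    # If length < k, some indices are reused so that no chunk is empty.
    return [(i * length // k, max(i * length // k + 1, (i + 1) * length // k))
            for i in range(k)]


def dynamic_pool(sim: Matrix, out_size: int,
                 pool: Callable[[Iterable[float]], float] = max) -> Matrix:
    """Map a variable-size similarity matrix to an out_size x out_size grid.

    The pooling operator (max here) is an assumption of this sketch.
    """
    rows = _chunks(len(sim), out_size)
    cols = _chunks(len(sim[0]), out_size)
    return [[pool(sim[r][c] for r in range(r0, r1) for c in range(c0, c1))
             for (c0, c1) in cols]
            for (r0, r1) in rows]


def global_align(scores: Matrix, gap: float = -0.5) -> List[Tuple[int, int]]:
    """Globally align two sentence sequences given pairwise scores.

    scores[i][j] is the similarity of sentence i in document A and
    sentence j in document B. The gap penalty is a hypothetical value.
    Returns the matched (i, j) index pairs in document order.
    """
    n, m = len(scores), len(scores[0])
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + gap
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j - 1] + scores[i - 1][j - 1],
                             best[i - 1][j] + gap,
                             best[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + scores[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j] + gap:
            i -= 1  # sentence i-1 of document A is left unaligned
        else:
            j -= 1  # sentence j-1 of document B is left unaligned
    pairs.reverse()
    return pairs
```

In a pipeline of the kind the abstract describes, the pooled matrix would feed a similarity scorer for each sentence pair, and those scores would populate the matrix passed to `global_align`. The scorer itself is not sketched here because the abstract gives no detail about it.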