Abstract
We explore the extent to which neural networks can learn to identify semantically equivalent sentences from a small, variable dataset using end-to-end training. We collect a new noisy, non-standardised, user-generated Algerian (ALG) dataset and also translate it into Modern Standard Arabic (MSA), which serves as its regularised counterpart. We compare the performance of various models on both datasets and report the best-performing configurations. The results show that relatively simple models composed of 2 LSTM layers far outperform other, more sophisticated attention-based architectures on both the ALG and MSA datasets.
- Anthology ID:
- W19-4609
- Volume:
- Proceedings of the Fourth Arabic Natural Language Processing Workshop
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Venue:
- WANLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 78–87
- URL:
- https://aclanthology.org/W19-4609
- DOI:
- 10.18653/v1/W19-4609
- Cite (ACL):
- Wafia Adouane, Jean-Philippe Bernardy, and Simon Dobnik. 2019. Neural Models for Detecting Binary Semantic Textual Similarity for Algerian and MSA. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 78–87, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Neural Models for Detecting Binary Semantic Textual Similarity for Algerian and MSA (Adouane et al., WANLP 2019)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/W19-4609.pdf