Leveraging CoHere Multilingual Embeddings and Inverted Softmax Retrieval for Automatic Parallel Sentence Alignment in Low-Resource Languages
Abubakar Auwal Khalid, Salisu Musa Borodo, Amina Abubakar Imam
Abstract
We present an improved method for automatic parallel sentence alignment in low-resource languages. We used CoHere multilingual embeddings and inverted softmax retrieval. Our technique achieved a higher F1-score of 78.30% on the MAFAND-MT test set, compared to the existing technique’s 54.75%. Precision and recall have shown similar performance. We assessed the quality of the extracted data by demonstrating that it outperforms the existing technique in terms of low-resource translation performance.
- Anthology ID:
- 2026.africanlp-main.4
- Volume:
- Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026)
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Editors:
- Everlyn Asiko Chimoto, Constantine Lignos, Shamsuddeen Muhammad, Idris Abdulmumin, Clemencia Siro, David Ifeoluwa Adelani
- Venues:
- AfricaNLP | WS
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 37–43
- Language:
- URL:
- https://preview.aclanthology.org/manual-author-scripts/2026.africanlp-main.4/
- DOI:
- Cite (ACL):
- Abubakar Auwal Khalid, Salisu Musa Borodo, and Amina Abubakar Imam. 2026. Leveraging CoHere Multilingual Embeddings and Inverted Softmax Retrieval for Automatic Parallel Sentence Alignment in Low-Resource Languages. In Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026), pages 37–43, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- Leveraging CoHere Multilingual Embeddings and Inverted Softmax Retrieval for Automatic Parallel Sentence Alignment in Low-Resource Languages (Khalid et al., AfricaNLP 2026)
- PDF:
- https://preview.aclanthology.org/manual-author-scripts/2026.africanlp-main.4.pdf
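
The abstract names inverted softmax retrieval over multilingual sentence embeddings but does not spell out the scoring step. The following is a minimal sketch of the general idea, not the authors' implementation. It assumes the sentence embeddings have already been computed (no embedding API is called here), that cosine similarity is the base similarity, and that the temperature `beta` is a free parameter. The function names `cosine` and `inverted_softmax_align` and the toy vectors are hypothetical. The key point is that each similarity is normalised over all source sentences for a given target, so a target that is close to many sources is penalised.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def inverted_softmax_align(src_vecs, tgt_vecs, beta=10.0):
    """Return, for each source vector, the index of its best target.

    beta is an assumed temperature; the value 10.0 is illustrative only.
    """
    n_src, n_tgt = len(src_vecs), len(tgt_vecs)
    sims = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]

    # Inverted softmax: for each target j, normalise over all sources i.
    col_norm = [
        sum(math.exp(beta * sims[i][j]) for i in range(n_src))
        for j in range(n_tgt)
    ]
    scores = [
        [math.exp(beta * sims[i][j]) / col_norm[j] for j in range(n_tgt)]
        for i in range(n_src)
    ]

    # Pick the highest-scoring target for every source sentence.
    return [max(range(n_tgt), key=lambda j: row[j]) for row in scores]


# Toy 2-dimensional "embeddings" standing in for real sentence vectors.
source = [[1.0, 0.0], [0.0, 1.0]]
target = [[0.1, 0.9], [0.9, 0.1]]
alignment = inverted_softmax_align(source, target)
print(alignment)
```

In practice a similarity threshold would usually be applied to the selected pairs before keeping them as parallel sentences; that threshold is not given in the abstract and is left out of this sketch.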