Petr Zemánek
2023
Multi-Parallel Corpus of North Levantine Arabic
Mateusz Krubiński | Hashem Sellat | Shadi Saleh | Adam Pospíšil | Petr Zemánek | Pavel Pecina
Proceedings of ArabicNLP 2023
Low-resource Machine Translation (MT) is characterized by the scarcity of training data and/or standardized evaluation benchmarks. In the context of Dialectal Arabic, recent works have introduced several evaluation benchmarks covering both Modern Standard Arabic (MSA) and dialects; however, these mostly map to a single Indo-European language, English. In this work, we introduce a multilingual corpus of 120,600 multi-parallel sentences in English, French, German, Greek, Spanish, and MSA, selected from the OpenSubtitles corpus and manually translated into North Levantine Arabic. Through a series of training and fine-tuning experiments, we explore how this novel resource can contribute to research on Arabic MT.
2014
Quotations, Relevance and Time Depth: Medieval Arabic Literature in Grids and Networks
Petr Zemánek | Jiří Milička
Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLFL)
Co-authors
- Mateusz Krubiński 1
- Hashem Sellat 1
- Shadi Saleh 1
- Adam Pospíšil 1
- Pavel Pecina 1