Ahmet Cüneyd Tantuğ

Also published as: A. Cüneyd Tantuǧ, A. Cüneyd Tantuğ

2019

Normalizing Non-canonical Turkish Texts Using Machine Translation Approaches
Talha Çolakoğlu | Umut Sulubacak | Ahmet Cüneyd Tantuğ
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

With the growth of the social web, user-generated text data has reached unprecedented sizes. Non-canonical text normalization provides a way to exploit this as a practical source of training data for language processing systems. The state of the art in Turkish text normalization is composed of a token level pipeline of modules, heavily dependent on external linguistic resources and manually defined rules. Instead, we propose a fully automated, context-aware machine translation approach with fewer stages of processing. Experiments with various implementations of our approach show that we are able to surpass the current best-performing system by a large margin.

Morpheus: A Neural Network for Jointly Learning Contextual Lemmatization and Morphological Tagging
Eray Yildiz | A. Cüneyd Tantuğ
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

In this study, we present Morpheus, a joint contextual lemmatizer and morphological tagger. Morpheus is based on a neural sequential architecture where inputs are the characters of the surface words in a sentence and the outputs are the minimum edit operations between surface words and their lemmata as well as the morphological tags assigned to the words. The experiments on the datasets in nearly 100 languages provided by SigMorphon 2019 Shared Task 2 organizers show that the performance of Morpheus is comparable to the state-of-the-art system in terms of lemmatization. In morphological tagging, on the other hand, Morpheus significantly outperforms the SigMorphon baseline. In our experiments, we also show that the neural encoder-decoder architecture trained to predict the minimum edit operations can produce considerably better results than the architecture trained to predict the characters in lemmata directly as in previous studies. According to the SigMorphon 2019 Shared Task 2 results, Morpheus has placed 3rd in lemmatization and reached the 9th place in morphological tagging among all participant teams.
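
To illustrate the idea of predicting edit operations rather than lemma characters, the following is a minimal sketch that derives one minimum-cost edit script between a surface word and its lemma using standard Levenshtein dynamic programming. The label names (`same`, `delete`, `replace_X`, `insert_X`) and the function names `edit_ops` and `apply_ops` are assumptions made for this example; the abstract does not specify the label inventory Morpheus actually uses.

```python
def edit_ops(surface: str, lemma: str) -> list[str]:
    """Return one minimum-cost edit script that turns `surface` into `lemma`.

    Label names are hypothetical: 'same', 'delete', 'replace_X', 'insert_X'.
    """
    n, m = len(surface), len(lemma)
    # dist[i][j] = Levenshtein distance between surface[:i] and lemma[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if surface[i - 1] == lemma[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a surface character
                dist[i][j - 1] + 1,         # insert a lemma character
                dist[i - 1][j - 1] + cost,  # keep or replace
            )

    # Walk back from the bottom-right corner to recover one optimal script.
    ops: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and surface[i - 1] == lemma[j - 1] \
                and dist[i][j] == dist[i - 1][j - 1]:
            ops.append("same")
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(f"replace_{lemma[j - 1]}")
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append("delete")
            i -= 1
        else:
            ops.append(f"insert_{lemma[j - 1]}")
            j -= 1
    ops.reverse()
    return ops


def apply_ops(surface: str, ops: list[str]) -> str:
    """Rebuild the lemma by applying an edit script to the surface word."""
    out: list[str] = []
    i = 0  # position in the surface word
    for op in ops:
        if op == "same":
            out.append(surface[i])
            i += 1
        elif op == "delete":
            i += 1
        elif op.startswith("replace_"):
            out.append(op[len("replace_"):])
            i += 1
        elif op.startswith("insert_"):
            out.append(op[len("insert_"):])
    return "".join(out)


ops = edit_ops("evlerde", "ev")  # Turkish "in the houses" -> "house"
lemma = apply_ops("evlerde", ops)
```

Under this framing, a tagger only has to emit a short label per input character, and many inflected forms share the same script (for example, keeping a stem and deleting a suffix), which is one plausible reason such targets could be easier to learn than the lemma characters themselves.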

2008

BLEU+: a Tool for Fine-Grained BLEU Computation
A. Cüneyd Tantuǧ | Kemal Oflazer | Ilknur Durgar El-Kahlout
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present a tool, BLEU+, which implements various extensions to BLEU computation to allow for a better understanding of translation performance, especially for morphologically complex languages. When comparing tokens in the candidate and reference sentences, BLEU+ takes into account both closeness in morphological structure and closeness of the root words in the WordNet hierarchy. In addition to gauging performance at a finer level of granularity, BLEU+ also allows the computation of various upper bound oracle scores: comparing all tokens considering only the roots gives an upper bound for the case where all errors due to morphological structure are fixed, while comparing tokens in an error-tolerant way that considers minor morpheme edit operations gives a (more realistic) upper bound for the case where tokens that differ in morpheme insertions, deletions, and substitutions are fixed. We use BLEU+ in the fine-grained evaluation of the output of our English-to-Turkish statistical MT system.
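
The root-only oracle described above can be sketched with clipped n-gram precision, the building block of BLEU. This is not the BLEU+ tool itself and omits the brevity penalty and the combination over n-gram orders. The `TOY_ROOTS` table is a hypothetical stand-in for a real morphological analyzer, and the sentences are toy data made up for the example.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: list[str], reference: list[str], n: int = 1) -> float:
    """Modified n-gram precision: candidate counts are clipped by reference counts."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical root lookup; a real setup would call a morphological analyzer.
TOY_ROOTS = {"evlerde": "ev", "evde": "ev", "kitaplar": "kitap", "kitabı": "kitap"}


def to_roots(tokens: list[str]) -> list[str]:
    """Map each token to its root, leaving unknown tokens unchanged."""
    return [TOY_ROOTS.get(tok, tok) for tok in tokens]


candidate = "evde kitaplar var".split()
reference = "evlerde kitabı var".split()

# Surface comparison: only "var" matches.
surface_p = clipped_precision(candidate, reference)
# Root-only comparison: every token matches once suffixes are ignored.
root_p = clipped_precision(to_roots(candidate), to_roots(reference))
```

The gap between `surface_p` and `root_p` indicates how much of the mismatch comes from morphology alone, which is the sense in which the root-only score acts as an upper bound.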