Kemal Oflazer


2021

Semantic Similarity Based Evaluation for Abstractive News Summarization
Figen Beken Fikri | Kemal Oflazer | Berrin Yanikoglu
Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)

ROUGE is a widely used evaluation metric in text summarization. However, it is not suitable for the evaluation of abstractive summarization systems as it relies on lexical overlap between the gold standard and the generated summaries. This limitation becomes more apparent for agglutinative languages with very large vocabularies and high type/token ratios. In this paper, we present semantic similarity models for Turkish and apply them as evaluation metrics for an abstractive summarization task. To achieve this, we translated the English STSb dataset into Turkish and presented the first semantic textual similarity dataset for Turkish as well. We showed that our best similarity models have better alignment with average human judgments compared to ROUGE in both Pearson and Spearman correlations.

2018

Interoperable Annotation of Events and Event Relations across Domains
Jun Araki | Lamana Mulaffer | Arun Pandian | Yukari Yamakawa | Kemal Oflazer | Teruko Mitamura
Proceedings 14th Joint ACL - ISO Workshop on Interoperable Semantic Annotation

MADARi: A Web Interface for Joint Arabic Morphological Annotation and Spelling Correction
Ossama Obeid | Salam Khalifa | Nizar Habash | Houda Bouamor | Wajdi Zaghouani | Kemal Oflazer
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

The MADAR Arabic Dialect Corpus and Lexicon
Houda Bouamor | Nizar Habash | Mohammad Salameh | Wajdi Zaghouani | Owen Rambow | Dana Abdulrahim | Ossama Obeid | Salam Khalifa | Fadhl Eryani | Alexander Erdmann | Kemal Oflazer
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Using Ambiguity Detection to Streamline Linguistic Annotation
Wajdi Zaghouani | Abdelati Hawwari | Sawsan Alqahtani | Houda Bouamor | Mahmoud Ghoneim | Mona Diab | Kemal Oflazer
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

Arabic writing is typically underspecified for short vowels and other markups, referred to as diacritics. In addition to the lexical ambiguity exhibited in most languages, the lack of diacritics in written Arabic adds another layer of ambiguity which is an artifact of the orthography. In this paper, we present the details of three experimental annotation conditions designed to study the impact of automatic ambiguity detection on annotation speed and quality in a large-scale annotation project.

Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation
Wajdi Zaghouani | Nizar Habash | Ossama Obeid | Behrang Mohit | Houda Bouamor | Kemal Oflazer
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present our guidelines and annotation procedure for creating a human-corrected, post-edited machine translation corpus for Modern Standard Arabic. Our overarching goal is to use the annotated corpus to develop automatic machine translation post-editing systems for Arabic that can help accelerate the human revision of translated texts. The creation of any manually annotated corpus usually presents many challenges. To address these challenges, we created comprehensive and simplified annotation guidelines, which were used by a team of five annotators and one lead annotator. To ensure high agreement between the annotators, multiple training sessions were held and inter-annotator agreement was measured regularly to check the annotation quality. The resulting corpus of manually post-edited translations of English-to-Arabic articles is the largest to date for this language pair.

Guidelines and Framework for a Large Scale Arabic Diacritized Corpus
Wajdi Zaghouani | Houda Bouamor | Abdelati Hawwari | Mona Diab | Ossama Obeid | Mahmoud Ghoneim | Sawsan Alqahtani | Kemal Oflazer
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents the annotation guidelines developed as part of an effort to create a large-scale manually diacritized corpus for various Arabic text genres. The target size of the annotated corpus is 2 million words. We summarize the guidelines and describe issues encountered during the training of the annotators. We also discuss the challenges posed by the complexity of the Arabic language and how they are addressed. Finally, we present the diacritization annotation procedure and detail the quality of the resulting annotations.

2015

Correction Annotation for Non-Native Arabic Texts: Guidelines and Corpus
Wajdi Zaghouani | Nizar Habash | Houda Bouamor | Alla Rozovskaya | Behrang Mohit | Abeer Heider | Kemal Oflazer
Proceedings of the 9th Linguistic Annotation Workshop

A Pilot Study on Arabic Multi-Genre Corpus Diacritization
Houda Bouamor | Wajdi Zaghouani | Mona Diab | Ossama Obeid | Kemal Oflazer | Mahmoud Ghoneim | Abdelati Hawwari
Proceedings of the Second Workshop on Arabic Natural Language Processing

QCMUQ@QALB-2015 Shared Task: Combining Character level MT and Error-tolerant Finite-State Recognition for Arabic Spelling Correction
Houda Bouamor | Hassan Sajjad | Nadir Durrani | Kemal Oflazer
Proceedings of the Second Workshop on Arabic Natural Language Processing

2014

A Human Judgement Corpus and a Metric for Arabic MT Evaluation
Houda Bouamor | Hanan Alshikhabobakr | Behrang Mohit | Kemal Oflazer
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

CMUQ@QALB-2014: An SMT-based System for Automatic Arabic Error Correction
Serena Jeblee | Houda Bouamor | Wajdi Zaghouani | Kemal Oflazer
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

Domain and Dialect Adaptation for Machine Translation into Egyptian Arabic
Serena Jeblee | Weston Feely | Houda Bouamor | Alon Lavie | Nizar Habash | Kemal Oflazer
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

A Multidialectal Parallel Corpus of Arabic
Houda Bouamor | Nizar Habash | Kemal Oflazer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The daily spoken variety of Arabic is often termed the colloquial or dialect form of Arabic. There are many Arabic dialects across the Arab World and within other Arabic-speaking communities. These dialects vary widely from region to region and, to a lesser extent, from city to city in each region. The dialects are not standardized, they are not taught, and they do not have official status. However, they are the primary vehicles of communication (face-to-face and, recently, online) and have a large presence in the arts as well. In this paper, we present the first multidialectal Arabic parallel corpus, a collection of 2,000 sentences in Standard Arabic, Egyptian, Tunisian, Jordanian, Palestinian and Syrian Arabic, in addition to English. Such parallel data does not exist naturally, which makes this corpus a very valuable resource that has many potential applications such as Arabic dialect identification and machine translation.

YouDACC: the Youtube Dialectal Arabic Comment Corpus
Ahmed Salama | Houda Bouamor | Behrang Mohit | Kemal Oflazer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents YOUDACC, an automatically annotated large-scale multi-dialectal Arabic corpus collected from user comments on YouTube videos. Our corpus covers different groups of dialects: Egyptian (EG), Gulf (GU), Iraqi (IQ), Maghrebi (MG) and Levantine (LV). We perform an empirical analysis on the crawled corpus and demonstrate that our proposed location-based method is effective for the task of dialect labeling.

Large Scale Arabic Error Annotation: Guidelines and Framework
Wajdi Zaghouani | Behrang Mohit | Nizar Habash | Ossama Obeid | Nadi Tomeh | Alla Rozovskaya | Noura Farra | Sarah Alkuhlani | Kemal Oflazer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present annotation guidelines and a web-based annotation framework developed as part of an effort to create a manually annotated Arabic corpus of errors and corrections for various text types. Such a corpus will be invaluable for developing Arabic error correction tools, both for training models and as a gold standard for evaluating error correction algorithms. We summarize the guidelines we created. We also describe issues encountered during the training of the annotators, as well as problems that are specific to the Arabic language that arose during the annotation process. Finally, we present the annotation tool that was developed as part of this project, the annotation pipeline, and the quality of the resulting annotations.

2013

SuMT: A Framework of Summarization and MT
Houda Bouamor | Behrang Mohit | Kemal Oflazer
Proceedings of the Sixth International Joint Conference on Natural Language Processing

A Web-based Annotation Framework For Large-Scale Text Correction
Ossama Obeid | Wajdi Zaghouani | Behrang Mohit | Nizar Habash | Kemal Oflazer | Nadi Tomeh
The Companion Volume of the Proceedings of IJCNLP 2013: System Demonstrations

An English Reading Tool as a NLP Showcase
Mahmoud Azab | Ahmed Salama | Kemal Oflazer | Hideki Shima | Jun Araki | Teruko Mitamura
The Companion Volume of the Proceedings of IJCNLP 2013: System Demonstrations

An NLP-based Reading Tool for Aiding Non-native English Readers
Mahmoud Azab | Ahmed Salama | Kemal Oflazer | Hideki Shima | Jun Araki | Teruko Mitamura
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Multi-Rate HMMs for Word Alignment
Elif Eyigöz | Daniel Gildea | Kemal Oflazer
Proceedings of the Eighth Workshop on Statistical Machine Translation

Typesetting for Improved Readability using Lexical and Syntactic Information
Ahmed Salama | Kemal Oflazer | Susan Hagan
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Simultaneous Word-Morpheme Alignment for Statistical Machine Translation
Elif Eyigöz | Daniel Gildea | Kemal Oflazer
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Dudley North visits North London: Learning When to Transliterate to Arabic
Mahmoud Azab | Houda Bouamor | Behrang Mohit | Kemal Oflazer
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Supersense Tagging for Arabic: the MT-in-the-Middle Attack
Nathan Schneider | Behrang Mohit | Chris Dyer | Kemal Oflazer | Noah A. Smith
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

Annotating and Learning Morphological Segmentation of Egyptian Colloquial Arabic
Emad Mohamed | Behrang Mohit | Kemal Oflazer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present an annotation and morphological segmentation scheme for Egyptian Colloquial Arabic (ECA) in which we annotate user-generated content that significantly deviates from the orthographic and grammatical rules of Modern Standard Arabic (MSA) and thus cannot be processed by the commonly used MSA tools. A per-letter classification scheme, in which each letter is classified as either a segment boundary or not, combined with a memory-based classifier that uses only word-internal context, proves effective and achieves 92% exact-match accuracy at the word level. The well-known MADA system achieves 81%, while the per-letter classification scheme using the ATB achieves 82%. Error analysis shows that the major problem is character ambiguity: the ECA orthography overloads characters that would otherwise be more specific in MSA, such as the distinctions between y (ي) and Y (ى), and among A (ا), > (أ), and < (إ), which are collapsed to y (ي) and A (ا), respectively, or even confused and used interchangeably. While normalization helps alleviate orthographic inconsistencies, it aggravates the problem of ambiguity.

Transforming Standard Arabic to Colloquial Arabic
Emad Mohamed | Behrang Mohit | Kemal Oflazer
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
Nathan Schneider | Behrang Mohit | Kemal Oflazer | Noah A. Smith
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Recall-Oriented Learning of Named Entities in Arabic Wikipedia
Behrang Mohit | Nathan Schneider | Rishav Bhowmick | Kemal Oflazer | Noah A. Smith
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2010

Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish
Reyyan Yeniterzi | Kemal Oflazer
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2008

BLEU+: a Tool for Fine-Grained BLEU Computation
A. Cüneyd Tantuğ | Kemal Oflazer | İlknur Durgar El-Kahlout
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present a tool, BLEU+, which implements various extensions to BLEU computation to allow for a better understanding of translation performance, especially for morphologically complex languages. When comparing tokens in the candidate and reference sentences, BLEU+ takes into account both “closeness” in morphological structure and “closeness” of the root words in the WordNet hierarchy. In addition to gauging performance at a finer level of granularity, BLEU+ also allows the computation of various upper-bound oracle scores: comparing all tokens considering only the roots gives an upper bound for when all errors due to morphological structure are fixed, while comparing tokens in an error-tolerant way, considering minor morpheme edit operations, gives a (more realistic) upper bound for when tokens that differ in morpheme insertions, deletions, and substitutions are fixed. We use BLEU+ in the fine-grained evaluation of the output of our English-to-Turkish statistical MT system.

Dependency Parsing of Turkish
Gülşen Eryiğit | Joakim Nivre | Kemal Oflazer
Computational Linguistics, Volume 34, Number 3, September 2008

Erratum: Dependency Parsing of Turkish
Gülşen Eryiğit | Joakim Nivre | Kemal Oflazer
Computational Linguistics, Volume 34, Number 4, December 2008

2007

Exploring Different Representational Units in English-to-Turkish Statistical Machine Translation
Kemal Oflazer | İlknur Durgar El-Kahlout
Proceedings of the Second Workshop on Statistical Machine Translation

A MT system from Turkmen to Turkish employing finite state and statistical methods
Ahmet Cüneyd Tantuğ | Eşref Adali | Kemal Oflazer
Proceedings of Machine Translation Summit XI: Papers

Machine Translation between Turkic Languages
Ahmet Cüneyd Tantuğ | Eşref Adali | Kemal Oflazer
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

Morphology-Syntax Interface for Turkish LFG
Özlem Çetinoğlu | Kemal Oflazer
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Statistical Dependency Parsing for Turkish
Gülşen Eryiğit | Kemal Oflazer
11th Conference of the European Chapter of the Association for Computational Linguistics

Initial Explorations in English to Turkish Statistical Machine Translation
İlknur Durgar El-Kahlout | Kemal Oflazer
Proceedings on the Workshop on Statistical Machine Translation

2005

Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)
Kevin Knight | Hwee Tou Ng | Kemal Oflazer
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

Vi-xfst: A Visual Regular Expression Development Environment for Xerox Finite State Tool
Kemal Oflazer | Yasin Yılmaz
Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology: Current Themes in Computational Phonology and Morphology

Integrating Morphology with Multi-word Expression Processing in Turkish
Kemal Oflazer | Özlem Çetinoğlu | Bilge Say
Proceedings of the Workshop on Multiword Expressions: Integrating Processing

2003

Dependency Parsing with an Extended Finite-State Approach
Kemal Oflazer
Computational Linguistics, Volume 29, Number 4, December 2003

The Annotation Process in the Turkish Treebank
Nart B. Atalay | Kemal Oflazer | Bilge Say
Proceedings of 4th International Workshop on Linguistically Interpreted Corpora (LINC-03) at EACL 2003

2001

Bootstrapping Morphological Analyzers by Combining Human Elicitation and Machine Learning
Kemal Oflazer | Sergei Nirenburg | Marjorie McShane
Computational Linguistics, Volume 27, Number 1, March 2001

2000

Introduction to the Special Issue on Finite-State Methods in NLP
Lauri Karttunen | Kemal Oflazer
Computational Linguistics, Volume 26, Number 1, March 2000

Statistical Morphological Disambiguation for Agglutinative Languages
Dilek Z. Hakkani-Tür | Kemal Oflazer | Gökhan Tür
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

1999

Dependency Parsing with an Extended Finite State Approach
Kemal Oflazer
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

Practical Bootstrapping of Morphological Analyzers
Kemal Oflazer | Sergei Nirenburg
EACL 1999: CoNLL-99 Computational Natural Language Learning

1998

Tagging English by Path Voting Constraints
Gokhan Tur | Kemal Oflazer
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

An English-to-Turkish interlingual MT system
Dilek Zeynep Hakkani | Gökhan Tür | Kemal Oflazer | Teruko Mitamura | Eric H. Nyberg, 3rd
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper describes the integration of a Turkish generation system with the KANT knowledge-based machine translation system to produce a prototype English-Turkish interlingua-based machine translation system. These two independently constructed systems were successfully integrated within a period of two months, through development of a module which maps KANT interlingua expressions to Turkish syntactic structures. The combined system is able to translate completely and correctly 44 of 52 benchmark sentences in the domain of broadcast news captions. This study is the first known application of knowledge-based machine translation from English to Turkish, and our initial results show promise for future development.

Implementing Voting Constraints with Finite State Transducers
Kemal Oflazer | Gokhan Tur
Finite State Methods in Natural Language Processing

Tagging English by Path Voting Constraints
Gokhan Tur | Kemal Oflazer
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

1997

Morphological Disambiguation by Voting Constraints
Kemal Oflazer | Gokhan Tur
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1996

A Constraint-based Case Frame Lexicon
Kemal Oflazer | Okan Yılmaz
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics

Error-tolerant Tree Matching
Kemal Oflazer
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics

Combining Hand-crafted Rules and Unsupervised Learning in Constraint-based Morphological Disambiguation
Kemal Oflazer | Gokhan Tur
Conference on Empirical Methods in Natural Language Processing

Tactical Generation in a Free Constituent Order Language
Dilek Zeynep Hakkani | Kemal Oflazer | Ilyas Cicekli
Eighth International Natural Language Generation Workshop

Error-tolerant Finite-state Recognition with Applications to Morphological Analysis and Spelling Correction
Kemal Oflazer
Computational Linguistics, Volume 22, Number 1, March 1996

1995

Error-tolerant Finite State Recognition
Kemal Oflazer
Proceedings of the Fourth International Workshop on Parsing Technologies

Error-tolerant recognition enables the recognition of strings that deviate slightly from any string in the regular set recognized by the underlying finite state recognizer. In the context of natural language processing, it has applications in error-tolerant morphological analysis and spelling correction. After a description of the concepts and algorithms involved, we give examples from these two applications. In morphological analysis, error-tolerant recognition allows misspelled input word forms to be corrected and morphologically analyzed concurrently. The algorithm can be applied to the morphological analysis of any language whose morphology is fully captured by a single (and possibly very large) finite state transducer, regardless of the word formation processes (such as agglutination or productive compounding) and morphographemic phenomena involved. We present an application to error-tolerant analysis of the agglutinative morphology of Turkish words. In spelling correction, error-tolerant recognition can be used to enumerate correct candidate forms from a given misspelled string within a certain edit distance. It can be applied to any language whose morphology is fully described by a finite state transducer, or that has a word list comprising all inflected forms. With very large word lists of root and inflected forms (some containing well over 200,000 forms), it generates all candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a SparcStation 10/41. For spelling correction in Turkish, error-tolerant recognition operating with a (circular) recognizer of Turkish words (with about 29,000 states and 119,000 transitions) can generate all candidate words in less than 20 milliseconds (with edit distance 1). Spelling correction using a recognizer constructed from a large German word list that simulates compounding also indicates that the approach is applicable in such cases.

1994

Tagging and Morphological Disambiguation of Turkish Text
Kemal Oflazer | Ilker Kuruoz
Fourth Conference on Applied Natural Language Processing

Spelling Correction in Agglutinative Languages
Kemal Oflazer | Cemaleddin Guzey
Fourth Conference on Applied Natural Language Processing

Parsing Turkish Using the Lexical Functional Grammar Formalism
Zelal Gungordu | Kemal Oflazer
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

1993

Two-level Description of Turkish Morphology
Kemal Oflazer
Sixth Conference of the European Chapter of the Association for Computational Linguistics

1992

Parsing Agglutinative Word Structures and Its Application to Spelling Checking for Turkish
Aysin Solak | Kemal Oflazer
COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics