Deniz Zeyrek

2020

TED-MDB Lexicons: Tr-EnConnLex, Pt-EnConnLex
Murathan Kurfalı | Sibel Ozer | Deniz Zeyrek | Amália Mendes
Proceedings of the First Workshop on Computational Approaches to Discourse

In this work, we present two new bilingual discourse connective lexicons, namely for Turkish-English and European Portuguese-English, created automatically using the existing discourse relation-aligned TED-MDB corpus. In their current form, the Pt-En lexicon includes 95 entries, whereas the Tr-En lexicon contains 133 entries. The lexicons constitute the first step of a larger project of developing a multilingual discourse connective lexicon.

Turkish Emotion Voice Database (TurEV-DB)
Salih Firat Canpolat | Zuhal Ormanoğlu | Deniz Zeyrek
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

We introduce the Turkish Emotion-Voice Database (TurEV-DB), which comprises a corpus of over 1700 tokens based on 82 words uttered by human subjects in four different emotions (angry, calm, happy, sad). Three machine learning experiments are run on the corpus data to classify the emotions using a convolutional neural network (CNN) model and a support vector machine (SVM) model. We report the performance of the machine learning models and, for evaluation, compare the machine learning results with human judgements.

2019

TCL - a Lexicon of Turkish Discourse Connectives
Deniz Zeyrek | Kezban Başıbüyük
Proceedings of the First International Workshop on Designing Meaning Representations

It is known that discourse connectives are the most salient indicators of discourse relations. State-of-the-art parsers being developed to predict explicit discourse connectives exploit annotated discourse corpora, but a lexicon of discourse connectives is also needed to enable further research in discourse structure and support the development of language technologies that use these structures for text understanding. This paper presents a lexicon of Turkish discourse connectives built by automatic means. The lexicon has the format of the German connective lexicon, DiMLex, where for each discourse connective, information about the connective's orthographic variants, syntactic category and senses is provided along with sample relations. In this paper, we describe the data sources we used and the development steps of the lexicon.

An automatic discourse relation alignment experiment on TED-MDB
Sibel Ozer | Deniz Zeyrek
Proceedings of the 2019 Workshop on Widening NLP

This paper describes an automatic discourse relation alignment experiment as an empirical justification of the planned annotation projection approach to enlarge the 3600-word multilingual corpus of TED Multilingual Discourse Bank (TED-MDB). The experiment is carried out on a single language pair (English-Turkish) included in TED-MDB. The paper first describes the creation of a large corpus of English-Turkish bi-sentences, then it presents a sense-based experiment that automatically aligns the relations in the English sentences of TED-MDB with the Turkish sentences. The results are very close to the results obtained from an earlier semi-automatic post-annotation alignment experiment validated by human annotators and are encouraging for future annotation projection tasks.

Proceedings of the 13th Linguistic Annotation Workshop
Annemarie Friedrich | Deniz Zeyrek | Jet Hoek
Proceedings of the 13th Linguistic Annotation Workshop

2018

Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank
Deniz Zeyrek | Amália Mendes | Murathan Kurfalı
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

An Assessment of Explicit Inter- and Intra-sentential Discourse Connectives in Turkish Discourse Bank
Deniz Zeyrek | Murathan Kurfalı
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

TDB 1.1: Extensions on Turkish Discourse Bank
Deniz Zeyrek | Murathan Kurfalı
Proceedings of the 11th Linguistic Annotation Workshop

This paper presents recent developments on the Turkish Discourse Bank (TDB). First, the resource is summarized and an evaluation is presented. Then TDB 1.1 is described, i.e., enrichments on 10% of the corpus (namely, senses for explicit discourse connectives and new annotations for three discourse relation types: implicit relations, entity relations and alternative lexicalizations). The method of annotation is explained and the data are evaluated.

2016

A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability
Elif Ahsen Acar | Deniz Zeyrek | Murathan Kurfalı | Cem Bozşahin
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This study primarily aims to build a Turkish psycholinguistic database including three variables: word frequency, age of acquisition (AoA), and imageability, where AoA and imageability information are limited to nouns. We used a corpus-based approach to obtain information about the AoA variable. We built two corpora: a child literature corpus (CLC) including 535 books written for children aged 3-12, and a corpus of transcribed children’s speech (CSC) at ages 1;4-4;8. A comparison between the word frequencies of the CLC and the CSC showed a positive correlation, suggesting the usability of the CLC to extract AoA information. We assumed that frequent words of the CLC would correspond to early acquired words whereas frequent words of a corpus of adult language would correspond to late acquired words. To validate the AoA results from our corpus-based approach, a rated AoA questionnaire was conducted on adults. Imageability values were collected via a different questionnaire conducted on adults. We conclude that it is possible to deduce AoA information for high frequency words with the corpus-based approach. The results about low frequency words were inconclusive, which is attributed to the fact that corpus-based AoA information is affected by the strong negative correlation between corpus frequency and rated AoA.

2014

Annotating Discourse Connectives in Spoken Turkish
Isin Demirşahin | Deniz Zeyrek
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop

Turkish Resources for Visual Word Recognition
Begüm Erten | Cem Bozsahin | Deniz Zeyrek
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We report two tools for conducting psycholinguistic experiments on Turkish words. KelimetriK allows experimenters to choose words based on desired orthographic scores of word frequency, bigram and trigram frequency, ON, OLD20, ATL and subset/superset similarity. The Turkish version of Wuggy generates pseudowords from one or more template words using an efficient method. The syllabified versions of the words are used as input and are decomposed into their sub-syllabic components. The bigram frequency chains are constructed from the entire words’ onset, nucleus and coda patterns. Lexical statistics of stems and their syllabification are compiled by us from the BOUN corpus of 490 million words. The use of these tools in some experiments is shown.

2013

Applicative Structures and Immediate Discourse in the Turkish Discourse Bank
Isin Demirşahin | Adnan Öztürel | Cem Bozşahin | Deniz Zeyrek
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

2012

METU Turkish Discourse Bank Browser
Utku Şirin | Ruket Çakıcı | Deniz Zeyrek
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper, the METU Turkish Discourse Bank Browser, a tool developed for browsing the annotated discourse relations in the Middle East Technical University (METU) Turkish Discourse Bank (TDB) project, is presented. The tool provides both a clear interface for browsing the annotated corpus and a wide range of search options to analyze the annotations.

Pair Annotation: Adaption of Pair Programming to Corpus Annotation
Isin Demirşahin | İhsan Yalcinkaya | Deniz Zeyrek
Proceedings of the Sixth Linguistic Annotation Workshop