Cem Bozsahin

Also published as: Cem Bozşahin, H. Cem Bozsahin


2016

A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability
Elif Ahsen Acar | Deniz Zeyrek | Murathan Kurfalı | Cem Bozşahin
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This study primarily aims to build a Turkish psycholinguistic database covering three variables: word frequency, age of acquisition (AoA), and imageability, where AoA and imageability information is limited to nouns. We used a corpus-based approach to obtain information about the AoA variable. We built two corpora: a child literature corpus (CLC) of 535 books written for children aged 3-12, and a corpus of transcribed children’s speech (CSC) at ages 1;4-4;8. A comparison between the word frequencies of the CLC and the CSC showed a positive correlation, suggesting that the CLC can be used to extract AoA information. We assumed that frequent words of the CLC would correspond to early-acquired words, whereas frequent words of a corpus of adult language would correspond to late-acquired words. To validate the AoA results from our corpus-based approach, a rated AoA questionnaire was administered to adults. Imageability values were collected via a separate questionnaire, also administered to adults. We conclude that it is possible to deduce AoA information for high-frequency words with the corpus-based approach. The results for low-frequency words were inconclusive, which we attribute to the fact that corpus-based AoA information is affected by the strong negative correlation between corpus frequency and rated AoA.
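The frequency comparison between two corpora described in this abstract can be illustrated with a minimal sketch. It assumes a Pearson correlation over log frequencies of the words the two corpora share; the abstract does not state which correlation measure or preprocessing the authors used, and the function names `pearson` and `frequency_correlation` and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def frequency_correlation(tokens_a, tokens_b):
    """Correlate log word frequencies over the vocabulary shared by two corpora.

    Assumption for illustration: only words attested in both corpora are
    compared, and raw counts are log-transformed first.
    """
    freq_a = Counter(tokens_a)
    freq_b = Counter(tokens_b)
    shared = sorted(set(freq_a) & set(freq_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared words")
    log_a = [math.log(freq_a[w]) for w in shared]
    log_b = [math.log(freq_b[w]) for w in shared]
    return pearson(log_a, log_b)


# Toy stand-ins for a child literature corpus and a child speech corpus.
clc_tokens = ["anne"] * 5 + ["top"] * 3 + ["kedi"] * 1
csc_tokens = ["anne"] * 10 + ["top"] * 4 + ["kedi"] * 2
r = frequency_correlation(clc_tokens, csc_tokens)
print(f"correlation of log frequencies: {r:.3f}")
```

A positive value on real data would point in the same direction as the result reported in the abstract, but the toy lists here only show the mechanics.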

2014

Turkish Resources for Visual Word Recognition
Begüm Erten | Cem Bozsahin | Deniz Zeyrek
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We report two tools for conducting psycholinguistic experiments on Turkish words. KelimetriK allows experimenters to choose words based on desired orthographic scores of word frequency, bigram and trigram frequency, ON, OLD20, ATL and subset/superset similarity. The Turkish version of Wuggy generates pseudowords from one or more template words using an efficient method. The syllabified versions of the words are used as the input and are decomposed into their sub-syllabic components. The bigram frequency chains are constructed from the onset, nucleus and coda patterns of entire words. We compiled the lexical statistics of stems and their syllabification from the BOUN corpus of 490 million words. The use of these tools in some experiments is shown.
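Two of the orthographic scores named in this abstract have widely used definitions that can be sketched briefly: ON as the number of same-length words differing in exactly one letter, and OLD20 as the mean Levenshtein distance to the 20 closest words in the lexicon. The sketch below follows those general definitions and is not KelimetriK's implementation; the function names and the six-word lexicon are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def old_n(word, lexicon, n=20):
    """Mean Levenshtein distance from word to its n closest lexicon entries.

    If the lexicon holds fewer than n other words, all of them are used.
    """
    distances = sorted(levenshtein(word, other)
                       for other in set(lexicon) if other != word)
    closest = distances[:n]
    if not closest:
        raise ValueError("lexicon has no other words")
    return sum(closest) / len(closest)


def orthographic_neighbours(word, lexicon):
    """Same-length lexicon words that differ from word in exactly one letter."""
    return sorted(other for other in set(lexicon)
                  if len(other) == len(word)
                  and sum(x != y for x, y in zip(word, other)) == 1)


# Hypothetical toy lexicon; a real run would use corpus-derived word lists.
toy_lexicon = ["kal", "kol", "kul", "bal", "kale", "masa"]
print(orthographic_neighbours("kal", toy_lexicon))  # ON is the list length
print(old_n("kal", toy_lexicon, n=20))
```

With a lexicon this small, `old_n` averages over every other word; on a full lexicon the default `n=20` gives the usual OLD20 score.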

2013

Applicative Structures and Immediate Discourse in the Turkish Discourse Bank
Isin Demirşahin | Adnan Öztürel | Cem Bozşahin | Deniz Zeyrek
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

2010

Discourse Relation Configurations in Turkish and an Annotation Environment
Berfin Aktaş | Cem Bozsahin | Deniz Zeyrek
Proceedings of the Fourth Linguistic Annotation Workshop

2009

Annotating Subordinators in the Turkish Discourse Bank
Deniz Zeyrek | Umit Deniz Turan | Cem Bozsahin | Ruket Cakici | Ayisigi B. Sevdik-Calli | Isin Demirsahin | Berfin Aktas | İhsan Yalcinkaya | Hale Ogel
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

2002

The Combinatory Morphemic Lexicon
Cem Bozsahin
Computational Linguistics, Volume 28, Number 2, June 2002

1998

Deriving the Predicate-Argument Structure for a Free Word Order Language
Cem Bozsahin
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Deriving the Predicate-Argument Structure for a Free Word Order Language
Cem Bozsahin
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

1996

Morphological Productivity in the Lexicon
Onur T. Sehitoglu | H. Cem Bozsahin
Breadth and Depth of Semantic Lexicons