Cem Bozsahin

Also published as: Cem Bozşahin, H. Cem Bozsahin

2016

A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability
Elif Ahsen Acar | Deniz Zeyrek | Murathan Kurfalı | Cem Bozşahin
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This study primarily aims to build a Turkish psycholinguistic database including three variables: word frequency, age of acquisition (AoA), and imageability, where AoA and imageability information are limited to nouns. We used a corpus-based approach to obtain information about the AoA variable. We built two corpora: a child literature corpus (CLC) including 535 books written for children aged 3-12, and a corpus of transcribed children’s speech (CSC) at ages 1;4-4;8. A comparison between the word frequencies of the CLC and the CSC showed a positive correlation, suggesting that the CLC can be used to extract AoA information. We assumed that frequent words of the CLC would correspond to early acquired words, whereas frequent words of a corpus of adult language would correspond to late acquired words. To validate the AoA results from our corpus-based approach, a rated AoA questionnaire was conducted on adults. Imageability values were collected via a different questionnaire conducted on adults. We conclude that it is possible to deduce AoA information for high frequency words with the corpus-based approach. The results for low frequency words were inconclusive, which is attributed to the fact that corpus-based AoA information is affected by the strong negative correlation between corpus frequency and rated AoA.

2014

Turkish Resources for Visual Word Recognition
Begüm Erten | Cem Bozsahin | Deniz Zeyrek
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We report two tools for conducting psycholinguistic experiments on Turkish words. KelimetriK allows experimenters to choose words based on desired orthographic scores of word frequency, bigram and trigram frequency, ON, OLD20, ATL and subset/superset similarity. The Turkish version of Wuggy generates pseudowords from one or more template words using an efficient method. The syllabified versions of the words are used as the input and are decomposed into their sub-syllabic components. The bigram frequency chains are constructed from the onset, nucleus and coda patterns of entire words. We compiled the lexical statistics of stems and their syllabification from the BOUN corpus of 490 million words. The use of these tools in some experiments is shown.

2013

Applicative Structures and Immediate Discourse in the Turkish Discourse Bank
Isin Demirşahin | Adnan Öztürel | Cem Bozşahin | Deniz Zeyrek
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

2010

Discourse Relation Configurations in Turkish and an Annotation Environment
Berfin Aktaş | Cem Bozsahin | Deniz Zeyrek
Proceedings of the Fourth Linguistic Annotation Workshop

2009

Venues

law (3)
lrec (2)
coling (1)
acl (1)
cl (1)
ws (1)
