Helen Kaiyun Chen

Also published as: Helen Kai-yun Chen, Kai-Yun Chen, Kai-yun Chen

This study describes the construction of a manually annotated speech corpus that focuses on the sound profiles of repair/disfluency in Mandarin conversational interaction. Specifically, the paper focuses on how the tag set of prosodic profiles of the recycling repair culled from both audio-tapped and video-tapped, face-to-face Mandarin interaction are decided. By the methodology of both acoustic records and impressionistic judgements, 260 instances of Mandarin recycling repair are annotated with sound profiles including: pitch, duration, loudness, silence, and other observable prosodic cues (i.e. sound stretch and cut-offs). The study further introduces some possible applications of the current corpus, such as the implementation of the annotated data for analyzing the correlation between sound profiles of Mandarin repair and the interactional function of the repair. The goal of constructing the corpus is to facilitate an interdisciplinary study that concentrates on broadening the interactional linguistic theory by simultaneously paying close attention to the sound profiles emerged from interaction.

pdf abs
A Grammar-informed Corpus-based Sentence Database for Linguistic and Computational Studies
Hongzhi Xu | Helen Kaiyun Chen | Chu-Ren Huang | Qin Lu | Dingxu Shi | Tin-Shing Chiu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We adopt the corpus-informed approach to example sentence selections for the construction of a reference grammar. In the process, a database containing sentences that are carefully selected by linguistic experts including the full range of linguistic facts covered in an authoritative Chinese Reference Grammar is constructed and structured according to the reference grammar. A search engine system is developed to facilitate the process of finding the most typical examples the users need to study a linguistic problem or prove their hypotheses. The database can also be used as a training corpus by computational linguists to train models for Chinese word segmentation, POS tagging and sentence parsing.

2010

pdf bib
Evidentiality for Text Trustworthiness Detection
Qi Su | Chu-Ren Huang | Kai-yun Chen
Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground

pdf
Incorporate Credibility into Context for the Best Social Media Answers
Qi Su | Helen Kai-yun Chen | Chu-Ren Huang
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

2008

pdf abs
Annotating “tense” in a Tense-less Language
Nianwen Xue | Hua Zhong | Kai-Yun Chen
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In the context of Natural Language Processing, annotation is about recovering implicit information that is useful for natural language applications. In this paper we describe a tense annotation task for Chinese - a language that does not have grammatical tense - that is designed to infer the temporal location of a situation in relation to the temporal deixis, the moment of speech. If successful, this would be a highly rewarding endeavor as it has application in many natural language systems. Our preliminary experiments show that while this is a very challenging annotation task for which high annotation consistency is very difficult but not impossible to achieve. We show that guidelines that provide a conceptually intuitive framework will be crucial to the success of this annotation effort.

Co-authors