Yongsheng Yang


A Very Large Scale Mandarin Chinese Broadcast Corpus for GALE Project
Yi Liu | Pascale Fung | Yongsheng Yang | Denise DiPersio | Meghan Glenn | Stephanie Strassel | Christopher Cieri
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present the design, collection, transcription and analysis of a Mandarin Chinese Broadcast Collection of over 3000 hours. The data was collected by Hong Kong University of Science and Technology (HKUST) in China on a cable TV and satellite transmission platform established in support of the DARPA Global Autonomous Language Exploitation (GALE) program. The collection includes broadcast news (BN) and broadcast conversation (BC); the latter covers talk shows, roundtable discussions, call-in shows, editorials and other conversational programs that focus on news and current events. HKUST also collects detailed information about all recorded programs. A subset of the BC and BN recordings is manually transcribed in standard Chinese characters with UTF-8 encoding, using specific mark-ups for a small set of spontaneous and conversational speech phenomena. The collection is among the largest of its kind, and among the first, for Mandarin Chinese broadcast speech, providing abundant and diverse samples for Mandarin speech recognition and other application-dependent tasks, such as spontaneous speech processing and recognition, topic detection, information retrieval, and speaker recognition. HKUST’s acoustic analysis of 500 hours of the speech and transcripts demonstrates the positive impact this data could have on system performance.


Learning bilingual semantic frames: shallow semantic parsing vs. semantic role projection
Pascale Fung | Zhaojun Wu | Yongsheng Yang | Dekai Wu
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers


A Maximum Entropy Approach to HowNet-Based Chinese Word Sense Disambiguation
Ping Wai Wong | Yongsheng Yang
COLING-02: SEMANET: Building and Using Semantic Networks

Boosting for Named Entity Recognition
Dekai Wu | Grace Ngai | Marine Carpuat | Jeppe Larsen | Yongsheng Yang
COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)