Developing Corpus of Japanese Classroom Lecture Speech Contents

Masatoshi Tsuchiya, Satoru Kogure, Hiromitsu Nishizaki, Kengo Ohta, Seiichi Nakagawa


Abstract
This paper explains our developing Corpus of Japanese classroom Lecture speech Contents (henceforth, denoted as CJLC). Increasing e-Learning contents demand a sophisticated interactive browsing system for themselves, however, existing tools do not satisfy such a requirement. Many researches including large vocabulary continuous speech recognition and extraction of important sentences against lecture contents are necessary in order to realize the above system. CJLC is designed as their fundamental basis, and consists of speech, transcriptions, and slides that were collected in real university classroom lectures. This paper also explains the difference about disfluency acts between classroom lectures and academic presentations.
Anthology ID:
L08-1506
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/524_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Masatoshi Tsuchiya, Satoru Kogure, Hiromitsu Nishizaki, Kengo Ohta, and Seiichi Nakagawa. 2008. Developing Corpus of Japanese Classroom Lecture Speech Contents. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Developing Corpus of Japanese Classroom Lecture Speech Contents (Tsuchiya et al., LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/524_paper.pdf