2022
Design and Evaluation of the Corpus of Everyday Japanese Conversation
Hanae Koiso, Haruka Amatani, Yasuharu Den, Yuriko Iseki, Yuichi Ishimoto, Wakako Kashino, Yoshiko Kawabata, Ken’ya Nishikawa, Yayoi Tanaka, Yasuyuki Usuda, Yuka Watanabe
Proceedings of the Thirteenth Language Resources and Evaluation Conference
We have constructed the Corpus of Everyday Japanese Conversation (CEJC) and published it in March 2022. The CEJC is designed to contain various kinds of everyday conversations in a balanced manner to capture their diversity. The CEJC features not only audio but also video data to facilitate precise understanding of the mechanism of real-life social behavior. The publication of a large-scale corpus of everyday conversations that includes video data is a new approach. The CEJC contains 200 hours of speech, 577 conversations, about 2.4 million words, and a total of 1675 conversants. In this paper, we present an overview of the corpus, including the recording method and devices, structure of the corpus, formats of video and audio files, transcription, and annotations. We then report some results of the evaluation of the CEJC in terms of conversant and conversation attributes. We show that the CEJC includes a good balance of adult conversants in terms of gender and age, as well as a variety of conversations in terms of conversation forms, places, activities, and numbers of conversants.
Cognitive States and Types of Nods
Taiga Mori, Kristiina Jokinen, Yasuharu Den
Proceedings of the 2nd Workshop on People in Vision, Language, and the Mind
In this paper we will study how different types of nods are related to the cognitive states of the listener. The distinction is made between nods with movement starting upwards (up-nods) and nods with movement starting downwards (down-nods) as well as between single or repetitive nods. The data is from Japanese multiparty conversations, and the results accord with the previous findings indicating that up-nods are related to the change in the listener’s cognitive state after hearing the partner’s contribution, while down-nods convey the meaning that the listener’s cognitive state is not changed.
2020
Analysis of Body Behaviours in Human-Human and Human-Robot Interactions
Taiga Mori, Kristiina Jokinen, Yasuharu Den
Proceedings of LREC2020 Workshop "People in language, vision and the mind" (ONION2020)
We conducted a preliminary comparison of human-robot (HR) interaction with human-human (HH) interaction conducted in English and in Japanese. The results show that body gestures increased in HR, while hand and head gestures decreased. Hand gestures were composed of more diverse and complex forms, trajectories and functions in HH than in HR. Moreover, English speakers produced 6 times more hand gestures than Japanese speakers in HH. Regarding head gestures, even though there was no difference in the frequency of head gestures between English speakers and Japanese speakers in HH, Japanese speakers produced slightly more nodding during the robot’s speaking than English speakers in HR. Furthermore, the positions of nods differed depending on the language. Concerning body gestures, participants produced them mostly to regulate appropriate distance with the robot in HR. Additionally, English speakers produced slightly more body gestures than Japanese speakers.
A Conversation-Analytic Annotation of Turn-Taking Behavior in Japanese Multi-Party Conversation and its Preliminary Analysis
Mika Enomoto, Yasuharu Den, Yuichi Ishimoto
Proceedings of the Twelfth Language Resources and Evaluation Conference
In this study, we propose a conversation-analytic annotation scheme for turn-taking behavior in multi-party conversations. The annotation scheme is motivated by a proposal of a proper model of turn-taking incorporating various ideas developed in the literature of conversation analysis. Our annotation consists of two sets of tags: the beginning and the ending type of the utterance. Focusing on the ending-type tags, in some cases combined with the beginning-type tags, we emphasize the importance of the distinction among four selection types: i) selecting another participant as next speaker, ii) not selecting next speaker but followed by a switch of the speakership, iii) not selecting next speaker and followed by a continuation of the speakership, and iv) being inside a multi-unit turn. Based on the annotation of Japanese multi-party conversations, we analyze how syntactic and prosodic features of utterances vary across the four selection types. The results show that the above four-way distinction is essential to account for the distributions of the syntactic and prosodic features, suggesting the insufficiency of previous turn-taking models that do not consider the distinction between i) and ii) or between ii) and iii).
2018
Construction of the Corpus of Everyday Japanese Conversation: An Interim Report
Hanae Koiso, Yasuharu Den, Yuriko Iseki, Wakako Kashino, Yoshiko Kawabata, Ken’ya Nishikawa, Yayoi Tanaka, Yasuyuki Usuda
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2016
Survey of Conversational Behavior: Towards the Design of a Balanced Corpus of Everyday Japanese Conversation
Hanae Koiso, Tomoyuki Tsuchiya, Ryoko Watanabe, Daisuke Yokomori, Masao Aizawa, Yasuharu Den
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
In 2016, we set about building a large-scale corpus of everyday Japanese conversation, a collection of conversations embedded in naturally occurring activities in daily life. We will collect more than 200 hours of recordings over six years, publishing the corpus in 2022. To construct such a huge corpus, we have conducted a pilot project, one of whose purposes is to establish a corpus design for collecting various kinds of everyday conversations in a balanced manner. For this purpose, we conducted a survey of everyday conversational behavior, with about 250 adults, in order to reveal how diverse our everyday conversational behavior is and to build an empirical foundation for corpus design. The questionnaire included when, where, how long, with whom, and in what kind of activity informants were engaged in conversations. We found that ordinary conversations show the following tendencies: i) they mainly consist of chats, business talks, and consultations; ii) in general, the number of participants is small and the duration of the conversation is short; iii) many conversations are conducted in private places such as homes, as well as in public places such as offices and schools; and iv) some questionnaire items are related to each other. This paper describes an overview of this survey study, and then discusses how to design a large-scale corpus of everyday Japanese conversation on this basis.
2014
Design and development of an RDB version of the Corpus of Spontaneous Japanese
Hanae Koiso, Yasuharu Den, Ken’ya Nishikawa, Kikuo Maekawa
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this paper, we describe the design and development of a new version of the Corpus of Spontaneous Japanese (CSJ), which is a large-scale spoken corpus released in 2004. CSJ contains various annotations that are represented in XML format (CSJ-XML). CSJ-XML, however, is very complicated and suffers from some problems. To overcome these problems, we have developed and released, in 2013, a relational database version of CSJ (CSJ-RDB). CSJ-RDB is based on an extension of the segment and link-based annotation scheme, which we adapted to handle multi-channel and multi-modal streams. Because this scheme adopts a stand-off framework, CSJ-RDB can represent three hierarchical structures at the same time: inter-pausal-unit-top, clause-top, and intonational-phrase-top. CSJ-RDB consists of five different types of tables: segment, unaligned-segment, link, relation, and meta-information tables. The database was automatically constructed from annotation files extracted from CSJ-XML by using general-purpose corpus construction tools. CSJ-RDB enables us to easily and efficiently conduct complex searches required for corpus-based studies of spoken language.
Towards Automatic Transformation between Different Transcription Conventions: Prediction of Intonation Markers from Linguistic and Acoustic Features
Yuichi Ishimoto, Tomoyuki Tsuchiya, Hanae Koiso, Yasuharu Den
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Because of the tremendous effort required for recording and transcription, large-scale spoken language corpora have rarely been developed in Japanese, with the notable exception of the Corpus of Spontaneous Japanese (CSJ). Various research groups have individually developed conversation corpora in Japanese, but these corpora are transcribed by different conventions and have few annotations in common, and some of them lack fundamental annotations, which are prerequisites for conversation research. To remedy this situation by sharing existing conversation corpora that cover diverse styles and settings, we have tried to automatically transform a transcription made by one convention into that made by another convention. Using a conversation corpus transcribed in both the Conversation-Analysis-style (CA-style) and CSJ-style, we analyzed the correspondence between CA’s ‘intonation markers’ and CSJ’s ‘tone labels,’ and constructed a statistical model that converts tone labels into intonation markers with reference to linguistic and acoustic features of the speech. The result showed that there is considerable variance in intonation marking even between trained transcribers. The model predicted with 85% accuracy the presence of the intonation markers, and classified the types of the markers with 72% accuracy.
Japanese conversation corpus for training and evaluation of backchannel prediction model.
Hiroaki Noguchi, Yasuhiro Katagiri, Yasuharu Den
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this paper, we propose an experimental method for building a specialized corpus for training and evaluating backchannel prediction models of spoken dialogue. To develop a backchannel prediction model using a machine learning technique, it is necessary to discriminate between the timings of the interlocutor's speech when more listeners commonly respond with backchannels and the timings when fewer listeners do so. The proposed corpus indicates the normative timings for backchannels in each speech with millisecond accuracy. In the proposed method, we first extracted each speech comprising a single turn from recorded conversation. Second, we presented these speeches as stimuli to 89 participants and asked them to respond by key hitting whenever they thought it appropriate to respond with a backchannel. In this way, we collected 28983 responses. Third, we applied the Gaussian mixture model to the temporal distribution of the responses and estimated the center of Gaussian distribution, that is, the backchannel relevance place (BRP), in each case. Finally, we synthesized 10 pairs of stereo speech stimuli and asked 19 participants to rate each on a 7-point scale of naturalness. The results show that backchannels inserted at BRPs were rated significantly higher than those in the original condition.
2012
Annotation of response tokens and their triggering expressions in Japanese multi-party conversations
Yasuharu Den, Hanae Koiso, Katsuya Takanashi, Nao Yoshida
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
In this paper, we propose a new scheme for annotating response tokens (RTs) and their triggering expressions in Japanese multi-party conversations. In the proposed scheme, RTs are first identified and classified according to their forms, and then sub-classified according to their sequential positions in the discourse. To deeply study the contexts in which RTs are used, the scheme also provides procedures for annotating triggering expressions, which are considered to trigger the listener's production of RTs. RTs are classified according to whether or not there is a particular object or proposition in the speaker's turn for which the listener shows a positive or aligned stance. Triggering expressions are then identified in the speaker's turn; they include surprising facts and other newsworthy things, opinions and assessments, focus of a response to a question or repair initiation, keywords in narratives, and embedded propositions quoted from other's statement or thought, which are to be agreed upon, assessed, or noticed. As an illustrative application of our scheme, we present a preliminary analysis on the distribution of the latency of the listener's response to the triggering expression, showing how it differs according to RT's forms and positions.
Annotation of anaphoric relations and topic continuity in Japanese conversation
Natsuko Nakagawa, Yasuharu Den
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper proposes a basic scheme for annotating anaphoric relations in Japanese conversations. More specifically, we propose methods of (i) dividing discourse segments into meaningful units, (ii) identifying zero pronouns and other overt anaphors, (iii) classifying zero pronouns, and (iv) identifying anaphoric relations. We discuss various kinds of problems involved in the annotation mainly caused by on-line processing of discourse and/or interactions between the participants. These problems do not arise in annotating written languages. This paper also proposes a method to compute topic continuity based on anaphoric relations. The topic continuity involves the information status of the noun in question (given, accessible, and new) and persistence (whether the noun is mentioned multiple times or not). We show that the topic continuity correlates with short-utterance units, which are determined prosodically through the previous annotations; nouns of high topic continuity tend to be prosodically separated from the predicates. This result indicates the validity of our annotations of anaphoric relations and topic continuity and the usefulness for further studies on discourse and interaction.
UniDic for Early Middle Japanese: a Dictionary for Morphological Analysis of Classical Japanese
Toshinobu Ogiso, Mamoru Komachi, Yasuharu Den, Yuji Matsumoto
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
In order to construct an annotated diachronic corpus of Japanese, we propose to create a new dictionary for morphological analysis of Early Middle Japanese (Classical Japanese) based on UniDic, a dictionary for Contemporary Japanese. Differences between the Early Middle Japanese and Contemporary Japanese, which prevent a naïve adaptation of UniDic to Early Middle Japanese, are found at the levels of lexicon, morphology, grammar, orthography and pronunciation. In order to overcome these problems, we extended dictionary entries and created a training corpus of Early Middle Japanese to adapt UniDic for Contemporary Japanese to Early Middle Japanese. Experimental results show that the proposed UniDic-EMJ, a new dictionary for Early Middle Japanese, achieves as high accuracy (97%) as needed for the linguistic research on lexicon and grammar in Japanese classical text analysis.
2010
Design, Compilation, and Preliminary Analyses of Balanced Corpus of Contemporary Written Japanese
Kikuo Maekawa, Makoto Yamazaki, Takehiko Maruyama, Masaya Yamaguchi, Hideki Ogura, Wakako Kashino, Toshinobu Ogiso, Hanae Koiso, Yasuharu Den
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Compilation of a 100-million-word balanced corpus called the Balanced Corpus of Contemporary Written Japanese (or BCCWJ) is underway at the National Institute for Japanese Language and Linguistics. The corpus covers a wide range of text genres including books, magazines, newspapers, governmental white papers, textbooks, minutes of the National Diet, internet text (bulletin boards and blogs) and so forth, and when possible, samples are drawn from rigidly defined statistical populations by means of random sampling. All texts are dually POS-analyzed based upon two different, but mutually related, definitions of word. Currently, more than 90 million words have been sampled and XML annotated with respect to text-structure and lexical and character information. A preliminary linear discriminant analysis of text genres using the data of POS frequencies and sentence length revealed it was possible to classify the text genres with a correct identification rate of 88% as far as the samples of books, newspapers, white papers, and internet bulletin boards are concerned. When the samples of blogs were included in this data set, however, the identification rate went down to 68%, suggesting the considerable variance of the blog texts in terms of the textual register and style.
Two-level Annotation of Utterance-units in Japanese Dialogs: An Empirically Emerged Scheme
Yasuharu Den, Hanae Koiso, Takehiko Maruyama, Kikuo Maekawa, Katsuya Takanashi, Mika Enomoto, Nao Yoshida
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
In this paper, we propose a scheme for annotating utterance-level units in Japanese dialogs, which emerged from an analysis of the interrelationship among four schemes, i) inter-pausal units, ii) intonation units, iii) clause units, and iv) pragmatic units. The associations among the labels of these four units were illustrated by multiple correspondence analysis and hierarchical cluster analysis. Based on these results, we prescribe utterance-unit identification rules, which identify two sorts of utterance-units with different granularities: short and long utterance-units. Short utterance-units are identified by acoustic and prosodic disjuncture, and they are considered to constitute units of speaker's planning and hearer's understanding. Long utterance-units, on the other hand, are recognized by syntactic and pragmatic disjuncture, and they are regarded as units of interaction. We explore some characteristics of these utterance-units, focusing particularly on unit duration and syntactic property, other participants' responses, and mismatch between the two-levels. We also discuss how our two-level utterance-units are useful in analyzing cognitive and communicative aspects of spoken dialogs.
2008
Word-level Dependency-structure Annotation to Corpus of Spontaneous Japanese and its Application
Kiyotaka Uchimoto, Yasuharu Den
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
In Japanese, the syntactic structure of a sentence is generally represented by the relationship between phrasal units, bunsetsus in Japanese, based on a dependency grammar. In many cases, the syntactic structure of a bunsetsu is not considered in syntactic structure annotation. This paper gives the criteria and definitions of dependency relationships between words in a bunsetsu and their applications. The target corpus for the word-level dependency annotation is a large spontaneous Japanese-speech corpus, the Corpus of Spontaneous Japanese (CSJ). One application of word-level dependency relationships is to find basic units for constructing accent phrases.
A Proper Approach to Japanese Morphological Analysis: Dictionary, Model, and Evaluation
Yasuharu Den, Junpei Nakamura, Toshinobu Ogiso, Hideki Ogura
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
In this paper, we discuss lemma identification in Japanese morphological analysis, which is crucial for a proper formulation of morphological analysis that benefits not only NLP researchers but also corpus linguists. Since Japanese words often have variation in orthography and the vocabulary of Japanese consists of words of several different origins, it sometimes happens that more than one writing form corresponds to the same lemma and that a single writing form corresponds to two or more lemmas with different readings and/or meanings. The mapping from a writing form onto a lemma is important in linguistic analysis of corpora. The current study focuses on disambiguation of heteronyms, words with the same writing form but with different word forms. To resolve heteronym ambiguity, we make use of goshu information, the classification of words based on their origin. Founded on the fact that words of some goshu classes are more likely to combine into compound words than words of other classes, we employ a statistical model based on CRFs using goshu information. Experimental results show that the use of goshu information considerably improves the performance of heteronym disambiguation and lemma identification, suggesting that goshu information solves the lemma identification task very effectively.
Implicit Proposal Filtering in Multi-Party Consensus-Building Conversations
Yasuhiro Katagiri, Yosuke Matsusaka, Yasuharu Den, Mika Enomoto, Masato Ishizaki, Katsuya Takanashi
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue
2002
Use of XML and Relational Databases for Consistent Development and Maintenance of Lexicons and Annotated Corpora
Masayuki Asahara, Ryuichi Yoneda, Akiko Yamashita, Yasuharu Den, Yuji Matsumoto
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
1994
Generalized Chart Algorithm: An Efficient Procedure for Cost-Based Abduction
Yasuharu Den
32nd Annual Meeting of the Association for Computational Linguistics