This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
TomohiroOhno
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
In Japan, the number of single-person households, particularly among the elderly, is increasing. Consequently, opportunities for people to narrate are being reduced. To address this issue, conversational agents, e.g., communication robots and smart speakers, are expected to play the role of the listener. To realize these agents, this paper describes the collection of conversational responses by listeners that demonstrate attentive listening attitudes toward narrative speakers, and a method to annotate existing narrative speech with responsive utterances is proposed. To summarize, 148,962 responsive utterances by 11 listeners were collected in a narrative corpus comprising 13,234 utterance units. The collected responsive utterances were analyzed in terms of response frequency, diversity, coverage, and naturalness. These results demonstrated that diverse and natural responsive utterances were collected by the proposed method in an efficient and comprehensive manner. To demonstrate the practical use of the collected responsive utterances, an experiment was conducted, in which response generation timings were detected in narratives.
Nowadays, spoken dialogue agents such as communication robots and smart speakers listen to narratives of humans. In order for such an agent to be recognized as a listener of narratives and convey the attitude of attentive listening, it is necessary to generate responsive utterances. Moreover, responsive utterances can express empathy to narratives and showing an appropriate degree of empathy to narratives is significant for enhancing speaker’s motivation. The degree of empathy shown by responsive utterances is thought to depend on their type. However, the relation between responsive utterances and degrees of the empathy has not been explored yet. This paper describes the classification of responsive utterances based on the degree of empathy in order to explain that relation. In this research, responsive utterances are classified into five levels based on the effect of utterances and literature on attentive listening. Quantitative evaluations using 37,995 responsive utterances showed the appropriateness of the proposed classification.
In spoken dialogues, if a spoken dialogue system does not respond at all during users utterances, the user might feel uneasy because the user does not know whether or not the system has recognized the utterances. In particular, back-channel utterances, which the system outputs as voices such as yeah and uh huh in English have important roles for a driver in in-car speech dialogues because the driver does not look owards a listener while driving. This paper describes construction of a back-channel utterance corpus and its analysis to develop the system which can output back-channel utterances at the proper timing in the responsive in-car speech dialogue. First, we constructed the back-channel utterance corpus by integrating the back-channel utterances that four subjects provided for the drivers utterances in 60 dialogues in the CIAIR in-car speech dialogue corpus. Next, we analyzed the corpus and revealed the relation between back-channel utterance timings and information on bunsetsu, clause, pause and rate of speech. Based on the analysis, we examined the possibility of detecting back-channel utterance timings by machine learning technique. As the result of the experiment, we confirmed that our technique achieved as same detection capability as a human.
With the development of speech and language processing, speech translation systems have been developed. These studies target spoken dialogues, and employ consecutive interpretation, which uses a sentence as the translation unit. On the other hand, there exist a few researches about simultaneous interpreting, and recently, the language resources for promoting simultaneous interpreting research, such as the publication of an analytical large-scale corpus, has been prepared. For the future, it is necessary to make the corpora more practical toward realization of a simultaneous interpreting system. In this paper, we describe the construction of a bilingual corpus which can be used for simultaneous lecture interpreting research. Simultaneous lecture interpreting systems are required to recognize translation units in the middle of a sentence, and generate its translation at the proper timing. We constructed the bilingual lecture corpus by the following steps. First, we segmented sentences in the lecture data into semantically meaningful units for the simultaneous interpreting. And then, we assigned the translations to these units from the viewpoint of the simultaneous interpreting. In addition, we investigated the possibility of automatically detecting the simultaneous interpreting timing from our corpus.
Recently, monologue data such as lecture and commentary by professionals have been considered as valuable intellectual resources, and have been gathering attention. On the other hand, in order to use these monologue data effectively and efficiently, it is necessary for the monologue data not only just to be accumulated but also to be structured. This paper describes the construction of a Japanese spoken monologue corpus in which dependency structure is given to each utterance. Spontaneous monologue includes a lot of very long sentences composed of two or more clauses. In these sentences, there may exist the subject or the adverb common to multi-clauses, and it may be considered that the subject or adverb depend on multi-predicates. In order to give the dependency information in a real fashion, our research allows that a bunsetsu depends on multiple bunsetsus.