Hideaki Kikuchi
Also published as:
H. Kikuchi
Recent developments in computer technology have enabled the construction and widespread application of large-scale speech corpora. To make data retrieval easier for people interested in utilising these speech corpora, we attempt to characterise speaking style across several of them. In this paper, we first introduce the three scales of speaking style proposed by Eskenazi in 1993. We then use morphological features extracted from speech transcriptions, which have proven effective for style discrimination and author identification in the field of natural language processing, to construct an estimation model of speaking style. More specifically, we randomly choose transcriptions from various speech corpora as text stimuli with which to conduct a rating experiment on speaking style perception; then, using the features extracted from those stimuli and the rating results, we construct an estimation model of speaking style by multiple regression analysis. Leave-one-out cross-validation shows that, among the three scales of speaking style, the ratings of two scales can be estimated with high accuracy, which demonstrates the effectiveness of our method for estimating speaking style.
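The estimation procedure described above (fit a multiple regression model on text features and perceptual ratings, then evaluate with leave-one-out cross-validation) can be sketched as follows. This is a minimal illustration with synthetic feature values and ratings, not the paper's actual morphological features or experimental data:

```python
import numpy as np

# Synthetic placeholder data: n_stimuli transcriptions, each described by
# n_features morphological feature frequencies, with a perceptual rating y.
rng = np.random.default_rng(0)
n_stimuli, n_features = 20, 4
X = rng.random((n_stimuli, n_features))
true_w = np.array([1.5, -0.8, 0.3, 2.0])            # hypothetical weights
y = X @ true_w + rng.normal(0, 0.1, n_stimuli)       # ratings + small noise

# Leave-one-out cross-validation: hold out each stimulus in turn, fit a
# multiple linear regression (with intercept) on the rest, predict held-out.
preds = np.empty(n_stimuli)
for i in range(n_stimuli):
    mask = np.arange(n_stimuli) != i
    X_train = np.column_stack([X[mask], np.ones(mask.sum())])  # add intercept
    w, *_ = np.linalg.lstsq(X_train, y[mask], rcond=None)
    preds[i] = np.append(X[i], 1.0) @ w

# Agreement between rated and estimated style (correlation coefficient).
r = float(np.corrcoef(y, preds)[0, 1])
```

With informative features, the correlation `r` between rated and estimated values is high, which is the kind of accuracy figure the paper reports per scale.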
This study was carried out to improve the quality of acted emotional speech. In the recent paradigm shift in speech collection techniques, methods for collecting high-quality spontaneous speech have received strong attention. However, such methods involve various constraints, such as the difficulty of controlling utterances and sound quality. Hence, our study daringly focuses on acted speech because of its high operability. In this paper, we propose a new method of speech collection based on refining acting scripts. We compared speech collected using our proposed method with speech collected using an imitation of the legacy method, implemented with traditional basic emotion words. The results show the advantage of our proposed method, i.e., the possibility of generating high F0 fluctuations in acoustic expression, which is one of the important features of expressive speech, while ensuring that there is no decline in naturalness or other psychological features.
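The F0-fluctuation comparison mentioned above can be illustrated with a simple measure. One common (hypothetical here; the paper does not specify its exact metric) choice is the standard deviation of log-F0 over voiced frames, computed on synthetic pitch contours:

```python
import numpy as np

def f0_fluctuation(f0_hz):
    """Standard deviation of log-F0 over voiced frames.
    Unvoiced frames are conventionally marked with F0 = 0 and excluded."""
    voiced = f0_hz[f0_hz > 0]
    return float(np.std(np.log(voiced)))

# Synthetic contours: a nearly flat "neutral" contour vs. a widely
# varying "expressive" one, both around a 120 Hz baseline.
t = np.linspace(0.0, 1.0, 100)
flat = 120 + 5 * np.sin(2 * np.pi * 2 * t)
expressive = 120 + 60 * np.sin(2 * np.pi * 2 * t)
```

Under this measure, the expressive contour yields a clearly larger fluctuation value than the flat one, matching the intuition that wider F0 movement signals expressive speech.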