This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
GézaNémeth
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
Self-attention networks (SAN) have shown promising performance in various Natural Language Processing (NLP) scenarios, especially in machine translation. One of the main points of SANs is the strength of capturing long-range and multi-scale dependencies from the data. In this paper, we present a novel intent detection system which is based on a self-attention network and a Bi-LSTM. Our approach shows improvement by using a transformer model and deep averaging network-based universal sentence encoder compared to previous solutions. We evaluate the system on Snips, Smart Speaker, Smart Lights, and ATIS datasets by different evaluation metrics. The performance of the proposed model is compared with LSTM with the same datasets.
Currently available speech recognisers do not usually work well with elderly speech. This is because several characteristics of speech (e.g. fundamental frequency, jitter, shimmer and harmonic noise ratio) change with age and because the acoustic models used by speech recognisers are typically trained with speech collected from younger adults only. To develop speech-driven applications capable of successfully recognising elderly speech, this type of speech data is needed for training acoustic models from scratch or for adapting acoustic models trained with younger adults speech. However, the availability of suitable elderly speech corpora is still very limited. This paper describes an ongoing project to design, collect, transcribe and annotate large elderly speech corpora for four European languages: Portuguese, French, Hungarian and Polish. The Portuguese, French and Polish corpora contain read speech only, whereas the Hungarian corpus also contains spontaneous command and control type of speech. Depending on the language in question, the corpora contain 76 to 205 hours of speech collected from 328 to 986 speakers aged 60 and over. The final corpora will come with manually verified orthographic transcriptions, as well as annotations for filled pauses, noises and damaged words.
A Hungarian multimodal spontaneous expressive speech corpus was recorded following the methodology of a similar French corpus. The method relied on a Wizard of Oz scenario-based induction of varying affective states. The subjects were interacting with a supposedly voice-recognition driven computer application using simple command words. Audio and video signals were captured for the 7 recorded subjects. After the experiment, the subjects watched the video recording of their session and labelled the recorded corpus themselves, freely describing the evolution of their affective states. The obtained labels were later classified into one of the following broad emotional categories: satisfaction, dislike, stress, or other. A listening test was performed by 25 naïve listeners in order to validate the category labels originating from the self-labelling. For 52 of the 149 stimuli, listeners judgements of the emotional content were in agreement with the labels. The result of the listening test was compared with an earlier test validating a part of the French corpus. While the French test had a higher success ratio, validating the labels of 79 tested stimuli, out of the 193, the stimuli validated by the two tests can form the basis of cross linguistic comparison experiments.