Nick Campbell

2018

pdf abs
Just Talking - Modelling Casual Conversation
Emer Gilmartin | Christian Saam | Carl Vogel | Nick Campbell | Vincent Wade
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

Casual conversation has become a focus for artificial dialogue applications. Such talk is ubiquitous and its structure differs from that found in the task-based interactions which have been the focus of dialogue system design for many years. It is unlikely that such conversations can be modelled as an extension of task-based talk. We review theories of casual conversation, report on our studies of the structure of casual dialogue, and outline challenges we see for the development of spoken dialog systems capable of carrying on casual friendly conversation in addition to performing well-defined tasks.

pdf
Chats and Chunks: Annotation and Analysis of Multiparty Long Casual Conversations
Emer Gilmartin | Carl Vogel | Nick Campbell
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
Development of an Annotated Multimodal Dataset for the Investigation of Classification and Summarisation of Presentations using High-Level Paralinguistic Features
Keith Curtis | Nick Campbell | Gareth Jones
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
Speech Rate Calculations with Short Utterances: A Study from a Speech-to-Speech, Machine Translation Mediated Map Task
Akira Hayakawa | Carl Vogel | Saturnino Luz | Nick Campbell
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf abs
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
Iacer Calixto | Qun Liu | Nick Campbell
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce a Multi-modal Neural Machine Translation model in which a doubly-attentive decoder naturally incorporates spatial visual features obtained using pre-trained convolutional neural networks, bridging the gap between image description and translation. Our decoder learns to attend to source-language words and parts of an image independently by means of two separate attention mechanisms as it generates words in the target language. We find that our model can efficiently exploit not just back-translated in-domain multi-modal data but also large general-domain text-only MT corpora. We also report state-of-the-art results on the Multi30k data set.

2016

pdf abs
The ILMT-s2s Corpus ― A Multimodal Interlingual Map Task Corpus
Akira Hayakawa | Saturnino Luz | Loredana Cerrato | Nick Campbell
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents the multimodal Interlingual Map Task Corpus (ILMT-s2s corpus) collected at Trinity College Dublin, and discuss some of the issues related to the collection and analysis of the data. The corpus design is inspired by the HCRC Map Task Corpus which was initially designed to support the investigation of linguistic phenomena, and has been the focus of a variety of studies of communicative behaviour. The simplicity of the task, and the complexity of phenomena it can elicit, make the map task an ideal object of study. Although there are studies that used replications of the map task to investigate communication in computer mediated tasks, this ILMT-s2s corpus is, to the best of our knowledge, the first investigation of communicative behaviour in the presence of three additional “filters”: Automatic Speech Recognition (ASR), Machine Translation (MT) and Text To Speech (TTS) synthesis, where the instruction giver and the instruction follower speak different languages. This paper details the data collection setup and completed annotation of the ILMT-s2s corpus, and outlines preliminary results obtained from the data.

pdf abs
CHATR the Corpus; a 20-year-old archive of Concatenative Speech Synthesis
Nick Campbell
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper reports the preservation of an old speech synthesis website as a corpus. CHATR was a revolutionary technique developed in the mid nineties for concatenative speech synthesis. The method has since become the standard for high quality speech output by computer although much of the current research is devoted to parametric or hybrid methods that employ smaller amounts of data and can be more easily tunable to individual voices. The system was first reported in 1994 and the website was functional in 1996. The ATR labs where this system was invented no longer exist, but the website has been preserved as a corpus containing 1537 samples of synthesised speech from that period (118 MB in aiff format) in 211 pages under various finely interrelated themes The corpus can be accessed from www.speech-data.jp as well as www.tcd-fastnet.com, where the original code and samples are now being maintained.

pdf abs
Capturing Chat: Annotation and Tools for Multiparty Casual Conversation.
Emer Gilmartin | Nick Campbell
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Casual multiparty conversation is an understudied but very common genre of spoken interaction, whose analysis presents a number of challenges in terms of data scarcity and annotation. We describe the annotation process used on the d64 and DANS multimodal corpora of multiparty casual talk, which have been manually segmented, transcribed, annotated for laughter and disfluencies, and aligned using the Penn Aligner. We also describe a visualization tool, STAVE, developed during the annotation process, which allows long stretches of talk or indeed entire conversations to be viewed, aiding preliminary identification of features and patterns worthy of analysis. It is hoped that this tool will be of use to other researchers working in this field.

2014

pdf abs
The D-ANS corpus: the Dublin-Autonomous Nervous System corpus of biosignal and multimodal recordings of conversational speech
Shannon Hennig | Ryad Chellali | Nick Campbell
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Biosignals, such as electrodermal activity (EDA) and heart rate, are increasingly being considered as potential data sources to provide information about the temporal fluctuations in affective experience during human interaction. This paper describes an English-speaking, multiple session corpus of small groups of people engaged in informal, unscripted conversation while wearing wireless, wrist-based EDA sensors. Additionally, one participant per recording session wore a heart rate monitor. This corpus was collected in order to observe potential interactions between various social and communicative phenomena and the temporal dynamics of the recorded biosignals. Here we describe the communicative context, technical set-up, synchronization process, and challenges in collecting and utilizing such data. We describe the segmentation and annotations to date, including laughter annotations, and how the research community can access and collaborate on this corpus now and in the future. We believe this corpus is particularly relevant to researchers interested in unscripted social conversation as well as to researchers with a specific interest in observing the dynamics of biosignals during informal social conversation rich with examples of laughter, conversational turn-taking, and non-task-based interaction.

2013

pdf
Laugher and Topic Transition in Multiparty Conversation
Emer Gilmartin | Francesca Bonin | Carl Vogel | Nick Campbell
Proceedings of the SIGDIAL 2013 Conference

2012

pdf abs
The Herme Database of Spontaneous Multimodal Human-Robot Dialogues
Jing Guang Han | Emer Gilmartin | Celine De Looze | Brian Vaughan | Nick Campbell
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents methodologies and tools for language resource (LR) construction. It describes a database of interactive speech collected over a three-month period at the Science Gallery in Dublin, where visitors could take part in a conversation with a robot. The system collected samples of informal, chatty dialogue -- normally difficult to capture under laboratory conditions for human-human dialogue, and particularly so for human-machine interaction. The conversations were based on a script followed by the robot consisting largely of social chat with some task-based elements. The interactions were audio-visually recorded using several cameras together with microphones. As part of the conversation the participants were asked to sign a consent form giving permission to use their data for human-machine interaction research. The multimodal corpus will be made available to interested researchers and the technology developed during the three-month exhibition is being extended for use in education and assisted-living applications.

pdf abs
An audiovisual political speech analysis incorporating eye-tracking and perception data
Stefan Scherer | Georg Layher | John Kane | Heiko Neumann | Nick Campbell
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We investigate the influence of audiovisual features on the perception of speaking style and performance of politicians, utilizing a large publicly available dataset of German parliament recordings. We conduct a human perception experiment involving eye-tracker data to evaluate human ratings as well as behavior in two separate conditions, i.e. audiovisual and video only. The ratings are evaluated on a five dimensional scale comprising measures of insecurity, monotony, expressiveness, persuasiveness, and overall performance. Further, they are statistically analyzed and put into context in a multimodal feature analysis, involving measures of prosody, voice quality and motion energy. The analysis reveals several statistically significant features, such as pause timing, voice quality measures and motion energy, that highly positively or negatively correlate with certain human ratings of speaking style. Additionally, we compare the gaze behavior of the human subjects to evaluate saliency regions in the multimodal and visual only conditions. The eye-tracking analysis reveals significant changes in the gaze behavior of the human subjects; participants reduce their focus of attention in the audiovisual condition mainly to the region of the face of the politician and scan the upper body, including hands and arms, in the video only condition.

pdf
Vers un mesure automatique de l’adaptation prosodique en interaction conversationnelle (Automatic measurement of prosodic accommodation in conversational interaction) [in French]
Céline De Looze | Stefan Scherer | Brian Vaughan | Nick Campbell
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP

2010

pdf abs
A Software Toolkit for Viewing Annotated Multimodal Data Interactively over the Web
Nick Campbell | Akiko Tabata
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes a software toolkit for the interactive display and analysis of automatically extracted or manually derived annotation features of visual and audio data. It has been extensively tested with material collected as part of the FreeTalk Multimodal Conversation Corpus. Both the corpus and the software are available for download from sites in Europe and Japan. The corpus consists of several hours of video and audio recordings from a variety of capture devices, and includes subjective annotations of the content, along with derived data obtained from image processing. Because of the large size of the corpus, it is unrealistic to expect researchers to download all the material before deciding whether it will be useful to them in their research. We have therefore devised a means for interactive browsing of the content and for viewing at different levels of granularity. This has resulted in a simple set of tools that can be added to any website to allow similar browsing of audio- video recordings and their related data and annotations.

2008

pdf abs
Tools & Resources for Visualising Conversational-Speech Interaction
Nick Campbell
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes tools and techniques for accessing large quantities of speech data and for the visualisation of discourse interactions and events at levels above that of linguistic content. We are working with large quantities of dialogue speech including business meetings, friendly discourse, and telephone conversations, and have produced web-based tools for the visualisation of non-verbal and paralinguistic features of the speech data. In essence, they provide higher-level displays so that specific sections of speech, text, or other annotation can be accessed by the researcher and provide an interactive interface to the large amount of data through an Archive Browser.

2007

pdf bib
Differences in the Speaking Styles of a Japanese Male According to Interlocutor; Showing the Effects of Affect in Conversational Speech
Nick Campbell
International Journal of Computational Linguistics & Chinese Language Processing, Volume 12, Number 1, March 2007: Special Issue on Affective Speech Processing

2006

pdf abs
Multimedia Database of Meetings and Informal Interactions for Tracking Participant Involvement and Discourse Flow
Nick Campbell | Toshiyuki Sadanobu | Masataka Imura | Naoto Iwahashi | Suzuki Noriko | Damien Douxchamps
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

At ATR, we are collecting and analysing meetings data using a table-top sensor device consisting of a small 360-degree camera surrounded by an array of high-quality directional microphones. This equipment provides a stream of information about the audio and visual events of the meeting which is then processed to form a representation of the verbal and non-verbal interpersonal activity, or discourse flow, during the meeting. This paper describes the resulting corpus of speech and video data which is being collected for the abovere search. It currently includes data from 12 monthly sessions, comprising 71 video and 33 audio modules. Collection is continuingmonthly and is scheduled to include another ten sessions.

Nick Campbell

2018

2017

2016

2014

2013

2012

2010

2008

2007

2006

2004

2002

2000

Co-authors

Venues