Raymond W. M. Ng


2018

Multi-Modal Sequence Fusion via Recursive Attention for Emotion Recognition
Rory Beard | Ritwik Das | Raymond W. M. Ng | P. G. Keerthana Gopalakrishnan | Luka Eerens | Pawel Swietojanski | Ondrej Miksik
Proceedings of the 22nd Conference on Computational Natural Language Learning

Natural human communication is nuanced and inherently multi-modal. Humans possess specialised sensoria for processing vocal, visual, linguistic, and para-linguistic information, but form an intricately fused percept of the multi-modal data stream to provide a holistic representation. Analysis of emotional content in face-to-face communication is a cognitive task to which humans are particularly attuned, given its sociological importance, and poses a difficult challenge for machine emulation due to the subtlety and expressive variability of cross-modal cues. Inspired by the empirical success of recent so-called End-To-End Memory Networks and related works, we propose an approach based on recursive multi-attention with a shared external memory updated over multiple gated iterations of analysis. We evaluate our model across several large multi-modal datasets and show that global contextualised memory with gated memory update can effectively achieve emotion recognition.

2015

Investigating Continuous Space Language Models for Machine Translation Quality Estimation
Kashif Shah | Raymond W. M. Ng | Fethi Bougares | Lucia Specia
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

The USFD SLT system for IWSLT 2014
Raymond W. M. Ng | Mortaza Doulaty | Rama Doddipatla | Wilker Aziz | Kashif Shah | Oscar Saz | Madina Hasan | Ghada AlHaribi | Lucia Specia | Thomas Hain
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

The University of Sheffield (USFD) participated in the International Workshop on Spoken Language Translation (IWSLT) in 2014. This paper introduces the USFD SLT system for IWSLT. Automatic speech recognition (ASR) is achieved by two multi-pass deep neural network systems with adaptation and rescoring techniques. Machine translation (MT) is achieved by a phrase-based system. The USFD primary system incorporates state-of-the-art ASR and MT techniques and gives BLEU scores of 23.45 and 14.75 on the English-to-French and English-to-German speech-to-text translation tasks, respectively, with the IWSLT 2014 data. The USFD contrastive systems explore the integration of ASR and MT by using a quality estimation system to rescore the ASR outputs, optimising towards better translation. This gives a further 0.54 and 0.26 BLEU improvement respectively on the IWSLT 2012 and 2014 evaluation data.