Fine-tuning of Convolutional Neural Networks for the Recognition of Facial Expressions in Sign Language Video Samples

Neha Deshpande, Fabrizio Nunnari, Eleftherios Avramidis


Abstract
In this paper, we investigate the capability of convolutional neural networks to recognize, in sign language video frames, the six basic Ekman facial expressions ‘fear’, ‘disgust’, ‘surprise’, ‘sadness’, ‘happiness’, and ‘anger’, along with the ‘neutral’ class. Given the limited amount of annotated facial expression data for the sign language domain, we started from a model pre-trained on general-purpose facial expression datasets and applied various machine learning techniques, such as fine-tuning, data augmentation, class balancing, and image preprocessing, to reach a better accuracy. The models were evaluated using K-fold cross-validation to obtain more reliable conclusions. We demonstrate experimentally that fine-tuning a pre-trained model, combined with data augmentation by horizontally flipping images and with image normalization, yields the best accuracy on the sign language dataset. The best setting achieves satisfactory classification accuracy, comparable to state-of-the-art systems in generic facial expression recognition. Experiments were performed using different combinations of the above-mentioned techniques on two different architectures, namely MobileNet and EfficientNet; both architectures appear equally suitable for fine-tuning, whereas class balancing is discouraged.
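As a rough illustration of the evaluation setup named in the abstract (horizontal-flip augmentation, image normalization, and K-fold cross-validation), the following is a minimal NumPy sketch, not the authors' implementation. The function names, the 48x48 image size, the choice of K = 5, and the [0, 1] scaling used for normalization are all assumptions made for the example; the abstract does not specify them. The flipped copies are added to the training folds only, which is a common precaution rather than a detail stated in the source.

```python
import numpy as np

def normalize(images):
    """Scale uint8 pixel values to [0, 1] (one possible normalization; an assumption)."""
    return images.astype(np.float32) / 255.0

def augment_with_flips(images, labels):
    """Append a horizontally mirrored copy of every image (shape: N, H, W, C)."""
    flipped = images[:, :, ::-1, :]  # reverse the width axis
    return np.concatenate([images, flipped]), np.concatenate([labels, labels])

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Toy data: 10 random 48x48 RGB "frames" labelled with one of 7 classes.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 48, 48, 3), dtype=np.uint8)
labels = rng.integers(0, 7, size=10)

fold_sizes = []
for train_idx, val_idx in k_fold_indices(len(images), k=5):
    # Augment the training split only, so validation frames stay untouched.
    x_train, y_train = augment_with_flips(normalize(images[train_idx]), labels[train_idx])
    x_val, y_val = normalize(images[val_idx]), labels[val_idx]
    fold_sizes.append((len(x_train), len(x_val)))
    # A pre-trained model would be fine-tuned on (x_train, y_train)
    # and scored on (x_val, y_val) at this point.
```

In a real pipeline the toy arrays would be replaced by cropped face frames, and the per-fold accuracies would be averaged to compare settings.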
Anthology ID:
2022.sltat-1.5
Volume:
Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Eleni Efthimiou, Stavroula-Evita Fotinea, Thomas Hanke, John C. McDonald, Dimitar Shterionov, Rosalee Wolfe
Venue:
SLTAT
Publisher:
European Language Resources Association
Pages:
29–38
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2022.sltat-1.5/
Cite (ACL):
Neha Deshpande, Fabrizio Nunnari, and Eleftherios Avramidis. 2022. Fine-tuning of Convolutional Neural Networks for the Recognition of Facial Expressions in Sign Language Video Samples. In Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives, pages 29–38, Marseille, France. European Language Resources Association.
Cite (Informal):
Fine-tuning of Convolutional Neural Networks for the Recognition of Facial Expressions in Sign Language Video Samples (Deshpande et al., SLTAT 2022)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2022.sltat-1.5.pdf