Chen Yangdong


2021

Sketchy Scene Captioning: Learning Multi-Level Semantic Information from Sparse Visual Scene Cues
Zhou Lian | Chen Yangdong | Zhang Yuejie
Proceedings of the 20th Chinese National Conference on Computational Linguistics

To enrich research on the sketch modality, a new task termed Sketchy Scene Captioning is proposed in this paper. This task aims to generate sentence-level and paragraph-level descriptions for a sketchy scene. The sentence-level description provides the salient semantics of a sketchy scene, while the paragraph-level description gives more details about the scene. Sketchy Scene Captioning can be viewed as an extension of sketch classification, which can only provide one class label for a sketch. Generating multi-level descriptions for a sketchy scene is challenging because of the visual sparsity and ambiguity of the sketch modality. To achieve our goal, we first contribute a sketchy scene captioning dataset to lay the foundation of this new task. A popular sequence learning scheme, i.e., a Long Short-Term Memory neural network with a visual attention mechanism, is then adopted to recognize the objects in a sketchy scene and to infer the relations among them. In the experiments, promising results have been achieved on the proposed dataset. We believe this work will motivate further research on the understanding of the sketch modality and on the numerous sketch-based applications in daily life. The collected dataset is released at https://github.com/SketchysceneCaption/Dataset.
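The abstract names an LSTM decoder with visual attention but gives no implementation details. Below is a minimal sketch, assuming a standard additive (Bahdanau-style) attention over a grid of visual features; all module names, dimensions, and the vocabulary size are hypothetical illustrations, not the authors' actual model.

```python
import torch
import torch.nn as nn


class AttentiveLSTMDecoder(nn.Module):
    """One-step caption decoder: attend over visual features, then update an LSTM."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, feat_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Additive attention: score each visual region against the hidden state.
        self.att_feat = nn.Linear(feat_dim, hidden_dim)
        self.att_hid = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        # The LSTM consumes the previous word embedding plus the attended feature.
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, word_ids, h, c, feats):
        # feats: (batch, regions, feat_dim) visual features of the sketchy scene.
        scores = self.att_score(torch.tanh(
            self.att_feat(feats) + self.att_hid(h).unsqueeze(1)))   # (B, R, 1)
        alpha = torch.softmax(scores, dim=1)                        # attention weights
        context = (alpha * feats).sum(dim=1)                        # (B, feat_dim)
        x = torch.cat([self.embed(word_ids), context], dim=1)
        h, c = self.lstm(x, (h, c))
        return self.out(h), h, c                                    # logits over vocab


# Usage example on random features (e.g. a 7x7 grid from a CNN encoder).
decoder = AttentiveLSTMDecoder(vocab_size=1000)
feats = torch.randn(2, 49, 512)
h = c = torch.zeros(2, 512)
logits, h, c = decoder.step(torch.tensor([1, 1]), h, c, feats)
```

At each step the decoder re-weights the sparse visual cues before predicting the next word, which is the usual way such models ground words in scene regions; the paper itself should be consulted for the actual architecture.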