@inproceedings{lian-etal-2021-sketchy,
    title = "Sketchy Scene Captioning: Learning Multi-Level Semantic Information from Sparse Visual Scene Cues",
    author = "Zhou, Lian  and
      Chen, Yangdong  and
      Zhang, Yuejie",
    editor = "Li, Sheng  and
      Sun, Maosong  and
      Liu, Yang  and
      Wu, Hua  and
      Liu, Kang  and
      Che, Wanxiang  and
      He, Shizhu  and
      Rao, Gaoqi",
    booktitle = "Proceedings of the 20th Chinese National Conference on Computational Linguistics",
    month = aug,
    year = "2021",
    address = "Huhhot, China",
    publisher = "Chinese Information Processing Society of China",
    url = "https://preview.aclanthology.org/ingest-emnlp/2021.ccl-1.104/",
    pages = "1167--1177",
    language = "eng",
    abstract = "To enrich the research about sketch modality, a new task termed Sketchy Scene Captioning is proposed in this paper. This task aims to generate sentence-level and paragraph-level descriptions for a sketchy scene. The sentence-level description provides the salient semantics of a sketchy scene, while the paragraph-level description gives more details about the sketchy scene. Sketchy Scene Captioning can be viewed as an extension of sketch classification, which can only provide one class label for a sketch. Generating multi-level descriptions for a sketchy scene is challenging because of the visual sparsity and ambiguity of the sketch modality. To achieve our goal, we first contribute a sketchy scene captioning dataset to lay the foundation of this new task. The popular sequence learning scheme, e.g., a Long Short-Term Memory neural network with a visual attention mechanism, is then adopted to recognize the objects in a sketchy scene and infer the relations among the objects. In the experiments, promising results have been achieved on the proposed dataset. We believe that this work will motivate further research on the understanding of sketch modality and the numerous sketch-based applications in our daily life. The collected dataset is released at \url{https://github.com/SketchysceneCaption/Dataset}."
}