Takuma Kato


Scene-Text Aware Image and Text Retrieval with Dual-Encoder
Shumpei Miyawaki | Taku Hasegawa | Kyosuke Nishida | Takuma Kato | Jun Suzuki
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

We tackle the tasks of image and text retrieval using a dual-encoder model in which images and text are encoded independently. This model has attracted attention as an approach that enables efficient offline inference by connecting both vision and language in the same semantic space; however, whether an image encoder as part of a dual-encoder model can interpret scene-text (i.e., the textual information in images) is unclear. We propose pre-training methods that encourage a joint understanding of the scene-text and surrounding visual information. The experimental results demonstrate that our methods improve the retrieval performance of the dual-encoder models.
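
The retrieval step of a dual-encoder setup can be illustrated with a minimal sketch. It assumes the image embeddings were computed offline and stored, and that a text query is embedded into the same space at search time. The vectors, the identifiers (`img_a`, `img_b`, `img_c`), the helper names, and the choice of cosine similarity are all assumptions made for illustration; they are not the encoders or scoring function from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_images(text_vec, image_index):
    """Return image ids sorted by descending similarity to the text query."""
    scored = [(cosine(text_vec, vec), image_id)
              for image_id, vec in image_index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored]


# Hypothetical image embeddings, assumed to be precomputed offline.
image_index = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.1, 0.8, 0.1],
    "img_c": [0.0, 0.2, 0.9],
}

# Hypothetical text-query embedding in the same space.
query_vec = [0.2, 0.9, 0.0]

ranking = rank_images(query_vec, image_index)
```

Because the two sides are encoded independently, only the query has to be encoded at search time; the image index can be built once and reused.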


Embeddings of Label Components for Sequence Labeling: A Case Study of Fine-grained Named Entity Recognition
Takuma Kato | Kaori Abe | Hiroki Ouchi | Shumpei Miyawaki | Jun Suzuki | Kentaro Inui
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

In general, the labels used in sequence labeling consist of different types of elements. For example, IOB-format entity labels, such as B-Person and I-Person, can be decomposed into span (B and I) and type information (Person). However, most sequence labeling models do not consider such label components, even though components shared across labels, such as Person, can be beneficial for label prediction. In this work, we propose to integrate label component information as embeddings into models. Through experiments on English and Japanese fine-grained named entity recognition, we demonstrate that the proposed method improves performance, especially for instances with low-frequency labels.
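
The label decomposition described in the abstract can be sketched as follows. The function names and the grouping helper are hypothetical and only show how IOB labels split into span and type components and which labels share a type; how the paper turns these components into embeddings is not shown here.

```python
from collections import defaultdict


def decompose_label(label):
    """Split an IOB-format label into (span, type) components.

    The outside label "O" has no type component, so its type is None.
    """
    if label == "O":
        return ("O", None)
    span, _, entity_type = label.partition("-")
    return (span, entity_type)


def group_by_type(labels):
    """Map each type component to the labels that share it."""
    groups = defaultdict(list)
    for label in labels:
        _, entity_type = decompose_label(label)
        if entity_type is not None:
            groups[entity_type].append(label)
    return dict(groups)


labels = ["B-Person", "I-Person", "B-Location", "O"]
components = [decompose_label(label) for label in labels]
shared = group_by_type(labels)
```

Here `B-Person` and `I-Person` both map to the `Person` component, which is the kind of sharing that a component-level representation could exploit for low-frequency labels.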