Keisuke Shirai
2023
Towards Flow Graph Prediction of Open-Domain Procedural Texts
Keisuke Shirai | Hirotaka Kameko | Shinsuke Mori
Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023)
Machine comprehension of procedural texts is essential for reasoning about the steps and automating the procedures. However, this requires identifying entities within a text and resolving the relationships between them. Previous work focused on the cooking domain and proposed a framework to convert a recipe text into a flow graph (FG) representation. In this work, we propose a framework based on the recipe FG for flow graph prediction of open-domain procedural texts. To investigate flow graph prediction performance in non-cooking domains, we introduce the wikiHow-FG corpus, built from articles on wikiHow, a website of how-to instruction articles. In experiments, we consider using the existing recipe corpus and performing domain adaptation from the cooking domain to the target domain. Experimental results show that the domain adaptation models achieve higher performance than those trained only on the cooking or target domain data.
2022
Visual Recipe Flow: A Dataset for Learning Visual State Changes of Objects with Recipe Flows
Keisuke Shirai | Atsushi Hashimoto | Taichi Nishimura | Hirotaka Kameko | Shuhei Kurita | Yoshitaka Ushiku | Shinsuke Mori
Proceedings of the 29th International Conference on Computational Linguistics
We present a new multimodal dataset called Visual Recipe Flow, which enables learning the result of each cooking action on each object in a recipe text. The dataset consists of object state changes and the workflow of the recipe text. Each state change is represented as an image pair, while the workflow is represented as a recipe flow graph. We developed a web interface to reduce human annotation costs. The dataset allows us to try various applications, including multimodal information retrieval.
Image Description Dataset for Language Learners
Kento Tanaka | Taichi Nishimura | Hiroaki Nanjo | Keisuke Shirai | Hirotaka Kameko | Masatake Dantsuji
Proceedings of the Thirteenth Language Resources and Evaluation Conference
We focus on image description and a corresponding assessment system for language learners. To achieve automatic assessment of image description, we construct a novel dataset, the Language Learner Image Description (LLID) dataset, which consists of images, their descriptions, and assessment annotations. Then, we propose a novel task of automatic error correction for image description, and we develop a baseline model that encodes multimodal information from a learner sentence with an image and accurately decodes a corrected sentence. Our experimental results show that the developed model can revise errors that cannot be revised without an image.
Co-authors
- Hirotaka Kameko 3
- Shinsuke Mori 2
- Taichi Nishimura 2
- Atsushi Hashimoto 1
- Shuhei Kurita 1