2024
pdf
abs
A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions
Shun Inadumi
|
Seiya Kawano
|
Akishige Yuguchi
|
Yasutomo Kawanishi
|
Koichiro Yoshino
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Situated conversations, which refer to visual information as visual question answering (VQA), often contain ambiguities caused by reliance on directive information. This problem is exacerbated because some languages, such as Japanese, often omit subjective or objective terms. Such ambiguities in questions are often clarified by the contexts in conversational situations, such as joint attention with a user or user gaze information. In this study, we propose the Gaze-grounded VQA dataset (GazeVQA) that clarifies ambiguous questions using gaze information by focusing on a clarification process complemented by gaze information. We also propose a method that utilizes gaze target estimation results to improve the accuracy of GazeVQA tasks. Our experimental results showed that the proposed method improved the performance in some cases of a VQA system on GazeVQA and identified some typical problems of GazeVQA tasks that need to be improved.
pdf
abs
J-CRe3: A Japanese Conversation Dataset for Real-world Reference Resolution
Nobuhiro Ueda
|
Hideko Habe
|
Akishige Yuguchi
|
Seiya Kawano
|
Yasutomo Kawanishi
|
Sadao Kurohashi
|
Koichiro Yoshino
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Understanding expressions that refer to the physical world is crucial for such human-assisting systems in the real world, as robots that must perform actions that are expected by users. In real-world reference resolution, a system must ground the verbal information that appears in user interactions to the visual information observed in egocentric views. To this end, we propose a multimodal reference resolution task and construct a Japanese Conversation dataset for Real-world Reference Resolution (J-CRe3). Our dataset contains egocentric video and dialogue audio of real-world conversations between two people acting as a master and an assistant robot at home. The dataset is annotated with crossmodal tags between phrases in the utterances and the object bounding boxes in the video frames. These tags include indirect reference relations, such as predicate-argument structures and bridging references as well as direct reference relations. We also constructed an experimental model and clarified the challenges in multimodal reference resolution tasks.
2023
pdf
abs
Analysis of Style-Shifting on Social Media: Using Neural Language Model Conditioned by Social Meanings
Seiya Kawano
|
Shota Kanezaki
|
Angel Fernando Garcia Contreras
|
Akishige Yuguchi
|
Marie Katsurai
|
Koichiro Yoshino
Findings of the Association for Computational Linguistics: EMNLP 2023
In this paper, we propose a novel framework for evaluating style-shifting in social media conversations. Our proposed framework captures changes in an individual’s conversational style based on surprisals predicted by a personalized neural language model for individuals. Our personalized language model integrates not only the linguistic contents of conversations but also non-linguistic factors, such as social meanings, including group membership, personal attributes, and individual beliefs. We incorporate these factors directly or implicitly into our model, leveraging large, pre-trained language models and feature vectors derived from a relationship graph on social media. Compared to existing models, our personalized language model demonstrated superior performance in predicting an individual’s language in a test set. Furthermore, an analysis of style-shifting utilizing our proposed metric based on our personalized neural language model reveals a correlation between our metric and various conversation factors as well as human evaluation of style-shifting.
2022
pdf
abs
Pseudo Ambiguous and Clarifying Questions Based on Sentence Structures Toward Clarifying Question Answering System
Yuya Nakano
|
Seiya Kawano
|
Koichiro Yoshino
|
Katsuhito Sudoh
|
Satoshi Nakamura
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering
Question answering (QA) with disambiguation questions is essential for practical QA systems because user questions often do not contain information enough to find their answers. We call this task clarifying question answering, a task to find answers to ambiguous user questions by disambiguating their intents through interactions. There are two major problems in building a clarifying question answering system: data preparation of possible ambiguous questions and the generation of clarifying questions. In this paper, we tackle these problems by sentence generation methods using sentence structures. Ambiguous questions are generated by eliminating a part of a sentence considering the sentence structure. Clarifying the question generation method based on case frame dictionary and sentence structure is also proposed. Our experimental results verify that our pseudo ambiguous question generation successfully adds ambiguity to questions. Moreover, the proposed clarifying question generation recovers the performance drop by asking the user for missing information.
2019
pdf
abs
Neural Conversation Model Controllable by Given Dialogue Act Based on Adversarial Learning and Label-aware Objective
Seiya Kawano
|
Koichiro Yoshino
|
Satoshi Nakamura
Proceedings of the 12th International Conference on Natural Language Generation
Building a controllable neural conversation model (NCM) is an important task. In this paper, we focus on controlling the responses of NCMs by using dialogue act labels of responses as conditions. We introduce an adversarial learning framework for the task of generating conditional responses with a new objective to a discriminator, which explicitly distinguishes sentences by using labels. This change strongly encourages the generation of label-conditioned sentences. We compared the proposed method with some existing methods for generating conditional responses. The experimental results show that our proposed method has higher controllability for dialogue acts even though it has higher or comparable naturalness to existing methods.