Katsumasa Yoshikawa
2026
Evaluating the Effect of Question Wording Variations on Answer Consistency in Large Language Models
Junya Takayama | Masaya Ohagi | Tomoya Mizumoto | Katsumasa Yoshikawa
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Large Language Models (LLMs) sometimes generate inconsistent answers when asked semantically equivalent questions expressed with different wordings. Such inconsistency may lead to decreased task performance or excessive agreement with users. This study investigates how question wording influences the answer consistency of LLMs, focusing on binary Yes/No questions. We design four types of paraphrasing patterns, namely synonym substitution, antonym substitution, addition of agreement-seeking expressions, and strengthened agreement-seeking expressions, and evaluate their impact on model outputs. Experiments with multiple open-source and commercial LLMs show that many models become more likely to answer "Yes" when agreement-seeking expressions are included, and that they are particularly vulnerable to antonym substitutions. Our analysis further suggests that some of these tendencies are already present in pretrained models and are not fully removed by post-training. We also provide insights into which factors are likely (or unlikely) to contribute to improving consistency. By providing a systematic evaluation framework, this work highlights the necessity of accounting for wording-induced biases in the development and deployment of LLMs.
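The evaluation protocol described above can be sketched as follows. This is a minimal illustration, not the paper's actual prompts: the `ask` model call, the specific word substitutions, and the agreement-seeking phrasings are all placeholders.

```python
# Sketch of the four paraphrasing patterns and a simple consistency score.
# `ask(question)` is a hypothetical model call returning "Yes" or "No".
PATTERNS = {
    "original": lambda q: q,
    "synonym": lambda q: q.replace("big", "large"),        # illustrative synonym swap
    "antonym": lambda q: q.replace("big", "small"),        # illustrative antonym swap
    "agreement": lambda q: q + " Don't you think so?",     # agreement-seeking
    "strong_agreement": lambda q: q + " Surely you agree, right?",
}

def consistency(question, ask):
    """Return (fraction of paraphrases matching the original answer, all answers)."""
    answers = {name: ask(fn(question)) for name, fn in PATTERNS.items()}
    base = answers["original"]
    others = [a for name, a in answers.items() if name != "original"]
    return sum(a == base for a in others) / len(others), answers
```

With a sycophantic stub model that flips to "Yes" whenever agreement is sought, the score drops below 1.0, illustrating the wording-induced bias the paper measures.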
Persona-Aware Evaluation of Cognitive Bias in LLMs: From Benchmark to Applied Decision-Making
Katsumasa Yoshikawa | Junya Takayama | Takato Yamazaki
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We present a persona-aware evaluation suite that couples a 12-category cognitive-bias benchmark with 100 applied financial framing tasks to assess how large language models (LLMs) respond under systematically varied persona conditions. Using a factorized set of 162 personas spanning gender, age, political orientation, income, and education, we analyze how persona conditioning modulates bias-consistent responding across ten instruction-tuned models. On applied tasks, persona conditioning reduces framing reversals on average and slightly increases decision confidence, with substantial variation across model families and scales. Correlation analyses further reveal that benchmark bias tendencies—particularly availability, social proof, and framing—predict applied framing sensitivity, suggesting that standardized bias scores can serve as indicators of real-world decision variability. This work provides a unified framework for linking cognitive-bias evaluation with persona-conditioned decision behavior in LLMs. (All data and prompts will be released after acceptance to preserve anonymity.)
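The factorized persona set can be reproduced as a Cartesian product over attribute levels. The attribute values below are assumptions chosen only so that the product matches the reported count of 162 (2 × 3 × 3 × 3 × 3); the paper's actual levels may differ.

```python
from itertools import product

# Assumed attribute levels; 2 * 3 * 3 * 3 * 3 = 162 matches the reported persona count.
ATTRS = {
    "gender": ["male", "female"],
    "age": ["young", "middle-aged", "senior"],
    "political_orientation": ["liberal", "moderate", "conservative"],
    "income": ["low", "middle", "high"],
    "education": ["high school", "bachelor", "graduate"],
}

def personas():
    """Yield every persona as a dict mapping attribute name to level."""
    keys = list(ATTRS)
    for combo in product(*ATTRS.values()):
        yield dict(zip(keys, combo))
```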
2025
Persona-Consistent Dialogue Generation via Pseudo Preference Tuning
Junya Takayama | Masaya Ohagi | Tomoya Mizumoto | Katsumasa Yoshikawa
Proceedings of the 31st International Conference on Computational Linguistics
We propose a simple yet effective method for enhancing persona consistency in dialogue response generation using Direct Preference Optimization (DPO). In our method, we generate responses from the response generation model using persona information that has been randomly swapped with data from other dialogues, treating these responses as pseudo-negative samples. The reference responses serve as positive samples, allowing us to create pseudo-preference data. Experimental results demonstrate that our model, fine-tuned with DPO on the pseudo preference data, produces more consistent and natural responses compared to models trained using supervised fine-tuning or reinforcement learning approaches based on entailment relations between personas and utterances.
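The pseudo-preference construction described above can be sketched as below. The `generate` callable stands in for the response generation model and the dialogue-record field names are assumptions; the output follows the usual DPO preference format of prompt / chosen / rejected.

```python
import random

def build_pseudo_preferences(dialogues, generate):
    """Build DPO preference pairs: the reference response is 'chosen';
    a response generated under a persona swapped in from a *different*
    dialogue serves as the pseudo-negative 'rejected' sample."""
    pairs = []
    for i, d in enumerate(dialogues):
        # Randomly pick a persona from another dialogue.
        j = random.choice([k for k in range(len(dialogues)) if k != i])
        swapped_persona = dialogues[j]["persona"]
        rejected = generate(swapped_persona, d["context"])  # pseudo-negative
        pairs.append({
            "prompt": d["context"],
            "chosen": d["response"],  # reference response = positive sample
            "rejected": rejected,
        })
    return pairs
```

The resulting list can be fed directly to a standard DPO training loop as its preference dataset.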
2024
Dialogue Systems Can Generate Appropriate Responses without the Use of Question Marks? – a Study of the Effects of “?” for Spoken Dialogue Systems –
Tomoya Mizumoto | Takato Yamazaki | Katsumasa Yoshikawa | Masaya Ohagi | Toshiki Kawamoto | Toshinori Sato
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
When individuals engage in spoken discourse, various phenomena can be observed that differ from those that are apparent in text-based conversation. While written communication commonly uses a question mark to denote a query, in spoken discourse, queries are frequently indicated by a rising intonation at the end of a sentence. However, numerous speech recognition engines do not append a question mark to recognized queries, presenting a challenge when creating a spoken dialogue system. Specifically, the absence of a question mark at the end of a sentence can impede the generation of appropriate responses to queries in spoken dialogue systems. Hence, we investigate the impact of question marks on dialogue systems, with the results showing that they have a significant impact. Moreover, we analyze specific examples in an effort to determine which types of utterances have an impact on dialogue systems.
2023
An Open-Domain Avatar Chatbot by Exploiting a Large Language Model
Takato Yamazaki | Tomoya Mizumoto | Katsumasa Yoshikawa | Masaya Ohagi | Toshiki Kawamoto | Toshinori Sato
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue
With the ambition to create avatars capable of human-level casual conversation, we developed an open-domain avatar chatbot, situated in a virtual reality environment, that employs a large language model (LLM). Introducing the LLM posed several challenges for multimodal integration, such as developing techniques to align diverse outputs and avatar control, as well as addressing the issue of slow generation speed. To address these challenges, we integrated various external modules into our system. Our system is based on the award-winning model from the Dialogue System Live Competition 5. Through this work, we hope to stimulate discussions within the research community about the potential and challenges of multimodal dialogue systems enhanced with LLMs.
2017
A Semi-universal Pipelined Approach to the CoNLL 2017 UD Shared Task
Hiroshi Kanayama | Masayasu Muraoka | Katsumasa Yoshikawa
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
This paper presents our system submitted for the CoNLL 2017 Shared Task, “Multilingual Parsing from Raw Text to Universal Dependencies.” We ran the system for all languages with our own fully pipelined components without relying on re-trained baseline systems. To train the dependency parser, we used only the universal part-of-speech tags and the distance between words, and applied deterministic rules to assign dependency labels. The simple, delexicalized models are suitable for cross-lingual transfer approaches and a universal language model. Experimental results show that our model performed well on some metrics, and we discuss topics such as the contribution of each component and syntactic similarities among languages.
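The delexicalized design described above (UPOS tags plus word distance, no lexical forms) can be illustrated with a feature-extraction sketch. The feature names and the exact feature set here are assumptions for illustration, not the system's actual templates.

```python
def delexicalized_features(upos, head, dep):
    """Sketch of delexicalized features for a candidate arc head -> dep,
    using only universal POS tags and word distance (token indices)."""
    return {
        "head_upos": upos[head],
        "dep_upos": upos[dep],
        "pair": f"{upos[head]}->{upos[dep]}",
        "distance": abs(head - dep),
        # Direction of the head relative to the dependent.
        "direction": "left" if head < dep else "right",
    }
```

Because no word forms appear in the features, a model trained on one treebank can score arcs in any language that shares the UPOS inventory, which is what makes the approach suitable for cross-lingual transfer.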
2012
Sentence Compression with Semantic Role Constraints
Katsumasa Yoshikawa | Ryu Iida | Tsutomu Hirao | Manabu Okumura
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Identifying Temporal Relations by Sentence and Document Optimizations
Katsumasa Yoshikawa | Masayuki Asahara | Ryu Iida
Proceedings of COLING 2012: Posters
2011
Jointly Extracting Japanese Predicate-Argument Relation with Markov Logic
Katsumasa Yoshikawa | Masayuki Asahara | Yuji Matsumoto
Proceedings of 5th International Joint Conference on Natural Language Processing