Ryo Sekizawa


2025

pdf bib
JBBQ: Japanese Bias Benchmark for Analyzing Social Biases in Large Language Models
Hitomi Yanaka | Namgi Han | Ryoma Kumon | Lu Jie | Masashi Takeshita | Ryo Sekizawa | Taisei Katô | Hiromi Arai
Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

With the development of large language models (LLMs), social biases in these LLMs have become a pressing issue.Although there are various benchmarks for social biases across languages, the extent to which Japanese LLMs exhibit social biases has not been fully investigated.In this study, we construct the Japanese Bias Benchmark dataset for Question Answering (JBBQ) based on the English bias benchmark BBQ, with analysis of social biases in Japanese LLMs.The results show that while current open Japanese LLMs with more parameters show improved accuracies on JBBQ, their bias scores increase.In addition, prompts with a warning about social biases and chain-of-thought prompting reduce the effect of biases in model outputs, but there is room for improvement in extracting the correct evidence from contexts in Japanese. Our dataset is available at https://github.com/ynklab/JBBQ_data.

2023

pdf bib
Constructing Multilingual Code Search Dataset Using Neural Machine Translation
Ryo Sekizawa | Nan Duan | Shuai Lu | Hitomi Yanaka
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Code search is a task to find programming codes that semantically match the given natural language queries. Even though some of the existing datasets for this task are multilingual on the programming language side, their query data are only in English. In this research, we create a multilingual code search dataset in four natural and four programming languages using a neural machine translation model. Using our dataset, we pre-train and fine-tune the Transformer-based models and then evaluate them on multiple code search test sets. Our results show that the model pre-trained with all natural and programming language data has performed best in most cases. By applying back-translation data filtering to our dataset, we demonstrate that the translation quality affects the model’s performance to a certain extent, but the data size matters more.

pdf bib
Analyzing Syntactic Generalization Capacity of Pre-trained Language Models on Japanese Honorific Conversion
Ryo Sekizawa | Hitomi Yanaka
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

Using Japanese honorifics is challenging because it requires not only knowledge of the grammatical rules but also contextual information, such as social relationships. It remains unclear whether pre-trained large language models (LLMs) can flexibly handle Japanese honorifics like humans. To analyze this, we introduce an honorific conversion task that considers social relationships among people mentioned in a conversation. We construct a Japanese honorifics dataset from problem templates of various sentence structures to investigate the syntactic generalization capacity of GPT-3, one of the leading LLMs, on this task under two settings: fine-tuning and prompt learning. Our results showed that the fine-tuned GPT-3 performed better in a context-aware honorific conversion task than the prompt-based one. The fine-tuned model demonstrated overall syntactic generalizability towards compound honorific sentences, except when tested with the data involving direct speech.