Ken Yano

2026

Assessing the Belief Consistency of Large Language Models on the Logical Conversation Process
Tomoki Tsujimura | Mat\={i}ss Rikters | Masaki Asada | Shusaku Egami | Tatsuya Ishigaki | Ken Yano | Hiroya Takamura
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

To reliably interpret the evolving context of an LLM as a reasoning trace, the underlying belief of the LLM needs to transition consistently with the progression of the context.We focus on evaluating whether the beliefs held by a model remain consistent before and after the extension of the context.Previous research on consistency evaluation typically uses datasets with ground-truth answers, which is problematic because task-solving ability acts as a confounding factor, obscuring the direct evaluation of consistency.Furthermore, evaluating cases where inconsistency stems from multiple errors poses difficulties.We propose a new evaluation method to assess the consistency of LLMs in a multiple-choice question answering format, designed so that any option chosen is correct, allowing for the evaluation of the proposed belief consistency.It also supports isolation of errors such as reasoning failures and biases.We reveal that the belief consistency does not improve solely with model size scaling,whereas continual pre-training on code and mathematics text improves it.Furthermore, models trained on code and mathematics text show a seemingly contradictory result of increased logical failures, indicating that belief consistency and superficial consistency are not necessarily directly linked.

2025

pdf bib abs

Effect of Multilingual and Domain-adapted Continual Pre-training on Few-shot Promptability
Ken Yano | Makoto Miwa
Proceedings of the 24th Workshop on Biomedical Language Processing

Continual Pre-training (CPT) can help pre-trained large language models (LLMs) effectively adapt to new or under-trained domains or low-resource languages without re-training from scratch.Nevertheless, during CPT, the model’s few-shot transfer ability is known to be affected for emergent tasks.We verified this by comparing the performance between the few-shot and fine-tuning settings on the same tasks.We used Llama3-ELAINE-medLLM, which was continually pre-trained on Llama3-8B, targeted for the biomedical domain, and adapted for multilingual languages (English, Japanese, and Chinese).We compared the performance of Llama3-ELAINE-medLLM and Llama3-8B in three emergent tasks: two related domain tasks, entity recognition (NER) and machine translation (MT), and one out-of-domain task, summarization (SUM). Our experimental results show that degradation in few-shot transfer ability does not necessarily indicate the model’s underlying potential on the same task after fine-tuning.

pdf bib abs

We propose ELAINE (EngLish-jApanese-chINesE)-medLLM, a trilingual (English, Japanese, Chinese) large language model adapted for the bio-medical domain based on Llama-3-8B. The training dataset was carefully curated in terms of volume and diversity to adapt to the biomedical domain and endow trilingual capability while preserving the knowledge and abilities of the base model. The training follows 2-stage paths: continued pre-training and supervised fine-tuning (SFT). Our results demonstrate that ELAINE-medLLM exhibits superior trilingual capabilities compared to existing bilingual or multilingual medical LLMs without severely sacrificing the base model’s capability.

2023

pdf bib abs

DISTANT: Distantly Supervised Entity Span Detection and Classification
Ken Yano | Makoto Miwa | Sophia Ananiadou
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

We propose a distantly supervised pipeline NER which executes entity span detection and entity classification in sequence named DISTANT (DIstantly Supervised enTity spAN deTection and classification).The former entity span detector extracts possible entity mention spans by the distant supervision. Then the later entity classifier assigns each entity span to one of the positive entity types or none by employing a positive and unlabeled (PU) learning framework. Two models were built based on the pre-trained SciBERT model and fine-tuned with the silver corpus generated by the distant supervision. Experimental results on BC5CDR and NCBI-Disease datasets show that our method outperforms the end-to-end NER baselines without PU learning by a large margin. In particular, it increases the recall score effectively.