Shengzhe Li
2026
JMTEB and JMTEB-lite: Japanese Massive Text Embedding Benchmark and Its Lightweight Version
Shengzhe Li | Masaya Ohagi | Ryokan Ri | Akihiko Fukuchi | Tomohide Shibata | Daisuke Kawahara
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We present JMTEB, a large-scale evaluation suite for Japanese text embedding models, designed to provide comprehensive coverage across multiple task types. The benchmark integrates 28 datasets across 5 tasks, enabling broad and challenging evaluation of model performance in diverse scenarios. While the full benchmark delivers thorough assessment, its scale poses practical challenges in computation time and resource requirements. To address this, we construct JMTEB-lite, a lightweight version of JMTEB, by substantially reducing the corpus size in retrieval-related tasks. JMTEB-lite significantly accelerates evaluation while maintaining high fidelity to the full benchmark. Together, JMTEB and JMTEB-lite form a flexible evaluation framework: the full version serves as a comprehensive standard for exhaustive benchmarking, while the lightweight version enables rapid iteration and efficient model selection. This dual approach supports both rigorous evaluation and practical development workflows, advancing Japanese text embedding research.
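The abstract does not say how the retrieval corpora are pruned, but one plausible procedure is to keep every gold document plus each query's hardest distractors under a baseline encoder, so that rankings on the lite corpus stay close to those on the full one. A minimal sketch of that idea; the data layout, function name, and baseline model below are assumptions for illustration, not the paper's method:

```python
# Sketch: shrink a retrieval corpus while preserving ranking difficulty.
# Assumed layout: corpus = {doc_id: text}, queries = {query_id: text},
# qrels = {query_id: set of relevant doc_ids}. Requires k < len(corpus).
import numpy as np
from sentence_transformers import SentenceTransformer

def build_lite_corpus(corpus, queries, qrels, k=100,
                      model_name="sentence-transformers/all-MiniLM-L6-v2"):
    model = SentenceTransformer(model_name)
    doc_ids = list(corpus)
    doc_emb = model.encode([corpus[d] for d in doc_ids], normalize_embeddings=True)
    qry_emb = model.encode(list(queries.values()), normalize_embeddings=True)

    keep = {d for rel in qrels.values() for d in rel}  # never drop a gold document
    scores = qry_emb @ doc_emb.T                       # cosine sim (embeddings normalized)
    for row in scores:
        hardest = np.argpartition(-row, k)[:k]         # k highest-scoring distractors
        keep.update(doc_ids[i] for i in hardest)
    return {d: corpus[d] for d in keep}
```

Keeping the baseline's top-ranked distractors is what preserves fidelity: documents that no reasonable model would ever retrieve can be dropped without changing any model's scores much.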
Construction of a Japanese RAG Benchmark Using Synthetic Documents on Non-existent Entities and Events
Shengzhe Li | Masaya Ohagi | Hayato Tsukagoshi | Akihiko Fukuchi | Tomohide Shibata | Daisuke Kawahara
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Retrieval-augmented generation (RAG) is a technique in which a large language model (LLM) generates answers based on relevant documents retrieved from an external document collection. Existing RAG evaluation benchmarks often use public data, such as Wikipedia and news articles, as the external document collection. However, these data are highly likely to already be included in the LLM’s pre-training corpus, which may prevent an accurate evaluation of the model’s ability to generate answers based on the retrieved documents. In this study, we construct a Japanese RAG benchmark by having an LLM synthesize documents about non-existent entities and events and using this collection of synthetic documents as the search target. Since these synthetic documents are not included in the LLM’s training data, the ability to generate answers based on retrieved documents can be evaluated more accurately. In addition to the synthetic documents, the benchmark comprises questions and gold answers, created with a combination of LLMs and human effort. We then evaluate and analyze the RAG performance of existing LLMs on the constructed benchmark.
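The evaluation itself follows the standard RAG loop: retrieve from the synthetic corpus only, generate an answer grounded in the retrieved documents, and score it against the gold answer. A minimal sketch of that loop, with hypothetical `retriever` and `llm` callables standing in for any concrete stack; the paper's actual prompts and metric may differ:

```python
# Sketch of a generic RAG-evaluation loop over a synthetic-document benchmark.
# `retriever(question, top_k)` returns document texts; `llm(prompt)` returns a string.

def evaluate_rag(benchmark, retriever, llm, k=5):
    """benchmark: list of {"question": str, "answer": str} items whose
    answers are grounded only in the synthetic documents."""
    correct = 0
    for item in benchmark:
        docs = retriever(item["question"], top_k=k)  # search the synthetic corpus only
        prompt = (
            "Answer the question using only the documents below.\n\n"
            + "\n\n".join(docs)
            + f"\n\nQuestion: {item['question']}\nAnswer:"
        )
        prediction = llm(prompt)
        # Exact match is the simplest possible score; the paper may use a
        # different criterion (e.g., partial match or LLM-as-a-judge).
        correct += prediction.strip() == item["answer"].strip()
    return correct / len(benchmark)
```

Because the entities and events are fictitious, any correct answer must come from the retrieved documents rather than from the LLM's parametric memory, which is exactly what this score isolates.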
2023
Bridging the Gap between Subword and Character Segmentation in Pretrained Language Models
Shun Kiyono | Sho Takase | Shengzhe Li | Toshinori Sato
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Pretrained language models require consistent segmentation (e.g., subword- or character-level segmentation) in pretraining and finetuning. In NLP, many tasks are better modeled with subword-level segmentation than with character-level segmentation. However, because of their format, several tasks require character-level segmentation. Thus, to tackle both types of NLP tasks, language models must be pretrained independently for subword- and character-level segmentation, which is an inefficient and costly procedure. Instead, this paper proposes a method for training a language model with unified segmentation, so that the trained model can be finetuned with both subword- and character-level segmentation. The principle of the method is to apply the subword regularization technique to generate a mixture of subword- and character-level segmentations. Through experiments on BERT models, we demonstrate that our method can halve the computational cost of pretraining.
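The mixture can be made concrete in a few lines: with some probability an input is split into characters, and otherwise a subword segmentation is sampled via subword regularization (Kudo, 2018). The explicit coin flip below is a simplification for illustration, and `p_char` and `alpha` are illustrative values, not the paper's settings:

```python
# Sketch: pretraining-time segmentation that mixes character-level splits
# with sampled subword segmentations. Assumes any trained SentencePiece model.
import random
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")  # hypothetical path

def mixed_segment(text, p_char=0.5, alpha=0.1):
    if random.random() < p_char:
        return list(text)  # character-level segmentation
    # Sample one segmentation from the full subword lattice
    # (subword regularization: nbest_size=-1, temperature alpha).
    return sp.encode(text, out_type=str, enable_sampling=True,
                     alpha=alpha, nbest_size=-1)
```

A model pretrained on such mixed inputs sees both granularities, which is why a single pretraining run can later be finetuned at either level.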
2022
Building a Personalized Dialogue System with Prompt-Tuning
Tomohito Kasahara | Daisuke Kawahara | Nguyen Tung | Shengzhe Li | Kenta Shinzato | Toshinori Sato
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
Dialogue systems without consistent responses are not attractive. In this study, we build a dialogue system that can respond based on a given character setting (persona) to achieve consistency. Given the rapidly increasing scale of language models, we propose an approach that applies prompt-tuning, which has low training cost, to pre-trained large-scale language models. The results of automatic and manual evaluations in English and Japanese show that it is possible to build a dialogue system with more natural and personalized responses, using fewer computational resources than fine-tuning.
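Prompt-tuning freezes the language model and trains only a small set of continuous "soft prompt" vectors prepended to the input, which is why its cost is so much lower than fine-tuning. The same technique can be sketched with the HuggingFace PEFT library; the model name and persona text below are illustrative, not the paper's setup:

```python
# Sketch: prompt-tuning a frozen causal LM with the PEFT library.
# Only num_virtual_tokens soft-prompt embeddings receive gradients.
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rinna/japanese-gpt2-medium"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,
    # Initialize the soft prompt from a textual persona description.
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="The following is a chat with a cheerful assistant named Alice.",
    tokenizer_name_or_path=model_name,
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the soft prompt is trainable
```

Training then proceeds as ordinary language modeling on persona-conditioned dialogues, with the base model's billions of weights left untouched.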