Kexin Bian
2026
Evaluation of Document-Level Text Simplification in Japanese
Iori Yamashita | Hikari Tanaka | Hajime Kiyama | Kexin Bian | Zhousi Chen | Mamoru Komachi
Proceedings of the Fifteenth Language Resources and Evaluation Conference
This study establishes an evaluation framework for document-level text simplification in Japanese by constructing a human-annotated dataset and examining the reliability of LLM-based automatic evaluation. We first developed detailed annotation guidelines covering four criteria—necessity, sufficiency, sentence-level simplicity, and document-level simplicity—and collected human ratings for 1,128 source–target document pairs derived from the Wikipedia portion of JADOS, a Japanese document-level simplification corpus. Using this dataset, we conducted extensive experiments comparing human judgments with evaluations from large language models, including GPT, Claude, and Gemini. The results show that GPT-4o and Gemini 2.5 Pro achieve high agreement with human annotators even in the zero-shot setting, demonstrating their potential as reliable automatic evaluators for Japanese simplification. However, LLMs exhibited a consistent tendency to underestimate document-level simplicity, particularly for kanji-dense texts and for texts with relatively long sentences but few sentences overall. This work provides the first benchmark for evaluating document-level text simplification in Japanese and offers practical evidence that LLM-based evaluation can support scalable assessment of Japanese document-level simplification.
2025
HIT-YOU at TSAR 2025 Shared Task: Leveraging Similarity-Based Few-Shot Prompting, Round-Trip Translation, and Self-Refinement for Readability-Controlled Text Simplification
Mao Shimada | Kexin Bian | Zhidong Ling | Mamoru Komachi
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)
We describe our submission to the TSAR 2025 shared task on readability-controlled text simplification, which evaluates systems on their ability to adjust linguistic complexity to specified CEFR levels while preserving meaning and coherence. We explored two complementary frameworks that leverage the shared task's CEFR classifier as feedback. The first is an ensemble approach that generates diverse candidates using multiple LLMs under zero-shot prompting with level-specific instructions and vocabulary lists, one-shot prompting, and round-trip translation; candidates were filtered by predicted CEFR level before an LLM judge selected the final output. The second is a self-refinement loop, in which a single candidate is iteratively revised with classifier feedback until it matches the target level or a maximum number of iterations is reached. This study is among the first to apply round-trip translation and iterative self-refinement to controlled simplification, broadening the toolkit for adapting linguistic complexity.