Muxuan Liu


2026

We propose JaSocial, a novel evaluation framework that uses Japanese emails to comprehensively evaluate the social intelligence of large language models (LLMs) across varied social-status relationships. The framework integrates three core components. First, we construct and publicly release a meticulously human-annotated Japanese email dataset covering six distinct social-status contexts, capturing nuanced shifts in social hierarchy and politeness. Second, we adopt Systemic Functional Linguistics (SFL), a social-semiotic linguistic theory that explicitly models how linguistic choices realize interpersonal relations and hierarchical distinctions, to classify email content from three perspectives: social relationships, speech functions, and honorific expressions. Building on these perspectives, we design an automated evaluation method that assigns each LLM-generated email a contextual appropriateness score, quantifying how well it reflects socially intelligent behavior. Third, we release the full evaluation code to ensure reproducibility and enable fair cross-model comparisons. JaSocial exposes the limitations of current LLMs in capturing cultural nuance while providing an open benchmark for future research.
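The abstract does not state how the contextual appropriateness score is computed. The sketch below is purely illustrative and makes several assumptions. It assumes that each email has already been labelled along the three SFL perspectives (social relationship, speech function, honorific expression). It also assumes that a reference label set exists for each social-status context. Under those assumptions, it scores an email by its weighted agreement with that reference. The `EmailLabels` schema, the label strings, the equal default weights, and `appropriateness_score` are all hypothetical and are not the JaSocial implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# The three SFL perspectives named in the abstract; field names are hypothetical.
PERSPECTIVES = ("social_relationship", "speech_function", "honorific")


@dataclass
class EmailLabels:
    """Hypothetical per-email labels, one per SFL perspective."""
    social_relationship: str
    speech_function: str
    honorific: str


def appropriateness_score(generated: EmailLabels,
                          reference: EmailLabels,
                          weights: Optional[Dict[str, float]] = None) -> float:
    """Hypothetical score in [0, 1]: the weighted share of perspectives on which
    the generated email's labels match the reference labels for its context."""
    if weights is None:
        # Assumption: all three perspectives count equally.
        weights = {p: 1.0 for p in PERSPECTIVES}
    total = sum(weights[p] for p in PERSPECTIVES)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    matched = sum(weights[p] for p in PERSPECTIVES
                  if getattr(generated, p) == getattr(reference, p))
    return matched / total


if __name__ == "__main__":
    # Hypothetical labels: a subordinate writing a request to a superior.
    ref = EmailLabels("subordinate_to_superior", "request", "sonkeigo_kenjougo")
    gen = EmailLabels("subordinate_to_superior", "request", "teineigo")
    print(appropriateness_score(gen, ref))  # 2 of 3 perspectives match
```

In this example, the generated email matches the reference on social relationship and speech function but uses only polite (teineigo) forms where respectful and humble forms are expected. It therefore scores two out of three. A real benchmark would also need classifiers to produce these labels from raw email text, which this sketch leaves out.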

2024

2023

2022

In Japanese, speakers use different expressions, called honorifics, depending on the social status of the speaker and the listener. Compared with many other languages, Japanese has many types of honorific expressions, so it is vital for machine translation and dialogue systems to handle the differences in meaning between them correctly. However, there is still no corpus that deals with honorific expressions based on social status. In this study, we developed an honorific corpus (the KeiCO corpus) that includes social-status information grounded in Systemic Functional Linguistics, a theory that describes language use in context in terms of a social group’s shared values and common understanding. As a general-purpose language resource, the corpus fills a gap in resources for Japanese honorifics. We expect the KeiCO corpus to be helpful for various tasks, such as improving the accuracy of machine translation, automatic evaluation, correction of Japanese composition, and style transformation. We also verified the accuracy of the corpus through a BERT-based classification task.