Ancient Chinese books have great values in history and cultural studies. Named en-tities like person, location, time are cru-cial elements, thus automatic Named En-tity Recognition (NER) is considered a ba-sic task in ancient Chinese text processing. This paper introduces EvaHan2025, the first international ancient Chinese Named Entity Recognition bake-off. The evalua-tion introduces a rigorous benchmark for assessing NER performance across histori-cal and medical texts, covering 12 named entity types. A total of 13 teams par-ticipated in the competition, submitting 77 system runs. In the closed modality, where participants were restricted to us-ing only the training data, the highest F1 scores reached 85.04% on TestA and 90.28% on TestB, both derived from his-torical texts, while performance on medi-cal texts (TestC) reached 84.49%. The re-sults indicate that text genre significantly impacts model performance, with histori-cal texts generally yielding higher scores. Additionally, the intrinsic characteristics of named entities also influence recogni-tion performance. These findings high-light the challenges and opportunities in ancient Chinese NER and underscore the importance of domain adaptation and en-tity type diversity in future research.
A deep understanding of sports, a field rich in strategic and dynamic content, is crucial for advancing Natural Language Processing (NLP). This holds particular significance in the context of evaluating and advancing Large Language Models (LLMs), given the existing gap in specialized benchmarks. To bridge this gap, we introduce SportQA, a novel benchmark specifically designed for evaluating LLMs in the context of sports understanding. SportQA encompasses over 70,000 multiple-choice questions across three distinct difficulty levels, each targeting different aspects of sports knowledge from basic historical facts to intricate, scenario-based reasoning tasks. We conducted a thorough evaluation of prevalent LLMs, mainly utilizing few-shot learning paradigms supplemented by chain-of-thought (CoT) prompting. Our results reveal that while LLMs exhibit competent performance in basic sports knowledge, they struggle with more complex, scenario-based sports reasoning, lagging behind human expertise. The introduction of SportQA marks a significant step forward in NLP, offering a tool for assessing and enhancing sports understanding in LLMs. The dataset is available at https://github.com/haotianxia/SportQA
Topic models have been prevailing for many years on discovering latent semantics while modeling long documents. However, for short texts they generally suffer from data sparsity because of extremely limited word co-occurrences; thus tend to yield repetitive or trivial topics with low quality. In this paper, to address this issue, we propose a novel neural topic model in the framework of autoencoding with a new topic distribution quantization approach generating peakier distributions that are more appropriate for modeling short texts. Besides the encoding, to tackle this issue in terms of decoding, we further propose a novel negative sampling decoder learning from negative samples to avoid yielding repetitive topics. We observe that our model can highly improve short text topic modeling performance. Through extensive experiments on real-world datasets, we demonstrate our model can outperform both strong traditional and neural baselines under extreme data sparsity scenes, producing high-quality topics.