Ke Shu
2026
Comhis at SemEval-2026 Task 4: Embedding-Space Adaptation and LLM-Assisted Inference for Narrative Similarity
Ke Shu | Eetu Mäkelä | Mikko Tolonen
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
We present a two-stage system for the SemEval Narrative Similarity task that separates representation learning from comparative decision making. In Track B, we adapt a frozen large-scale embedding model using a lightweight projection layer trained with a triplet objective and hard example mining, producing a task-specific similarity space. In Track A, similarity scores derived from the adapted embedding space are incorporated into a large language model, which performs the final binary decision. On the official test set, our system achieves 0.68 accuracy on Track A and 0.66 on Track B, clearly outperforming the provided baselines and ranking 20th out of 44 teams on Track A and 10th out of 27 teams on Track B. These results demonstrate that efficient embedding adaptation combined with embedding-informed LLM reasoning is effective for modeling high-level narrative similarity.
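The Track B idea in the abstract above (a projection applied to frozen embeddings, a triplet objective, and hard example mining) can be illustrated with a minimal sketch. The abstract does not specify the details, so everything below is an assumption made for illustration: a purely linear projection, cosine distance, a margin of 0.2, the toy 3-dimensional vectors, and the function names. It shows only the loss and the mining step, not the training loop or the authors' actual implementation.

```python
import math


def project(weights, embedding):
    """Apply a linear projection (list of rows) to a frozen embedding, then L2-normalise.

    Assumption: the 'lightweight projection layer' is modelled as a single linear map.
    """
    projected = [sum(w * x for w, x in zip(row, embedding)) for row in weights]
    norm = math.sqrt(sum(v * v for v in projected)) or 1.0  # avoid dividing by zero
    return [v / norm for v in projected]


def cosine_distance(a, b):
    """Cosine distance between two unit-length vectors."""
    return 1.0 - sum(p * q for p, q in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss; the margin value is an illustrative assumption."""
    return max(0.0, cosine_distance(anchor, positive)
               - cosine_distance(anchor, negative) + margin)


def hardest_negative(anchor, negatives):
    """Hard example mining: index of the negative closest to the anchor."""
    return min(range(len(negatives)),
               key=lambda i: cosine_distance(anchor, negatives[i]))


# Toy data standing in for frozen embeddings of stories (hypothetical values).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
anchor = project(identity, [1.0, 0.0, 0.0])
positive = project(identity, [0.9, 0.1, 0.0])
negatives = [project(identity, v) for v in
             ([0.0, 1.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0])]

hard_index = hardest_negative(anchor, negatives)
loss_hard = triplet_loss(anchor, positive, negatives[hard_index])
loss_easy = triplet_loss(anchor, positive, negatives[0])
print(hard_index, loss_hard, loss_easy)
```

In this toy setup the easy negative already satisfies the margin and contributes no loss, while the mined hard negative still yields a positive loss. That is the usual motivation for hard example mining: training signal is concentrated on triplets the current projection has not yet separated.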
Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark
Yu Wu | Ke Shu | Jonas Fischer | Lidia Pivovarova | David Rosson | Eetu Mäkelä | Mikko Tolonen
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper presents a novel task of extracting low-resourced and noisy Latin fragments from mixed-language historical documents with varied layouts. We benchmark and evaluate the performance of large foundation models against a multimodal dataset of 724 annotated pages. The results demonstrate that reliable Latin detection with contemporary zero-shot models is achievable, yet these models lack a functional comprehension of Latin. This study establishes a comprehensive baseline for processing Latin within mixed-language corpora, supporting quantitative analysis in intellectual history and historical linguistics. Both the dataset and code are available at https://github.com/COMHIS/EACL26-detect-latin.