Hangqi Li


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2024

pdf bib
PhiloGPT: A Philology-Oriented Large Language Model for Ancient Chinese Manuscripts with Dunhuang as Case Study
Yuqing Zhang | Baoyi He | Yihan Chen | Hangqi Li | Han Yue | Shengyu Zhang | Huaiyong Dou | Junchi Yan | Zemin Liu | Yongquan Zhang | Fei Wu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Philology, the study of ancient manuscripts, demands years of professional training in ex-tensive knowledge memorization and manual textual retrieval. Despite these requirements align closely with strengths of recent successful Large Language Models (LLMs), the scarcity of high-quality, specialized training data has hindered direct applications. To bridge this gap, we curated the PhiloCorpus-ZH, a rich collec-tion of ancient Chinese texts spanning a millen-nium with 30 diverse topics, including firsthand folk copies. This corpus facilitated the develop-ment of PhiloGPT, the first LLM tailored for discovering ancient Chinese manuscripts. To effectively tackle complex philological tasks like restoration, attribution, and linguistic anal-ysis, we introduced the PhiloCoP framework. Modeled on the analytical patterns of philol-ogists, PhiloCoP enhances LLM’s handling of historical linguistic peculiarities such as phonetic loans, polysemy, and syntactic inver-sions. We further integrated these tasks into the PhiloBenchmark, establishing a new standard for evaluating ancient Chinese LLMs address-ing philology tasks. Deploying PhiloGPT in practical scenarios has enabled Dunhuang spe-cialists to resolve philology tasks, such as iden-tifying duplication of copied text and assisting archaeologists with text completion, demon-strating its potential in real-world applications.