AI Corpus Linguist: More than a Year of Experience

Jiří Milička, Tomáš Machálek


Abstract
We present an AI assistant designed to help researchers interact with language corpora using natural language instead of formal query languages. Built as a custom GPT with access to multilingual corpora via the Czech National Corpus platform API, the system translates research questions into CQL queries, retrieves corpus data, and guides users through linguistic analysis. After more than a year of deployment, the system has processed over 1000 interactions with human users. We discuss the hybrid approach combining rule-based translation with LLM intelligence, challenges of building on a constantly evolving platform, and lessons learned from production usage. Notably, this system represents the first voice-enabled corpus interface in history, significantly lowering barriers to corpus-based research for non-technical users and users outside linguistic fields.
Anthology ID:
2026.latechclfl-1.29
Volume:
Proceedings of the 10th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Diego Alves, Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Janis Pagel, Stan Szpakowicz
Venues:
LaTeCH-CLfL | WS
SIG:
SIGHUM
Publisher:
Association for Computational Linguistics
Pages:
305–310
URL:
https://preview.aclanthology.org/ingest-eacl/2026.latechclfl-1.29/
Cite (ACL):
Jiří Milička and Tomáš Machálek. 2026. AI Corpus Linguist: More than a Year of Experience. In Proceedings of the 10th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature 2026, pages 305–310, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
AI Corpus Linguist: More than a Year of Experience (Milička & Machálek, LaTeCH-CLfL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.latechclfl-1.29.pdf
Supplementary material:
2026.latechclfl-1.29.SupplementaryMaterial.txt
Supplementary material:
2026.latechclfl-1.29.SupplementaryMaterial.zip