Batoul Haydar


2025

On the Feasibility of LLM-based Automated Generation and Filtering of Competency Questions for Ontologies
Zola Mahlaza | C. Maria Keet | Nanee Chahinian | Batoul Haydar
Proceedings of the 5th Conference on Language, Data and Knowledge

Competency questions (CQs) for ontologies are used in a number of ontology development tasks. The sentence structure of the questions has been analysed to inform ontology authoring and validation. One obstacle to making this a seamless process is the difficulty of writing good CQs manually or of offering automated assistance in writing them. In this paper, we propose an enhanced and automated pipeline in which each step can be traced meticulously, using a mini-corpus, T5, and the SQuAD dataset to generate questions, and the CLaRO controlled language, semantic similarity, and other steps for filtering. The pipeline was evaluated with two corpora of different genres in the same broad domain and assessed by domain experts. Across the experiments, around 25% of the final output questions were within scope and relevant, and 45% were of unproblematic quality. Technically, it provided ample insight into trade-offs in generation and filtering, where relaxing filtering increased sentence structure diversity but also led to more spurious sentences that required additional processing.