Carl Strathearn
2026
Frame2KG: A Benchmark and Evaluation Toolkit for Interpretable Frame-to-Graph Generation
Lewis N. Watson | Carl Strathearn | Kenny Mitchell | Yanchao Yu
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Interpretable frame-to-knowledge-graph (Frame2KG) generation enables structured visual scene representation while supporting on-device inference to enhance privacy, improve interpretability, and minimise compute. We introduce Frame2KG-YC2, a synthetic, reproducible dataset derived from YouCook2 that pairs keyframes with schema-valid JSON knowledge graphs containing typed, spatially grounded entities and semantic predicates, alongside faithful textual paraphrases. Using this corpus, we fine-tune Qwen2.5-VL models (3B and 7B) with parameter-efficient LoRA adapters on attention layers (QKVO), with and without GateProj/Up/Down MLP projections. For evaluation and benchmarking, we propose a deterministic toolkit featuring two-stage node matching (an IoU gate followed by Hungarian assignment on blended spatial-semantic similarity) and comprehensive metrics spanning node/edge precision-recall-F1, matched-pair IoU, and structural validity. On a held-out test set, our models achieve Node F1μ up to 0.621 and Edge F1μ up to 0.208, with mean matched IoU of ≈0.61 and >98% schema conformity. We show that MLP gating consistently improves predicate accuracy and spatial grounding, while post-training quantisation maintains accuracy and improves deployability on edge hardware. We release the dataset, code, adapters, and evaluation toolkit to establish an open, interpretable baseline for future temporal and multi-view extensions.
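The two-stage matching lends itself to a compact sketch. The following is a minimal illustration, assuming axis-aligned [x1, y1, x2, y2] boxes, a user-supplied semantic similarity function, and illustrative values for the IoU gate and blend weight `alpha`; the function names and thresholds here are assumptions, not the released toolkit's API.

```python
# Minimal sketch of the two-stage node matching described in the abstract:
# stage 1 gates candidate pairs by IoU, stage 2 runs Hungarian assignment
# on a blended spatial-semantic score. Names and defaults are illustrative.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_nodes(pred, gold, sem_sim, iou_gate=0.3, alpha=0.5):
    """pred/gold: lists of nodes with a 'box' field; sem_sim(p, g) in [0, 1]."""
    score = np.zeros((len(pred), len(gold)))
    for i, p in enumerate(pred):
        for j, g in enumerate(gold):
            spatial = iou(p["box"], g["box"])
            if spatial < iou_gate:      # stage 1: gate out weak spatial overlaps
                continue                # score stays 0, pair is unmatchable
            score[i, j] = alpha * spatial + (1 - alpha) * sem_sim(p, g)
    rows, cols = linear_sum_assignment(-score)  # stage 2: maximise blended score
    return [(i, j) for i, j in zip(rows, cols) if score[i, j] > 0]
```

Node precision, recall, and F1 then follow from the accepted pairs, and mean matched-pair IoU is the average spatial overlap over those matches.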
PAIR: A Pilot Dataset for Dual Perspective-based Video-Grounded Dialogue and Reconciliation
Lewis N. Watson | Carl Strathearn | Kenny Mitchell | Yanchao Yu
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Collaborative dialogue in multi-agent settings often requires interlocutors to integrate partially overlapping perceptual information to construct a shared representation of a dynamic environment. We introduce PAIR, a pilot conversational corpus designed to examine how humans coordinate under systematic perceptual asymmetry. The dataset comprises 15 dialogues in which participants observed the same activity from complementary egocentric and exocentric video perspectives and engaged in open-ended discussion to produce a joint account. All transcripts were manually verified and annotated with 42 dialogue act categories, enabling fine-grained analysis of interactional structure. Beyond descriptive statistics, PAIR supports examination of measurable conversational configurations, including turn distribution, participation symmetry, and dialogue act composition, which together provide structural indicators of how perspective integration unfolds in dialogue. Although intentionally lightweight, PAIR is positioned as a controlled benchmark for analysing collaborative dialogue mechanisms rather than a large-scale training resource. The corpus supports dialogue act classification, video-grounded dialogue modelling, and investigation of multi-agent reasoning under distributed perceptual access. By coupling dual-perspective grounding with explicit interactional annotation, PAIR offers a compact testbed for studying reconciliation dynamics in task-oriented dialogue.
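As a rough illustration of these structural indicators, the sketch below computes turn distribution, dialogue act composition, and an entropy-based participation symmetry score from a transcript; the tuple format and the normalised-entropy definition are assumptions for illustration, not the paper's exact measures.

```python
# Illustrative structural indicators over a transcript given as a list of
# (speaker, utterance, dialogue_act) tuples. The symmetry measure is a
# normalised-entropy choice assumed here, not PAIR's official definition.
from collections import Counter
import math

def structural_indicators(turns):
    speakers = Counter(s for s, _, _ in turns)   # turn distribution
    acts = Counter(a for _, _, a in turns)       # dialogue act composition
    total = sum(speakers.values())
    probs = [c / total for c in speakers.values()]
    # participation symmetry: normalised entropy (1.0 = perfectly balanced)
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    symmetry = entropy / math.log2(len(speakers)) if len(speakers) > 1 else 0.0
    return {"turns_per_speaker": dict(speakers),
            "act_composition": dict(acts),
            "participation_symmetry": symmetry}
```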
2025
Embodied Conversational Systems in Human–Robot Interaction: Introduction to the Special Issue
Dimitra Gkatzia | Hendrik Buschmeier | Mary Ellen Foster | Carl Strathearn
Dialogue & Discourse Volume 16
This editorial introduces the special issue on Embodied Conversational Systems in Human–Robot Interaction.
2022
Task2Dial: A Novel Task and Dataset for Commonsense-enhanced Task-based Dialogue Grounded in Documents
Carl Strathearn | Dimitra Gkatzia
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering
This paper proposes a novel task, commonsense-enhanced task-based dialogue grounded in documents, and describes Task2Dial, a dataset of document-grounded task-based dialogues in which an Information Giver (IG) provides instructions (by consulting a document) to an Information Follower (IF) so that the latter can successfully complete the task. In this setting, the IF can ask clarification questions that may not be grounded in the underlying document and that require commonsense knowledge to answer. The Task2Dial dataset poses new challenges: (1) its human reference texts show more lexical richness and variation than other document-grounded dialogue datasets; (2) generating from this dataset requires paraphrasing, as instructional responses may have been modified from the underlying document; (3) answering requires commonsense knowledge, since questions are not necessarily grounded in the document; and (4) generation requires planning based on context, as task steps need to be provided in order. Task2Dial dialogues contain an average of 18.15 turns with 19.79 tokens per turn, compared with 12.94 and 12 respectively in existing datasets. As such, learning from this dataset promises more natural, varied, and less template-like system utterances.
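To make the IG/IF setting concrete, a document-grounded dialogue record of this kind might look as follows; the field names and example turns are illustrative assumptions, not the released Task2Dial schema.

```python
# Illustrative record shape for the IG/IF setting described above.
# Field names and turns are assumptions, not the released schema.
example_record = {
    "document": "Recipe: Victoria sponge. 1) Preheat the oven to 180C. "
                "2) Cream the butter and sugar. ...",
    "dialogue": [
        {"speaker": "IG", "text": "First, preheat the oven to 180 degrees Celsius."},
        {"speaker": "IF", "text": "Can I use margarine instead of butter?"},        # not grounded in the document
        {"speaker": "IG", "text": "Yes, margarine works as a like-for-like swap."},  # requires commonsense
        {"speaker": "IG", "text": "Next, cream the butter and sugar together."},     # steps given in order
    ],
}
```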
2021
Chefbot: A Novel Framework for the Generation of Commonsense-enhanced Responses for Task-based Dialogue Systems
Carl Strathearn | Dimitra Gkatzia
Proceedings of the 14th International Conference on Natural Language Generation
Conversational systems aim to generate responses that are accurate, relevant and engaging, either by utilising neural end-to-end models or through slot filling. Human-to-human conversations are enhanced not only by the interlocutor's latest utterance, but also by recalling relevant information about concepts and objects covered in the dialogue and integrating it into responses. Such information may include recently referred-to concepts, commonsense knowledge and more. A concrete scenario for such dialogues is cooking, i.e. when an artificial agent (personal assistant, robot, chatbot) and a human converse about a recipe. We demo a novel system for commonsense-enhanced response generation in the cooking scenario, where the conversational system not only provides step-by-step cooking directions, but also displays commonsense capabilities by offering explanations of how objects can be used and recommendations for replacing ingredients.
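A toy sketch of this behaviour follows, assuming a rule-based dispatch over classified user intents; the intent labels and knowledge tables are illustrative assumptions, not Chefbot's actual architecture.

```python
# Toy dialogue manager illustrating the three behaviours described above:
# advancing the recipe, explaining object use, and suggesting substitutions.
# Intent labels and knowledge tables are illustrative assumptions.
SUBSTITUTIONS = {"butter": "margarine", "milk": "oat milk"}
OBJECT_USES = {"whisk": "used to beat eggs or blend batter"}

def respond(intent, recipe_steps, state, entity=None):
    """Return the next system utterance given a classified user intent."""
    if intent == "next_step":
        step = recipe_steps[state["step"]]
        state["step"] += 1                      # advance the recipe pointer
        return step
    if intent == "explain_object" and entity in OBJECT_USES:
        return f"A {entity} is {OBJECT_USES[entity]}."
    if intent == "substitute" and entity in SUBSTITUTIONS:
        return f"You can replace {entity} with {SUBSTITUTIONS[entity]}."
    return "Sorry, could you rephrase that?"

state = {"step": 0}
steps = ["Preheat the oven to 180C.", "Cream the butter and sugar."]
print(respond("next_step", steps, state))             # Preheat the oven to 180C.
print(respond("substitute", steps, state, "butter"))  # You can replace butter with margarine.
```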