Tom Mitchell

Also published as: Tom M. Mitchell

2024

Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024)
Estevam Hruschka | Thom Lake | Naoki Otani | Tom Mitchell
Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024)

2023

The Internal State of an LLM Knows When It’s Lying
Amos Azaria | Tom Mitchell
Findings of the Association for Computational Linguistics: EMNLP 2023

While Large Language Models (LLMs) have shown exceptional performance in various tasks, one of their most prominent drawbacks is generating inaccurate or false information with a confident tone. In this paper, we provide evidence that the LLM’s internal state can be used to reveal the truthfulness of statements. This includes both statements provided to the LLM, and statements that the LLM itself generates. Our approach is to train a classifier that outputs the probability that a statement is truthful, based on the hidden layer activations of the LLM as it reads or generates the statement. Experiments demonstrate that given a set of test sentences, of which half are true and half false, our trained classifier achieves an average of 71% to 83% accuracy labeling which sentences are true versus false, depending on the LLM base model. Furthermore, we explore the relationship between our classifier’s performance and approaches based on the probability assigned to the sentence by the LLM. We show that while LLM-assigned sentence probability is related to sentence truthfulness, this probability is also dependent on sentence length and the frequencies of words in the sentence, resulting in our trained classifier providing a more reliable approach to detecting truthfulness, highlighting its potential to enhance the reliability of LLM-generated content and its practical applicability in real-world scenarios.
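The approach in this abstract, a classifier trained on hidden-layer activations to output the probability that a statement is true, can be illustrated with a small sketch. It rests on several assumptions. The activation vectors are synthetic Gaussian data, not taken from any LLM. The classifier is a plain logistic-regression probe, which is not necessarily the architecture used in the paper. All function names (`make_synthetic_activations`, `train_probe`, and so on) are hypothetical. The balanced true/false split mirrors the test setup the abstract describes.

```python
import math
import random


def make_synthetic_activations(n, dim, rng):
    """Hypothetical stand-in for LLM hidden-layer activations.

    Assumption: "true" statements (label 1) are shifted by +0.5 in every
    coordinate and "false" ones (label 0) by -0.5. Real activations would
    come from a forward pass of the LLM over each statement.
    """
    data = []
    for i in range(n):
        label = i % 2  # balanced: half true, half false
        shift = 0.5 if label == 1 else -0.5
        vec = [rng.gauss(shift, 1.0) for _ in range(dim)]
        data.append((vec, label))
    rng.shuffle(data)
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, vec):
    # Probability that the statement behind `vec` is true.
    return sigmoid(sum(w * x for w, x in zip(weights, vec)) + bias)


def train_probe(data, dim, epochs=50, lr=0.1):
    # Logistic regression fitted with stochastic gradient descent.
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for vec, label in data:
            err = predict_proba(weights, bias, vec) - label
            for j in range(dim):
                weights[j] -= lr * err * vec[j]
            bias -= lr * err
    return weights, bias


def accuracy(weights, bias, data):
    correct = sum(
        (predict_proba(weights, bias, vec) >= 0.5) == (label == 1)
        for vec, label in data
    )
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_synthetic_activations(400, 8, rng)
test = make_synthetic_activations(200, 8, rng)
weights, bias = train_probe(train, 8)
test_accuracy = accuracy(weights, bias, test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The accuracy printed here reflects only how separable the synthetic data is and says nothing about the 71% to 83% range reported in the abstract.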

Zero-shot Triplet Extraction by Template Infilling
Bosung Kim | Hayate Iso | Nikita Bhutani | Estevam Hruschka | Ndapa Nakashole | Tom Mitchell
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Proceedings of the First Workshop on Matching From Unstructured and Structured Data (MATCHING 2023)
Estevam Hruschka | Tom Mitchell | Sajjadur Rahman | Dunja Mladenić | Marko Grobelnik
Proceedings of the First Workshop on Matching From Unstructured and Structured Data (MATCHING 2023)

2022

Towards General Natural Language Understanding with Probabilistic Worldbuilding
Abulhair Saparov | Tom M. Mitchell
Transactions of the Association for Computational Linguistics, Volume 10

We introduce the Probabilistic Worldbuilding Model (PWM), a new fully symbolic Bayesian model of semantic parsing and reasoning, as a first step in a research program toward more domain- and task-general NLU and AI. Humans create internal mental models of their observations that greatly aid in their ability to understand and reason about a large variety of problems. In PWM, the meanings of sentences, acquired facts about the world, and intermediate steps in reasoning are all expressed in a human-readable formal language, with the design goal of interpretability. PWM is Bayesian, designed specifically to be able to generalize to new domains and new tasks. We derive and implement an inference algorithm that reads sentences by parsing and abducing updates to its latent world model that capture the semantics of those sentences, and evaluate it on two out-of-domain question-answering datasets: (1) ProofWriter and (2) a new dataset we call FictionalGeoQA, designed to be more representative of real language but still simple enough to focus on evaluating reasoning ability, while being robust against heuristics. Our method outperforms baselines on both, thereby demonstrating its value as a proof-of-concept.

Proceedings of the 2nd Workshop on Deriving Insights from User-Generated Text
Estevam Hruschka | Tom Mitchell | Dunja Mladenić | Marko Grobelnik | Nikita Bhutani
Proceedings of the 2nd Workshop on Deriving Insights from User-Generated Text

2021

Conversational Multi-Hop Reasoning with Neural Commonsense Knowledge and Symbolic Logic Rules
Forough Arabshahi | Jennifer Lee | Antoine Bosselut | Yejin Choi | Tom Mitchell
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

One of the challenges faced by conversational agents is their inability to identify unstated presumptions of their users’ commands, a task trivial for humans due to their common sense. In this paper, we propose a zero-shot commonsense reasoning system for conversational agents in an attempt to achieve this. Our reasoner uncovers unstated presumptions from user commands satisfying a general template of if-(state), then-(action), because-(goal). Our reasoner uses a state-of-the-art transformer-based generative commonsense knowledge base (KB) as its source of background knowledge for reasoning. We propose a novel and iterative knowledge query mechanism to extract multi-hop reasoning chains from the neural KB which uses symbolic logic rules to significantly reduce the search space. Similar to any KBs gathered to date, our commonsense KB is prone to missing knowledge. Therefore, we propose to conversationally elicit the missing knowledge from human users with our novel dynamic question generation strategy, which generates and presents contextualized queries to human users. We evaluate the model with a user study with human users that achieves a 35% higher success rate compared to SOTA.

2020

Interactive Task Learning from GUI-Grounded Natural Language Instructions and Demonstrations
Toby Jia-Jun Li | Tom Mitchell | Brad A. Myers
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We show SUGILITE, an intelligent task automation agent that can learn new tasks and relevant associated concepts interactively from the user’s natural language instructions and demonstrations, using the graphical user interfaces (GUIs) of third-party mobile apps. This system provides several interesting features: (1) it allows users to teach new task procedures and concepts through verbal instructions together with demonstration of the steps of a script using GUIs; (2) it supports users in clarifying their intents for demonstrated actions using GUI-grounded verbal instructions; (3) it infers parameters of tasks and their possible values in utterances using the hierarchical structures of the underlying app GUIs; and (4) it generalizes taught concepts to different contexts and task domains. We describe the architecture of the SUGILITE system, explain the design and implementation of its key features, and show a prototype in the form of a conversational assistant on Android.