Yijun Liang


2024

PEDANTS: Cheap but Effective and Interpretable Answer Equivalence
Zongxia Li | Ishani Mondal | Huy Nghiem | Yijun Liang | Jordan Lee Boyd-Graber
Findings of the Association for Computational Linguistics: EMNLP 2024

Question answering (QA) can only make progress if we know whether an answer is correct, but current answer correctness (AC) metrics struggle with verbose, free-form answers from large language models (LLMs). There are two challenges with current short-form QA evaluations: a lack of diverse styles of evaluation data and an over-reliance on expensive and slow LLMs. LLM-based scorers correlate better with humans, but this expensive task has only been tested on limited QA datasets. We rectify these issues by providing rubrics and datasets for evaluating machine QA adopted from the Trivia community. We also propose an efficient and interpretable QA evaluation method that is more stable than exact match and neural methods (BERTScore).

2023

GeoDRL: A Self-Learning Framework for Geometry Problem Solving using Reinforcement Learning in Deductive Reasoning
Shuai Peng | Di Fu | Yijun Liang | Liangcai Gao | Zhi Tang
Findings of the Association for Computational Linguistics: ACL 2023

Ensuring both interpretability and correctness is a great challenge in automated geometry problem solving (GPS), and the scarcity of labeled data hinders learning mathematical reasoning from samples. Therefore, we present GeoDRL, a self-learning geometry problem solving framework that integrates logic graph deduction and Deep Reinforcement Learning (DRL) to optimize geometry reasoning as a Markov Decision Process. GeoDRL employs a Graph Neural Network on a Geometry Logic Graph, updating the problem state using a symbolic system. Incorporating DRL into deductive reasoning enables GeoDRL to achieve unsupervised self-learning while maintaining correctness. Through unsupervised learning, GeoDRL achieves higher accuracy on the Geometry3K dataset, improving by 11.1% over previous SOTA methods, while also boosting efficiency and interpretability.