Tables are a primary medium for conveying critical information in administrative domains, yet their complexity hinders utilization by Large Language Models (LLMs). This paper introduces the Theme-Explanation Structure-based Table Summarization (Tabular-TX) pipeline, a novel approach designed to generate highly interpretable summaries from tabular data, with a specific focus on Korean administrative documents. Current table summarization methods often neglect the crucial aspect of human-friendly output. Tabular-TX addresses this by first employing a multi-step reasoning process to ensure deep table comprehension by LLMs, followed by a journalist persona prompting strategy for clear sentence generation. Crucially, it then structures the output into a Theme Part (an adverbial phrase) and an Explanation Part (a predicative clause), significantly enhancing readability. Our approach leverages in-context learning, obviating the need for extensive fine-tuning and associated labeled data or computational resources. Experimental results show that Tabular-TX effectively processes complex table structures and metadata, offering a robust and efficient solution for generating human-centric table summaries, especially in low-resource scenarios.
Non-factoid question answering (NFQA) poses a significant challenge due to its open-ended nature, diverse intents, and the necessity for multi-aspect reasoning, rendering conventional retrieval-augmented generation (RAG) approaches insufficient. To address this, we introduce Typed-RAG, a type-aware framework utilizing multi-aspect query decomposition tailored specifically for NFQA. Typed-RAG categorizes NFQs into distinct types—such as debate, experience, and comparison—and decomposes them into single-aspect sub-queries for targeted retrieval and generation. By synthesizing the retrieved results of these sub-queries, Typed-RAG generates more informative and contextually relevant responses. Additionally, we present Wiki-NFQA, a novel benchmark dataset encompassing diverse NFQ types. Experimental evaluation demonstrates that TypeRAG consistently outperforms baseline approaches, confirming the effectiveness of type-aware decomposition in improving both retrieval quality and answer generation for NFQA tasks.
Math word problem solving is an emerging research topic in Natural Language Processing. Recently, to address the math word problem-solving task, researchers have applied the encoder-decoder architecture, which is mainly used in machine translation tasks. The state-of-the-art neural models use hand-crafted features and are based on generation methods. In this paper, we propose the GEO (Generation of Equations by utilizing Operators) model that does not use hand-crafted features and addresses two issues that are present in existing neural models: 1. missing domain-specific knowledge features and 2. losing encoder-level knowledge. To address missing domain-specific feature issue, we designed two auxiliary tasks: operation group difference prediction and implicit pair prediction. To address losing encoder-level knowledge issue, we added an Operation Feature Feed Forward (OP3F) layer. Experimental results showed that the GEO model outperformed existing state-of-the-art models on two datasets, 85.1% in MAWPS, and 62.5% in DRAW-1K, and reached comparable performance of 82.1% in ALG514 dataset.
Solving algebraic word problems has recently emerged as an important natural language processing task. To solve algebraic word problems, recent studies suggested neural models that generate solution equations by using ‘Op (operator/operand)’ tokens as a unit of input/output. However, such a neural model suffered two issues: expression fragmentation and operand-context separation. To address each of these two issues, we propose a pure neural model, Expression-Pointer Transformer (EPT), which uses (1) ‘Expression’ token and (2) operand-context pointers when generating solution equations. The performance of the EPT model is tested on three datasets: ALG514, DRAW-1K, and MAWPS. Compared to the state-of-the-art (SoTA) models, the EPT model achieved a comparable performance accuracy in each of the three datasets; 81.3% on ALG514, 59.5% on DRAW-1K, and 84.5% on MAWPS. The contribution of this paper is two-fold; (1) We propose a pure neural model, EPT, which can address the expression fragmentation and the operand-context separation. (2) The fully automatic EPT model, which does not use hand-crafted features, yields comparable performance to existing models using hand-crafted features, and achieves better performance than existing pure neural models by at most 40%.