Yu Feng
Quantifying uncertainty in black-box LLMs is vital for reliable responses and scalable oversight. Existing methods, which gauge a model’s uncertainty by evaluating self-consistency in responses to the target query, can be misleading: an LLM may confidently give an incorrect answer to a target query, yet answer confidently and correctly when given a knowledge-preserving perturbation of that query. We systematically analyze the model behaviors and demonstrate that this discrepancy stems from suboptimal retrieval of parametric knowledge, often due to contextual biases that prevent consistent access to stored knowledge. We then introduce DiverseAgentEntropy, a novel, theoretically grounded method employing multi-agent interaction across diverse query variations for uncertainty estimation of black-box LLMs. This approach more accurately assesses an LLM’s true uncertainty and improves hallucination detection, outperforming existing self-consistency-based techniques.
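To make the contrast concrete, the following is a minimal sketch of entropy-based uncertainty over sampled answers. It is not the DiverseAgentEntropy method itself: it omits the multi-agent interaction, and the function name `answer_entropy` and the example answers are hypothetical. It only illustrates how answers that agree on a single query can hide disagreement that appears across query variations.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in nats) of the empirical distribution of answers.

    Assumption: answers are compared by normalized exact match; a real
    system would need semantic equivalence checking instead.
    """
    counts = Counter(a.strip().lower() for a in answers)
    total = sum(counts.values())
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


# Hypothetical answers sampled for the same target query: fully consistent.
same_query = ["Sydney", "Sydney", "Sydney", "Sydney"]

# Hypothetical answers to knowledge-preserving variations of that query:
# the model's answers now split between two options.
varied_queries = ["Sydney", "Canberra", "Canberra", "Sydney"]

print(answer_entropy(same_query))      # zero entropy: looks certain
print(answer_entropy(varied_queries))  # higher entropy: hidden uncertainty
```

In this toy case, self-consistency on the single query reports no uncertainty, while the answers across variations split evenly and yield an entropy of ln 2.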
This paper explores computational approaches for detecting parallelism in classical Chinese poetry, a rhetorical device where two verses mirror each other in syntax, meaning, tone, and rhythm. We experiment with five classification methods: (1) verb position matching, (2) integrated semantic, syntactic, and word-segmentation analysis, (3) difference-based character embeddings, (4) structured examples (inner/outer couplets), and (5) GPT-guided classification. We use a manually annotated dataset, containing 6,125 pentasyllabic couplets, to evaluate performance. The results indicate that parallelism detection poses a significant challenge even for powerful LLMs such as GPT-4o, with the highest F1 score below 0.72. Nevertheless, each method contributes valuable insights into the art of parallelism in Chinese poetry, suggesting a new understanding of parallelism as a verbal expression of principal components in a culturally defined vector space.
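As a rough illustration of the first method, verb position matching, the sketch below checks whether the verbs in two five-character lines fall at the same positions. The abstract does not specify the procedure, so this is an assumed reading: the tag set, the function names, and the example tag sequences are all hypothetical, and a real pipeline would need a part-of-speech tagger for classical Chinese.

```python
def verb_positions(pos_tags):
    """Return the set of indices whose tag marks a verb (assumed tag: "V")."""
    return {i for i, tag in enumerate(pos_tags) if tag == "V"}


def verbs_aligned(line_a_tags, line_b_tags):
    """Heuristic: treat a couplet as a parallelism candidate when both lines
    have equal length, contain at least one verb, and place their verbs at
    identical positions."""
    if len(line_a_tags) != len(line_b_tags):
        return False
    verbs_a = verb_positions(line_a_tags)
    return bool(verbs_a) and verbs_a == verb_positions(line_b_tags)


# Hypothetical per-character tags for pentasyllabic lines (N = noun, A = adjective).
upper = ["N", "N", "V", "N", "N"]
lower_match = ["N", "N", "V", "A", "N"]
lower_mismatch = ["V", "N", "N", "N", "N"]

print(verbs_aligned(upper, lower_match))     # True
print(verbs_aligned(upper, lower_mismatch))  # False
```

A check like this looks only at syntax, which may be one reason the paper combines it with semantic and embedding-based methods.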
Knowledge Base Question Answering (KBQA) aims to answer natural language questions posed over knowledge bases (KBs). This paper aims to equip IR-based KBQA models with numerical reasoning ability for answering ordinal constrained questions. A major challenge is the lack of explicit annotations about numerical properties. To address this challenge, we propose a pretraining numerical reasoning model consisting of NumGNN and NumTransformer, guided by explicit self-supervision signals. The two modules are pretrained to encode the magnitude and ordinal properties of numbers, respectively, and can serve as model-agnostic plugins for any IR-based KBQA model to enhance its numerical reasoning ability. Extensive experiments on two KBQA benchmarks verify that our method enhances the numerical reasoning ability of IR-based KBQA models.