Wagner Meira Jr.
Also published as:
Wagner Meira Jr,
Wagner Meira Jr.
Large Language Models (LLMs) are increasingly used in applications that shape public discourse, yet little is known about whether they reflect distinct opinions on global issues like climate change. This study compares climate change-related responses from multiple LLMs with human opinions collected through the People’s Climate Vote 2024 survey (UNDP – United Nations Development Programme and Oxford, 2024). We compare country-level and LLM answer probability distributions and apply Exploratory Factor Analysis (EFA) to identify latent opinion dimensions. Our findings reveal that while LLM responses do not exhibit significant biases toward specific demographic groups, they encompass a wide range of opinions, sometimes diverging markedly from the majority human perspective.
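As a rough illustration of the comparison described above, the sketch below computes Jensen–Shannon distances between an LLM’s answer distribution and each country’s survey distribution, then applies EFA to a matrix of answer probabilities. The data, group names, and factor count are illustrative assumptions, not the paper’s actual setup.

```python
# Sketch: compare LLM vs. country answer distributions, then run EFA.
# All data below is synthetic/illustrative; the paper's survey items,
# models, and factor count may differ.
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.decomposition import FactorAnalysis

# Answer probability distributions over a 4-option survey question
# (two made-up countries and two made-up LLMs).
distributions = {
    "country_A": np.array([0.10, 0.20, 0.30, 0.40]),
    "country_B": np.array([0.40, 0.30, 0.20, 0.10]),
    "llm_1":     np.array([0.05, 0.15, 0.35, 0.45]),
    "llm_2":     np.array([0.25, 0.25, 0.25, 0.25]),
}

# Pairwise divergence of each LLM from each country.
for llm in ("llm_1", "llm_2"):
    for country in ("country_A", "country_B"):
        d = jensenshannon(distributions[llm], distributions[country])
        print(f"{llm} vs {country}: JS distance = {d:.3f}")

# EFA over a (groups x questions) matrix of answer probabilities
# to surface latent opinion dimensions (2 factors assumed here).
rng = np.random.default_rng(0)
opinion_matrix = rng.random((20, 10))  # placeholder for real survey data
efa = FactorAnalysis(n_components=2, rotation="varimax")
factor_scores = efa.fit_transform(opinion_matrix)
print("Factor loadings shape:", efa.components_.shape)  # (2, 10)
```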
Do LLMs process text and mathematics as a unified skill, or do these components rely on distinct underlying mechanisms? We investigate this question by disentangling the textual interpretation and mathematical solving steps in word problems drawn from Brazil’s largest college entrance exam (ENEM) and GSM8K, a popular grade school-level benchmark. Using the symbolic solver SymPy, we transform word problems into equivalent purely mathematical representations, isolating equation formulation from textual comprehension. Our extended benchmarks enable a structured analysis of LLM performance across these two dimensions. Through empirical evaluations, we find that small-scale LLMs struggle significantly more with text interpretation than with equation solving, with accuracy dropping by a factor of 2 to 7 when solving full word problems compared to their math-only counterparts. Exploratory factor analysis confirms a bidimensional structure in LLM reasoning, where models exhibit distinct proficiencies in textual and mathematical components, underscoring the need for targeted improvements in language comprehension. By analyzing the latent factors associated with each model, our findings provide a framework for researchers and practitioners to make informed choices when selecting models based on computational costs and the nature of their tasks.
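To make the text-vs-math decomposition concrete, here is a minimal sketch assuming a GSM8K-style word problem; the problem text and its SymPy formulation are illustrative examples, not items from the extended benchmarks.

```python
# Sketch: a GSM8K-style word problem and its math-only counterpart.
# The problem and equation below are illustrative, not benchmark items.
from sympy import Eq, Rational, solve, symbols

# Word problem (requires textual interpretation):
#   "Ana had 12 apples. She gave a third of them to Bia and then
#    bought 5 more. How many apples does Ana have now?"

# Math-only counterpart (the equation formulation is already done,
# isolating the solving step from text comprehension):
x = symbols("x")
equation = Eq(x, 12 - Rational(12, 3) + 5)
print(solve(equation, x))  # [13]
```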
Material facts (MF) are crucial and obligatory disclosures that can significantly influence asset values. Following their release, financial analysts embark on the meticulous and highly specialized task of crafting analyses to shed light on their impact on company assets, a challenge compounded by the daily volume of MFs released. Generative AI, with its demonstrated ability to craft coherent text, emerges as a promising solution to this task. However, while these analyses must incorporate the MF, they must also transcend it, enriching it with vital background information, valuable and grounded recommendations, prospects, potential risks, and their underlying reasoning. In this paper, we approach this task as an instance of controllable text generation, aiming to ensure adherence to the MF and other pivotal attributes as control elements. We first explore language models’ capacity to manage this task by embedding those elements into prompts and engaging popular chatbots. A bilingual proof of concept underscores both the potential and the challenges of applying generative AI techniques to this task.
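A minimal sketch of the prompt-construction step described above, assuming an MF excerpt and a set of control attributes embedded directly into the prompt; the attribute names, the example MF, and the `generate` stub are hypothetical placeholders for whichever chatbot API is actually used.

```python
# Sketch: controllable generation by embedding the material fact (MF)
# and control attributes into a single prompt. The MF, attribute names,
# and generate() stub are hypothetical; swap in a real chatbot API.
MF_TEXT = "Company X announced the acquisition of Company Y for $1.2B."

controls = {
    "background": "Company X operates in the logistics sector.",
    "recommendation": "buy/hold/sell with justification",
    "risks": "integration costs, regulatory approval",
    "language": "English",
}

prompt = (
    "You are a financial analyst. Write an analysis of the material "
    f"fact below, in {controls['language']}.\n"
    f"Material fact: {MF_TEXT}\n"
    f"Background to incorporate: {controls['background']}\n"
    f"Provide a grounded recommendation ({controls['recommendation']}) "
    f"and discuss potential risks such as {controls['risks']}, "
    "including the reasoning behind each point."
)

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM chatbot API."""
    raise NotImplementedError("wire this to the chatbot of choice")

print(prompt)  # inspect the assembled controllable-generation prompt
```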