Alice Heiman


2025

The Accuracy, Robustness, and Readability of LLM-Generated Sustainability-Related Word Definitions
Alice Heiman
Proceedings of the 1st Workshop on Ecology, Environment, and Natural Language Processing (NLP4Ecology2025)

A common language with shared standard definitions is essential for effective climate conversations. However, there is concern that LLMs may misrepresent or diversify climate-related terms. We compare 305 official IPCC glossary definitions with those generated by OpenAI’s GPT-4o-mini and investigate their adherence, robustness, and readability using a combination of SBERT sentence embeddings and statistical measures. The LLM definitions received average adherence and robustness scores of 0.58 ± 0.15 and 0.96 ± 0.02, respectively. Both sets of sustainability-related definitions remain challenging to read, with the model-generated definitions varying mainly for words with multiple or ambiguous meanings. The results thus highlight the potential of LLMs to support environmental discourse while underscoring the need to align model outputs with established terminology for clarity and consistency.
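
The adherence measurement described in the abstract compares each LLM-generated definition against its official IPCC counterpart in SBERT embedding space. The following is a minimal sketch of how such a comparison can be set up, assuming cosine similarity over sentence-transformers embeddings; the model name, metric, and example definitions are illustrative assumptions, as the abstract does not specify these details.

    # Minimal sketch: scoring one LLM-generated definition against the
    # official IPCC definition via SBERT embeddings. The model name and
    # the use of cosine similarity are assumptions for illustration.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed SBERT variant

    # Illustrative definition pair, not verbatim from the paper's data.
    ipcc_def = "Adaptation: the process of adjustment to actual or expected climate and its effects."
    llm_def = "Adaptation means adjusting natural or human systems in response to climate change."

    # Encode both definitions and measure their semantic agreement.
    embeddings = model.encode([ipcc_def, llm_def], convert_to_tensor=True)
    score = util.cos_sim(embeddings[0], embeddings[1]).item()
    print(f"adherence-style similarity: {score:.2f}")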

2022

Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish
Ariel Ekgren | Amaru Cuba Gyllensten | Evangelia Gogoulou | Alice Heiman | Severine Verlinden | Joey Öhman | Fredrik Carlsson | Magnus Sahlgren
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present GPT-SW3, a 3.5 billion parameter autoregressive language model trained on a newly created 100 GB Swedish corpus. This paper provides insights with regard to data collection and training, while highlighting the challenges of proper model evaluation. The results of quantitative evaluation through perplexity indicate that GPT-SW3 is a competent model in comparison with existing autoregressive models of similar size. Additionally, we perform an extensive prompting study which reveals the good text generation capabilities of GPT-SW3.
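
For readers unfamiliar with perplexity-based evaluation of autoregressive models such as GPT-SW3, the sketch below shows the standard computation with the Hugging Face transformers library. The checkpoint name is an assumed placeholder and this is not code from the paper; it only illustrates the general technique.

    # Hedged sketch of perplexity evaluation for a causal language model.
    # The checkpoint name is an assumed placeholder, not the paper's setup.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "AI-Sweden-Models/gpt-sw3-126m"  # assumed checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    text = "Stockholm är Sveriges huvudstad."  # "Stockholm is the capital of Sweden."
    inputs = tokenizer(text, return_tensors="pt")

    # With labels == input_ids, the model returns the mean cross-entropy
    # of next-token prediction; exp(loss) is the perplexity on this text.
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"perplexity: {torch.exp(loss).item():.1f}")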