@inproceedings{das-etal-2023-combating,
    title = "Combating Hallucination and Misinformation: Factual Information Generation with Tokenized Generative Transformer",
    author = "Das, Sourav  and
      Chatterji, Sanjay  and
      Mukherjee, Imon",
    editor = {H{\"a}m{\"a}l{\"a}inen, Mika  and
      {\"O}hman, Emily  and
      Pirinen, Flammie  and
      Alnajjar, Khalid  and
      Miyagawa, So  and
      Bizzoni, Yuri  and
      Partanen, Niko  and
      Rueter, Jack},
    booktitle = "Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages",
    month = dec,
    year = "2023",
    address = "Tokyo, Japan",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2023.nlp4dh-1.18/",
    pages = "143--152",
    abstract = "Large language models (LLMs) have risen meteorically in recent years. With their prominence, hallucination and misinformation generation have become severe problems as well. To combat these issues, we propose Co-LDA, a contextual topic modeling approach for generative transformers. It is based on Latent Dirichlet Allocation and is designed for accurate sentence-level information generation. This method extracts cohesive topics from COVID-19 research literature and groups them into relevant categories. These contextually rich topic words serve as masked tokens in our proposed Tokenized Generative Transformer, a modified Generative Pre-Trained Transformer for generating accurate information on any designated topic. Our approach addresses micro-hallucination and incorrect-information issues in experiments with LLMs. We also introduce a Perplexity-Similarity Score to measure semantic similarity between generated and original documents, offering accuracy and authenticity for generated texts. Evaluation on benchmark datasets covering question answering, language understanding, and language similarity demonstrates the effectiveness of our text generation method, which surpasses some state-of-the-art transformer models."
}