Imon Mukherjee
2023
Combating Hallucination and Misinformation: Factual Information Generation with Tokenized Generative Transformer
Sourav Das | Sanjay Chatterji | Imon Mukherjee
Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages
Large language models (LLMs) have risen to prominence rapidly in recent years. With that prominence, hallucination and the generation of misinformation have become serious problems. To combat this issue, we propose Co-LDA, a contextual topic modeling approach for generative transformers. It is based on Latent Dirichlet Allocation and is designed for accurate sentence-level information generation. The method extracts cohesive topics from COVID-19 research literature and groups them into relevant categories. These contextually rich topic words serve as masked tokens in our proposed Tokenized Generative Transformer, a modified Generative Pre-Trained Transformer for generating accurate information on any designated topic. In our experiments with LLMs, the approach addresses micro hallucination and incorrect information. We also introduce a Perplexity-Similarity Score system that measures semantic similarity between generated and original documents, as a gauge of the accuracy and authenticity of generated texts. Evaluation on benchmark datasets, including question answering, language understanding, and language similarity, demonstrates the effectiveness of our text generation method, which surpasses some state-of-the-art transformer models.
2022
A custom CNN model for detection of rice disease under complex environment
Chiranjit Pal | Sanjoy Pratihar | Imon Mukherjee
Proceedings of the First Workshop on NLP in Agriculture and Livestock Management
This paper designs an image-based rice disease detection framework that takes a rice plant image as input and identifies whether BrownSpot disease is present in it. A CNN-based disease detection scheme performs the binary classification task on our custom dataset containing 2223 images of healthy and unhealthy classes under complex environments. Experimental results show that our system achieves consistently satisfactory results on disease detection tasks. Furthermore, the CNN disease detection model is compared with state-of-the-art works and attains an accuracy of 96.8%.