@inproceedings{zeng-2024-leveraging,
    title = "Leveraging Large Language Models for Code-Mixed Data Augmentation in Sentiment Analysis",
    author = "Zeng, Linda",
    editor = "Hale, James  and
      Chawla, Kushal  and
      Garg, Muskan",
    booktitle = "Proceedings of the Second Workshop on Social Influence in Conversations (SICon 2024)",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2024.sicon-1.6/",
    doi = "10.18653/v1/2024.sicon-1.6",
    pages = "85--101",
    abstract = "Code-mixing (CM), where speakers blend languages within a single expression, is prevalent in multilingual societies but poses challenges for natural language processing due to its complexity and limited data. We propose using a large language model to generate synthetic CM data, which is then used to enhance the performance of task-specific models for CM sentiment analysis. Our results show that in Spanish-English, synthetic data improved the F1 score by 9.32{\%}, outperforming previous augmentation techniques. However, in Malayalam-English, synthetic data only helped when the baseline was low; with strong natural data, additional synthetic data offered little benefit. Human evaluation confirmed that this approach is a simple, cost-effective way to generate natural-sounding CM sentences, particularly beneficial for low baselines. Our findings suggest that few-shot prompting of large language models is a promising method for CM data augmentation and has significant impact on improving sentiment analysis, an important element in the development of social influence systems."
}