@inproceedings{monsur-etal-2023-synthnid,
    title = "{S}ynth{NID}: Synthetic Data to Improve End-to-end {B}angla Document Key Information Extraction",
    author = "Monsur, Syed Mostofa  and
      Kabir, Shariar  and
      Chowdhury, Sakib",
    editor = "Alam, Firoj  and
      Kar, Sudipta  and
      Chowdhury, Shammur Absar  and
      Sadeque, Farig  and
      Amin, Ruhul",
    booktitle = "Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.banglalp-1.13/",
    doi = "10.18653/v1/2023.banglalp-1.13",
    pages = "117--123",
    abstract = "End-to-end Document Key Information Extraction models require a lot of compute and labeled data to perform well on real datasets. This is particularly challenging for low-resource languages like Bangla, where domain-specific multimodal document datasets are scarcely available. In this paper, we introduce SynthNID, a system to generate domain-specific document image data for training OCR-less end-to-end Key Information Extraction systems. We show that the generated data improves the performance of the extraction model on real datasets and that the system is easily extendable to generate other types of scanned documents for a wide range of document understanding tasks. The code for generating synthetic data is available at https://github.com/dv66/synthnid."
}