融合双重注意力机制的缅甸语图像文本识别方法(Burmese image text recognition method with dual attention mechanism)

Fengxiao Wang (王奉孝), Cunli Mao (毛存礼), Zhengtao Yu (余正涛), Shengxiang Gao (高盛祥), Huang Yuxin (黄于欣), Fuhao Liu (刘福浩)


Abstract
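Because Burmese characters have a distinctive encoding structure and character-combination rules, existing image text recognition methods fail to attend sufficiently to features at character edges in Burmese image recognition tasks, which leads to the loss of the superscript and subscript components of Burmese characters. This paper therefore improves on a Transformer-based image text recognition method and proposes a visual attention module that fuses channel and spatial attention mechanisms. The module aims to capture pixel-level pairwise relations and channel dependencies and to reduce noise interference in Burmese images, yielding feature maps with more complete semantics. In addition, during decoding, decoding units based on multi-head attention are combined into a decoder that converts the feature sequence into Burmese text. Experimental results show that, on a self-constructed Burmese image text recognition dataset, the method improves recognition accuracy by 0.5% over the Transformer baseline, reaching 95.3%.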
Anthology ID:
2022.ccl-1.32
Volume:
Proceedings of the 21st Chinese National Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Nanchang, China
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
355–365
Language:
Chinese
URL:
https://aclanthology.org/2022.ccl-1.32
Cite (ACL):
Fengxiao Wang, Cunli Mao, Zhengtao Yu, Shengxiang Gao, Huang Yuxin, and Fuhao Liu. 2022. 融合双重注意力机制的缅甸语图像文本识别方法(Burmese image text recognition method with dual attention mechanism). In Proceedings of the 21st Chinese National Conference on Computational Linguistics, pages 355–365, Nanchang, China. Chinese Information Processing Society of China.
Cite (Informal):
融合双重注意力机制的缅甸语图像文本识别方法(Burmese image text recognition method with dual attention mechanism) (Wang et al., CCL 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.ccl-1.32.pdf