Shutan Huang
2024
An Unsupervised Framework for Adaptive Context-aware Simplified-Traditional Chinese Character Conversion
Wei Li | Shutan Huang | Yanqiu Shao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Traditional Chinese characters are an important carrier of Chinese culture and are still actively used in many areas. Automatic conversion between traditional and simplified Chinese characters can help modern people understand traditional culture and facilitate communication among different regions. Previous conversion methods rely on rule-based mapping or shallow feature-based machine learning models, which struggle to convert simplified characters with multiple traditional origins, and for which constructing training data is costly. In this study, we propose an unsupervised adaptive context-aware conversion model that learns to convert between simplified and traditional Chinese characters under a denoising auto-encoder framework, requiring no labeled data. Our model includes a Latent Generative Adversarial Encoder that transforms vectors into a latent space with a generative adversarial network, which adds noise as an inevitable side effect. Based on this, a Context-aware Semantic Reconstruction Decoder restores the original input while considering a broader range of context with a pretrained language model. Additionally, we propose to apply an early exit mechanism during inference to reduce computational complexity and improve generalization ability. To test the effectiveness of our model, we construct a high-quality test dataset of simplified-traditional Chinese character text pairs. Experimental results and extensive analysis demonstrate that our model outperforms strong unsupervised baselines and yields better conversion results for one-to-many cases.
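Below is a minimal PyTorch sketch of the denoising auto-encoder idea summarized in the abstract, not the authors' implementation: the class names (LatentGANEncoder, ContextDecoder), the dimensions, the Gaussian-noise stand-in for the adversarial latent mapping, and the bidirectional GRU used in place of a pretrained language model are all illustrative assumptions.

```python
# Sketch only: unsupervised reconstruction of the input characters, no parallel data.
import torch
import torch.nn as nn

class LatentGANEncoder(nn.Module):
    """Maps character embeddings into a latent space. The adversarial mapping is
    approximated here by a linear projection plus Gaussian noise, standing in for
    the noise the GAN transformation introduces as a side effect."""
    def __init__(self, emb_dim=128, latent_dim=128, noise_std=0.1):
        super().__init__()
        self.proj = nn.Linear(emb_dim, latent_dim)
        self.noise_std = noise_std

    def forward(self, x):
        z = self.proj(x)
        if self.training:
            z = z + self.noise_std * torch.randn_like(z)  # denoising-style corruption
        return z

class ContextDecoder(nn.Module):
    """Reconstructs the original character sequence from noisy latents while looking
    at the whole sentence (a bidirectional GRU here; the paper uses a pretrained
    language model for broader context)."""
    def __init__(self, latent_dim=128, vocab_size=8000):
        super().__init__()
        self.rnn = nn.GRU(latent_dim, latent_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * latent_dim, vocab_size)

    def forward(self, z):
        h, _ = self.rnn(z)
        return self.out(h)  # per-position logits over characters

vocab_size, emb_dim = 8000, 128
embed = nn.Embedding(vocab_size, emb_dim)
encoder, decoder = LatentGANEncoder(emb_dim), ContextDecoder()
criterion = nn.CrossEntropyLoss()

chars = torch.randint(0, vocab_size, (2, 16))        # toy batch of character ids
logits = decoder(encoder(embed(chars)))               # corrupt, then reconstruct
loss = criterion(logits.reshape(-1, vocab_size), chars.reshape(-1))
loss.backward()
```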
2023
CCL23-Eval 任务3系统报告:基于多任务pipeline策略的汉语框架语义解析 (System Report for CCL23-Eval Task 3: Chinese Frame Semantic Parsing Based on a Multi-task Pipeline Strategy)
Shutan Huang (黄舒坦) | Yanqiu Shao (邵艳秋) | Wei Li (李炜)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
This paper presents our approach to the CCL 2023 Chinese Frame Semantic Parsing shared task. Since Chinese frame semantic parsing is a multi-task problem whose subtasks are strongly sequential and interdependent, our method adopts a multi-task pipeline framework consisting of three submodules: frame classification, argument identification, and role classification, corresponding to the three subtasks of frame identification, argument span identification, and argument role identification. We model frame identification and argument role identification as text classification tasks, and argument span identification as an entity recognition task. Because the subtasks are strongly sequential and interdependent, each module makes full use of the features and information extracted by the other subtasks; for example, role classification uses the frame category predicted by the frame classification module and the argument spans predicted by the argument identification module. Considering the importance of the target word and its context, we fine-tune a pretrained language model. Observing that model performance was unstable, we applied strategies such as adversarial training to improve it. Our final scores reached 71.91 on leaderboard A and 70.60 on leaderboard B, ranking 2nd and verifying the effectiveness of our method.
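As a rough illustration of the pipeline described above (not the competition system), the sketch below chains three fine-tunable modules with Hugging Face Transformers; the choice of bert-base-chinese, the label counts, and the way earlier-stage predictions are passed forward as extra text are assumptions for illustration only.

```python
# Sketch only: frame classification -> argument span tagging -> role classification.
from transformers import (BertTokenizer, BertForSequenceClassification,
                          BertForTokenClassification)

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

# Stage 1: frame identification as sentence-level text classification.
frame_clf = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=700)
# Stage 2: argument span identification as token-level (entity-style) B/I/O tagging.
span_tagger = BertForTokenClassification.from_pretrained("bert-base-chinese", num_labels=3)
# Stage 3: role classification, conditioned on the earlier predictions.
role_clf = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=1000)

sentence, target = "他 打开 了 窗户", "打开"

# Stage 1: predict the frame evoked by the target word.
enc = tokenizer(sentence, target, return_tensors="pt")
frame_id = frame_clf(**enc).logits.argmax(-1).item()

# Stage 2: tag candidate argument spans in the same sentence.
tags = span_tagger(**tokenizer(sentence, return_tensors="pt")).logits.argmax(-1)

# Stage 3: classify each predicted span's role, passing the frame prediction forward
# as extra textual context (one simple way to reuse earlier-stage output).
span_text = "窗户"  # a span recovered from the B/I/O tags (written out literally here)
role_input = tokenizer(f"frame={frame_id} target={target} span={span_text}", sentence,
                       return_tensors="pt")
role_id = role_clf(**role_input).logits.argmax(-1).item()
print(frame_id, role_id)
```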