2024
基于知识蒸馏的低频词翻译优化策略(Knowledge Distillation-Based Optimization Strategy for Low-Frequency Word Translation in Neural Machine)
Yifan Guo (郭逸帆) | Hongying Zan (昝红英) | Ziyue Yan (阎子悦) | Hongfei Xu (许鸿飞)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
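“Neural machine translation usually requires large parallel corpora to achieve good translation quality. Parallel corpora, however, all exhibit imbalanced word frequency distributions, which can lead models to develop different biases during learning. These models tend to learn high-frequency words while neglecting the key semantic information carried by low-frequency words. The neglected low-frequency words also contain important translation information, so ignoring them may harm translation quality. Current methods typically train a bilingual model and then assign different weights to words according to their frequency, increasing the weights of low-frequency words to improve their translation. In this paper, our goal is to improve the translation of words that are meaningful but relatively infrequent. We propose using knowledge distillation to improve low-frequency word translation: we train a model that translates low-frequency words better and use it as a teacher model to guide the student model in learning low-frequency word translation. We then propose a more stable dual-teacher distillation model that further preserves high-frequency performance, so that the model obtains stable improvements on multiple tasks. Our single-teacher distillation model achieves a further 0.64 BLEU improvement over the SOTA on the English→German task, and our dual-teacher distillation model achieves a further 0.31 BLEU improvement over the SOTA on the Chinese→English task. On the English→German, English→Czech, and English→French translation tasks, it improves low-frequency word translation over the baseline by 1.24, 0.47, and 0.87 BLEU, respectively, while keeping high-frequency word translation quality unchanged.”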
Dual-teacher Knowledge Distillation for Low-frequency Word Translation
Yifan Guo | Hongying Zan | Hongfei Xu
Findings of the Association for Computational Linguistics: EMNLP 2024
Neural Machine Translation (NMT) models are trained on parallel corpora with unbalanced word frequency distributions. As a result, NMT models are likely to prefer high-frequency words to low-frequency ones, even though low-frequency words may carry crucial semantic information, and neglecting them may hamper translation quality. The objective of this study is to enhance the translation of meaningful but low-frequency words. Our general idea is to optimize the translation of low-frequency words through knowledge distillation. Specifically, we employ a low-frequency teacher model that excels in translating low-frequency words to guide the learning of the student model. To maintain the translation quality of high-frequency words, we further introduce a dual-teacher distillation framework, leveraging both the low-frequency and high-frequency teacher models to guide the student model’s training. Our single-teacher distillation method already achieves a +0.64 BLEU improvement over the state-of-the-art method on the low-frequency test set of the WMT 16 English-to-German translation task. Our dual-teacher framework leads to +0.87, +1.24, +0.47, +0.87 and +0.86 BLEU improvements over the baseline on the IWSLT 14 German-to-English, WMT 16 English-to-German, WMT 15 English-to-Czech, WMT 14 English-to-French and WMT 18 Chinese-to-English tasks respectively, while maintaining the translation performance of high-frequency words.
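As a rough illustration of the dual-teacher idea, the sketch below computes a per-token training loss that mixes the usual negative log-likelihood with a distillation term from two teachers. The abstract does not say how the two teachers are combined, so the weighted sum of KL divergences, the function name `dual_teacher_loss`, and the parameters `alpha` and `lam` are all assumptions made for illustration, not the authors' formulation.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def dual_teacher_loss(student_logits, low_teacher_logits, high_teacher_logits,
                      gold_index, alpha=0.5, lam=0.5):
    """Hypothetical per-token loss (assumed combination, not from the paper).

    alpha weights the low-frequency teacher against the high-frequency one;
    lam weights the distillation term against the gold-label term.
    """
    student = softmax(student_logits)
    low = softmax(low_teacher_logits)
    high = softmax(high_teacher_logits)
    nll = -math.log(student[gold_index])          # standard NMT objective
    distill = (alpha * kl_divergence(low, student)
               + (1 - alpha) * kl_divergence(high, student))
    return (1 - lam) * nll + lam * distill
```

When the student already matches both teachers, the distillation term vanishes and only the gold-label term remains; any disagreement with either teacher raises the loss.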
2022
Textstar: a Fast and Lightweight Graph-Based Algorithm for Extractive Summarization and Keyphrase Extraction
David Brock | Ali Khan | Tam Doan | Alicia Lin | Yifan Guo | Paul Tarau
Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association
We introduce Textstar, a graph-based summarization and keyphrase extraction system that builds a document graph using only lemmatization and POS tagging. The document graph aggregates connections between lemma and sentence identifier nodes. Consecutive lemmas in each sentence, as well as consecutive sentences themselves, are connected in rings to form a ring of rings representing the document. We iteratively apply a centrality algorithm of our choice to the document graph and trim the lowest-ranked nodes at each step. Once the desired number of remaining sentences and lemmas is reached, we extract the sentences as the summary and aggregate the remaining lemmas into keyphrases using their context. Our algorithm is efficient enough to process large document graphs in one shot without any training, and empirical evaluation on several benchmarks indicates that it outperforms most other graph-based algorithms.
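The ring-of-rings construction and iterative trimming described in this abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Textstar implementation: the input is assumed to be already lemmatized and POS-filtered, degree is used as a stand-in for the centrality algorithm, one node is trimmed per step, and the final aggregation of lemmas into keyphrases is omitted. The names `build_graph` and `textstar_sketch` are hypothetical.

```python
def build_graph(sentences):
    """Build an undirected graph from a list of lemma lists.

    Nodes are ("S", index) for sentences and ("L", lemma) for lemmas.
    """
    graph = {}

    def link(a, b):
        if a != b:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)

    n = len(sentences)
    for i, lemmas in enumerate(sentences):
        sent = ("S", i)
        graph.setdefault(sent, set())
        link(sent, ("S", (i + 1) % n))            # ring of consecutive sentences
        for j, lemma in enumerate(lemmas):
            link(sent, ("L", lemma))              # sentence-lemma connection
            nxt = lemmas[(j + 1) % len(lemmas)]
            link(("L", lemma), ("L", nxt))        # ring of consecutive lemmas
    return graph

def textstar_sketch(sentences, keep_sentences, keep_lemmas):
    """Trim the lowest-degree node until the target counts are reached."""
    graph = build_graph(sentences)
    while True:
        n_sent = sum(1 for kind, _ in graph if kind == "S")
        n_lem = len(graph) - n_sent
        removable = [node for node in graph
                     if (node[0] == "S" and n_sent > keep_sentences)
                     or (node[0] == "L" and n_lem > keep_lemmas)]
        if not removable:
            break
        # Degree centrality (assumed); ties are broken by node label.
        weakest = min(removable, key=lambda node: (len(graph[node]), node))
        for neighbour in graph.pop(weakest):
            graph[neighbour].discard(weakest)
    summary = sorted(i for kind, i in graph if kind == "S")
    lemmas = sorted(lemma for kind, lemma in graph if kind == "L")
    return summary, lemmas
```

Calling `textstar_sketch(doc, 2, 3)` on a list of lemma lists returns the indices of the two surviving sentences and the three surviving lemmas. Degrees are recomputed after every removal, which mirrors the iterative re-ranking the abstract describes.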