2024
pdf
abs
HW-TSC at SemEval-2024 Task 9: Exploring Prompt Engineering Strategies for Brain Teaser Puzzles Through LLMs
Yinglu Li
|
Zhao Yanqing
|
Min Zhang
|
Yadong Deng
|
Aiju Geng
|
Xiaoqin Liu
|
Mengxin Ren
|
Yuang Li
|
Su Chang
|
Xiaofeng Zhao
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Large Language Models (LLMs) have demonstrated impressive performance on many Natural Language Processing (NLP) tasks. However, their ability to solve more creative, lateral thinking puzzles remains relatively unexplored. In this work, we develop methods to enhance the lateral thinking and puzzle-solving capabilities of LLMs. We curate a dataset of word-type and sentence-type brain teasers requiring creative problem-solving abilities beyond commonsense reasoning. We first evaluate the zero-shot performance of models like GPT-3.5 and GPT-4 on this dataset. To improve their puzzle-solving skills, we employ prompting techniques like providing reasoning clues and chaining multiple examples to demonstrate the desired thinking process. We also fine-tune the state-of-the-art Mixtral 7x8b LLM on ourdataset. Our methods enable the models to achieve strong results, securing 2nd and 3rd places in the brain teaser task. Our work highlights the potential of LLMs in acquiring complex reasoning abilities with the appropriate training. The efficacy of our approaches opens up new research avenues into advancing lateral thinking and creative problem-solving with AI systems.
2023
pdf
abs
Empowering a Metric with LLM-assisted Named Entity Annotation: HW-TSC’s Submission to the WMT23 Metrics Shared Task
Zhanglin Wu
|
Yilun Liu
|
Min Zhang
|
Xiaofeng Zhao
|
Junhao Zhu
|
Ming Zhu
|
Xiaosong Qiao
|
Jingfei Zhang
|
Ma Miaomiao
|
Zhao Yanqing
|
Song Peng
|
Shimin Tao
|
Hao Yang
|
Yanfei Jiang
Proceedings of the Eighth Conference on Machine Translation
This paper presents the submission of Huawei Translation Service Center (HW-TSC) to the WMT23 metrics shared task, in which we submit two metrics: KG-BERTScore and HWTSC-EE-Metric. Among them, KG-BERTScore is our primary submission for the reference-free metric, which can provide both segment-level and system-level scoring. While HWTSC-EE-Metric is our primary submission for the reference-based metric, which can only provide system-level scoring. Overall, our metrics show relatively high correlations with MQM scores on the metrics tasks of previous years. Especially on system-level scoring tasks, our metrics achieve new state-of-the-art in many language pairs.
pdf
abs
HW-TSC’s Participation in the WMT 2023 Automatic Post Editing Shared Task
Jiawei Yu
|
Min Zhang
|
Zhao Yanqing
|
Xiaofeng Zhao
|
Yuang Li
|
Su Chang
|
Yinglu Li
|
Ma Miaomiao
|
Shimin Tao
|
Hao Yang
Proceedings of the Eighth Conference on Machine Translation
The paper presents the submission by HW-TSC in the WMT 2023 Automatic Post Editing (APE) shared task for the English-Marathi (En-Mr) language pair. Our method encompasses several key steps. First, we pre-train an APE model by utilizing synthetic APE data provided by the official task organizers. Then, we fine-tune the model by employing real APE data. For data augmentation, we incorporate candidate translations obtained from an external Machine Translation (MT) system. Furthermore, we integrate the En-Mr parallel corpus from the Flores-200 dataset into our training data. To address the overfitting issue, we employ R-Drop during the training phase. Given that APE systems tend to exhibit a tendency of ‘over-correction’, we employ a sentence-level Quality Estimation (QE) system to select the final output, deciding between the original translation and the corresponding output generated by the APE model. Our experiments demonstrate that pre-trained APE models are effective when being fine-tuned with the APE corpus of a limited size, and the performance can be further improved with external MT augmentation. Our approach improves the TER and BLEU scores on the development set by -2.42 and +3.76 points, respectively.
pdf
abs
Leveraging Multilingual Knowledge Graph to Boost Domain-specific Entity Translation of ChatGPT
Min Zhang
|
Limin Liu
|
Zhao Yanqing
|
Xiaosong Qiao
|
Su Chang
|
Xiaofeng Zhao
|
Junhao Zhu
|
Ming Zhu
|
Song Peng
|
Yinglu Li
|
Yilun Liu
|
Wenbing Ma
|
Mengyao Piao
|
Shimin Tao
|
Hao Yang
|
Yanfei Jiang
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track
Recently, ChatGPT has shown promising results for Machine Translation (MT) in general domains and is becoming a new paradigm for translation. In this paper, we focus on how to apply ChatGPT to domain-specific translation and propose to leverage Multilingual Knowledge Graph (MKG) to help ChatGPT improve the domain entity translation quality. To achieve this, we extract the bilingual entity pairs from MKG for the domain entities that are recognized from source sentences. We then introduce these pairs into translation prompts, instructing ChatGPT to use the correct translations of the domain entities. To evaluate the novel MKG method for ChatGPT, we conduct comparative experiments on three Chinese-English (zh-en) test datasets constructed from three specific domains, of which one domain is from biomedical science, and the other two are from the Information and Communications Technology (ICT) industry — Visible Light Communication (VLC) and wireless domains. Experimental results demonstrate that both the overall translation quality of ChatGPT (+6.21, +3.13 and +11.25 in BLEU scores) and the translation accuracy of domain entities (+43.2%, +30.2% and +37.9% absolute points) are significantly improved with MKG on the three test datasets.