Ayako Sato


2024

Lexical simplification (LS) is a process of replacing complex words with simpler alternatives to help readers understand sentences seamlessly. This process is divided into two primary subtasks: assessing word complexities and replacing high-complexity words with simpler alternatives. Employing task-specific supervised data to train models is a prevalent strategy for addressing these subtasks. However, such approach cannot be employed for low-resource languages. Therefore, this paper introduces a multilingual LS pipeline system that does not rely on supervised data. Specifically, we have developed systems based on GPT-4 for each subtask. Our systems demonstrated top-class performance on both tasks in many languages. The results indicate that GPT-4 can effectively assess lexical complexity and simplify complex words in a multilingual context with high quality.
In machine translation quality estimation (QE), translation quality is evaluated automatically without the need for reference translations. This paper describes our contribution to the sentence-level subtask of Task 1 at the Ninth Machine Translation Conference (WMT24), which predicts quality scores for neural MT outputs without reference translations. We fine-tune GPT-4o mini, a large-scale language model (LLM), with limited data for QE.We report results for the direct assessment (DA) method for four language pairs: English-Gujarati (En-Gu), English-Hindi (En-Hi), English-Tamil (En-Ta), and English-Telugu (En-Te).Experiments under zero-shot, few-shot prompting, and fine-tuning settings revealed significantly low performance in the zero-shot, while fine-tuning achieved accuracy comparable to last year’s best scores. Our system demonstrated the effectiveness of this approach in low-resource language QE, securing 1st place in both En-Gu and En-Hi, and 4th place in En-Ta and En-Te.