Fengyi Yang


2025

Low-Resource Language Expansion and Translation Capacity Enhancement for LLM: A Study on the Uyghur
Kaiwen Lu | Yating Yang | Fengyi Yang | Rui Dong | Bo Ma | Aihetamujiang Aihemaiti | Abibilla Atawulla | Lei Wang | Xi Zhou
Proceedings of the 31st International Conference on Computational Linguistics

Although large language models have significantly advanced natural language generation, their potential in low-resource machine translation has not yet been fully explored, especially for languages that translation models have not been trained on. In this study, we provide a detailed demonstration of how to efficiently extend large language models to a low-resource language and significantly enhance the model’s translation ability, using Uyghur as an example. The process involves four stages: collecting and pre-processing monolingual data, conducting continuous pre-training with extensive monolingual data, fine-tuning with translation supervision on a smaller parallel corpus, and, building on this, applying our proposed direct preference optimization based on translation self-evolution (DPOSE). Extensive experiments show that our strategy effectively expands the low-resource languages supported by large language models and significantly enhances the model’s translation ability in Uyghur with less parallel data. Our research provides detailed insights for extending large language models to other low-resource languages.

Mining the Past with Dual Criteria: Integrating Three types of Historical Information for Context-aware Event Forecasting
Rong Ma | Lei Wang | Yating Yang | Bo Ma | Rui Dong | Fengyi Yang | Ahtamjan Ahmat | Kaiwen Lu | Xinyue Wang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Event forecasting requires modeling historical event data to predict future events, and achieving accurate predictions depends on effectively capturing the relevant historical information that aids forecasting. Most existing methods focus on entities and structural dependencies to capture historical clues but often overlook implicitly relevant information. This limitation arises from neglecting event semantics and deeper factual associations that are not explicitly connected in the graph structure but are nonetheless critical for accurate forecasting. To address this, we propose a dual-criteria constraint strategy that leverages event semantics for relevance modeling and incorporates a self-supervised semantic filter based on factual event associations to capture implicitly relevant historical information. Building on this strategy, our method, termed ITHI (Integrating Three types of Historical Information), combines sequential event information, periodically repeated event information, and relevant historical information to achieve context-aware event forecasting. We evaluated the proposed ITHI method on three public benchmark datasets, achieving state-of-the-art performance and significantly outperforming existing approaches. Additionally, we validated its effectiveness on two structured temporal knowledge graph forecasting datasets.