Peng Sun
2024
ReFT: Reasoning with Reinforced Fine-Tuning
Luong Trung
|
Xinbo Zhang
|
Zhanming Jie
|
Peng Sun
|
Xiaoran Jin
|
Hang Li
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
One way to enhance the reasoning capability of Large Language Models (LLMs) is to conduct Supervised Fine-Tuning (SFT) using Chain-of-Thought (CoT) annotations. This approach does not show sufficiently strong generalization ability, however, because the training only relies on the given CoT data. In math problem-solving, for example, there is usually only one annotated reasoning path for each question in the training data. Intuitively, it would be better for the algorithm to learn from multiple annotated reasoning paths given a question. To address this issue, we propose a simple yet effective approach called Reinforced Fine-Tuning (ReFT) to enhance the generalizability of learning LLMs for reasoning, with math problem-solving as an example. ReFT first warmups the model with SFT, and then employs on-line reinforcement learning, specifically the PPO algorithm in this paper, to further fine-tune the model, where an abundance of reasoning paths are automatically sampled given the question and the rewards are naturally derived from the ground-truth answers. Extensive experiments on GSM8K, MathQA, and SVAMP datasets show that ReFT significantly outperforms SFT, and the performance can be potentially further boosted by combining inference-time strategies such as majority voting and re-ranking. Note that ReFT obtains the improvement by learning from the same training questions as SFT, without relying on extra or augmented training questions. This indicates a superior generalization ability for ReFT.
2022
EmojiCloud: a Tool for Emoji Cloud Visualization
Yunhe Feng
|
Cheng Guo
|
Bingbing Wen
|
Peng Sun
|
Yufei Yue
|
Dingwen Tao
Proceedings of the Fifth International Workshop on Emoji Understanding and Applications in Social Media
This paper proposes EmojiCloud, an open-source Python-based emoji cloud visualization tool, to generate a quick and straightforward understanding of emojis from the perspective of frequency and importance. EmojiCloud is flexible enough to support diverse drawing shapes, such as rectangles, ellipses, and image masked canvases. We also follow inclusive and personalized design principles to cover the unique emoji designs from seven emoji vendors (e.g., Twitter, Apple, and Windows) and allow users to customize plotted emojis and background colors. We hope EmojiCloud can benefit the whole emoji community due to its flexibility, inclusiveness, and customizability.
Search
Co-authors
- Yunhe Feng 1
- Cheng Guo 1
- Bingbing Wen 1
- Yufei Yue 1
- Dingwen Tao 1
- show all...