@inproceedings{levandovsky-2025-deep,
    title = "Deep Reinforcement Learning of {LLM}s using {RLHF}",
    author = "Levandovsky, Enoch",
    editor = "Whetten, Ryan  and
      Sucal, Virgile  and
      Ngo, Anh  and
      Chalamalasetti, Kranti  and
      Inoue, Koji  and
      Cimino, Gaetano  and
      Yang, Zachary  and
      Zenimoto, Yuki  and
      Rodriguez, Ricardo",
    booktitle = "Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems",
    month = aug,
    year = "2025",
    address = "Avignon, France",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.yrrsds-1.2/",
    pages = "4--5",
    abstract = "My main research interests lies in the application of Reinforcement Learning (RL) alignment of LLMs in human robot dialogue. More specifically, my latest research aims to use RL alignment as an efficient training regime to train a newly initialized tiny LM to behave like a toddler. Previous research expresses the difficulty of building a robust tiny LM with an educated adult level understanding. Our hypothesis is that the cognitive barrier to train a tiny LM to at-least behave as a child is achievable with a very small number of parameters especially if training efficiently using RL LLM training regime. My interests also extend to apply RL to LLM training for dialogue management and planning."
}