Offline RL for language models is indeed a promising direction to explore. It's worth noting that Sergey, an expert in the field, has expressed concerns about the feasibility of online RL with language models. This reminds me how brilliant the RLHF approach is.
1:44 Could you be more specific about prompt engineering? It seems a highly interesting topic: how prompt engineering exploits the internal probabilistic structures of large models, and how those structures might even be edited.