Pessimistic Backward Policy for GFlowNets
CoRR (2024)
Abstract
This paper studies Generative Flow Networks (GFlowNets), which learn to sample objects proportionally to a given reward function through trajectories of state transitions. In this work, we observe that GFlowNets tend to under-exploit high-reward objects when trained on an insufficient number of trajectories, which may lead to a large gap between the estimated flow and the (known) reward value. In response to this challenge, we propose a pessimistic backward policy for GFlowNets (PBP-GFN), which maximizes the observed flow so that it aligns closely with the true reward of the object. We extensively evaluate PBP-GFN across eight benchmarks, including the hyper-grid environment, bag generation, structured set generation, molecular generation, and four RNA sequence generation tasks. In particular, PBP-GFN enhances the discovery of high-reward objects, maintains the diversity of generated objects, and consistently outperforms existing methods.
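For context, here is a minimal sketch of the standard GFlowNet formulation the abstract refers to (well-known background from the GFlowNet literature, not text from this paper). A forward policy P_F builds an object x through a trajectory of state transitions, and training aims to make the terminal sampling distribution proportional to the reward:

    P_T(x) = \frac{R(x)}{Z}, \qquad Z = \sum_{x'} R(x').

Under the trajectory-balance formulation, for a complete trajectory \tau = (s_0 \to s_1 \to \dots \to s_n = x), this is enforced by matching forward and backward flows,

    Z \prod_{t=0}^{n-1} P_F(s_{t+1} \mid s_t) \;=\; R(x) \prod_{t=0}^{n-1} P_B(s_t \mid s_{t+1}),

where P_B is the backward policy. As described in the abstract, the pessimistic backward policy concerns the choice of P_B: it is trained to maximize the flow assigned to the observed trajectories, reducing the gap between the estimated flow into x and the known reward R(x).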