Limitations of Agents Simulated by Predictive Models
CoRR (2024)
Abstract
There is increasing focus on adapting predictive models into agent-like
systems, most notably AI assistants based on language models. We outline two
structural reasons why these models can fail when turned into agents.
First, we discuss auto-suggestive delusions. Prior work has shown theoretically
that models fail to imitate agents that generated the training data if the
agents relied on hidden observations: the hidden observations act as
confounding variables, and the models treat actions they generate as evidence
for nonexistent observations. Second, we introduce and formally study a
related, novel limitation: predictor-policy incoherence. When a model generates
a sequence of actions, the model's implicit prediction of the policy that
generated those actions can serve as a confounding variable. The result is that
models choose actions as if they expect future actions to be suboptimal,
causing them to be overly conservative. We show that both failures are
fixed by including a feedback loop from the environment, that is, by re-training
the models on their own actions. We give simple demonstrations of both
limitations using Decision Transformers and confirm that empirical results
agree with our conceptual and formal analysis. Our treatment provides a
unifying view of these failure modes and informs the question of why
fine-tuning offline-learned policies with online learning makes them more
effective.
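
To make predictor-policy incoherence concrete, here is a minimal toy sketch in Python. It is not the paper's experiment and does not use Decision Transformers; the two-step game, the reward values, and all function names are assumptions made up for illustration. A "predictor" estimates the return of a risky first action by averaging over the step-2 policy it believes produced the data. With a uniformly random data policy it prefers the safe action, even though it would itself act optimally at step 2. Re-training the predictor on the agent's own step-2 behaviour removes the mismatch.

```python
# Toy two-step decision problem (hypothetical, for illustration only).
# Step 1: "safe" ends the episode with reward 0.6; "risky" leads to step 2.
# Step 2: "good" yields reward 1.0; "bad" yields reward 0.0.
SAFE_REWARD = 0.6
STEP2_REWARD = {"good": 1.0, "bad": 0.0}


def predicted_return_risky(step2_policy):
    """Expected return after 'risky' if step 2 follows `step2_policy`.

    `step2_policy` maps each step-2 action to its probability. It stands in
    for the predictor's implicit belief about the policy that will act later.
    """
    return sum(prob * STEP2_REWARD[action] for action, prob in step2_policy.items())


def greedy_first_action(step2_policy):
    """Pick the step-1 action with the higher predicted return."""
    return "risky" if predicted_return_risky(step2_policy) > SAFE_REWARD else "safe"


def greedy_second_action():
    """At step 2 the agent simply picks the action with the highest reward."""
    return max(STEP2_REWARD, key=STEP2_REWARD.get)


# Offline data came from a uniformly random policy (assumption).
data_policy = {"good": 0.5, "bad": 0.5}
before = greedy_first_action(data_policy)

# Feedback loop: re-estimate the step-2 policy from the agent's own actions.
own_action = greedy_second_action()
retrained_policy = {a: (1.0 if a == own_action else 0.0) for a in STEP2_REWARD}
after = greedy_first_action(retrained_policy)

print(before, after)
```

Before the feedback loop the predictor values the risky action at 0.5 and so chooses the safe one; once its belief about step 2 matches the agent's actual behaviour, the risky action is valued at 1.0 and is chosen instead.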