Pretrained generalist policies, such as vision-language-action (VLA) models, promise impressive zero-shot generalization in robot manipulation. However, their real-world performance degrades quickly under distribution shift, reducing robustness and making instruction following inconsistent.
To address these challenges, we propose DreamSteer, a deploy-time steering framework to enhance pretrained VLAs without the need for finetuning on demonstration data collected in the target distribution. The key insight in DreamSteer is to leverage a latent world model and a general-purpose value function to steer pretrained VLA policies. During deployment, DreamSteer generates diverse action candidates, sourced from the VLA policy and a set of predefined motion primitives, and imagines the outcome of each action sequence by rolling it out within the latent world model.
By evaluating these predicted trajectories with the value model, DreamSteer identifies and executes the highest-scoring candidate, improving instruction following and filtering out task-irrelevant behavior. Across four real-world manipulation tasks with unseen objects, DreamSteer improves task success rates by 42.5 percentage points, from 23.75% to 66.25%, and increases instruction-following accuracy by 17.5 percentage points, from 38.75% to 56.25%, compared to the base VLA. These results suggest that latent world models can steer VLA policies during deployment and provide an effective pathway for improving the reliability of generalist robot policies when finetuning is undesirable or infeasible.
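The steering loop described above (sample candidates from the policy and from motion primitives, imagine each rollout in the latent world model, score with the value function, execute the best) can be sketched as follows. This is a minimal illustration with toy stand-ins: the candidate sources, latent dynamics, and value function shown here are placeholders, not DreamSteer's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_candidates(k=5, horizon=10, act_dim=7):
    """Stand-in for k action sequences sampled from the VLA policy."""
    return rng.normal(size=(k, horizon, act_dim))

def primitive_candidates(horizon=10, act_dim=7):
    """Stand-in for predefined motion primitives (e.g. lift, retreat)."""
    prims = np.zeros((2, horizon, act_dim))
    prims[0, :, 2] = 0.01   # constant upward motion
    prims[1, :, 0] = -0.01  # constant backward motion
    return prims

def rollout_latent(z0, actions):
    """Imagine a trajectory from latent state z0 under toy dynamics."""
    traj, z = [z0], z0
    for a in actions:
        z = 0.9 * z + 0.1 * a  # placeholder latent transition
        traj.append(z)
    return np.stack(traj)

def value(traj):
    """Toy general-purpose value: prefer ending near a stand-in goal (origin)."""
    return -np.linalg.norm(traj[-1])

def steer(z0):
    """Score every candidate's imagined rollout; return the best sequence."""
    cands = np.concatenate([policy_candidates(), primitive_candidates()])
    scores = [value(rollout_latent(z0, a)) for a in cands]
    best = int(np.argmax(scores))
    return cands[best], scores[best]

z0 = rng.normal(size=7)
best_actions, best_score = steer(z0)
print(best_actions.shape)  # one (horizon, act_dim) action sequence
```

In this sketch the argmax over imagined rollouts is what lets primitives win when the policy's own samples all lead to low-value outcomes, which is the mechanism the abstract credits for filtering task-irrelevant behavior.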
**Task success rates** (successes out of 20 trials per object):

| Policy | Phone | Mustard | Tape | Eraser | Average |
|---|---|---|---|---|---|
| π0(k=1) | 4/20 | 3/20 | 6/20 | 6/20 | 23.75% |
| π0(k=5)+DreamSteer | 7/20 | 6/20 | 11/20 | 10/20 | 42.5% |
| π0(k=5+prim)+random | 0/20 | 0/20 | 0/20 | 0/20 | 0% |
| prim+DreamSteer | 0/20 | 0/20 | 0/20 | 0/20 | 0% |
| π0(k=5+prim)+DreamSteer | 12/20 | 11/20 | 16/20 | 14/20 | 66.25% |
**Instruction-following accuracy** (correct out of 20 trials per object):

| Policy | Sponge | Banana | Pencil case | Apple | Average |
|---|---|---|---|---|---|
| π0(k=1) | 8/20 | 9/20 | 6/20 | 8/20 | 38.75% |
| π0(k=5+prim)+DreamSteer | 14/20 | 13/20 | 9/20 | 9/20 | 56.25% |