๐ŸŽซ You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector

Omkar Patil Ondrej Biza Thomas Weng Karl Schmeckpeper Wil Thomason Xiaohan Zhang Robin Walters Nakul Gopalan Sebastian Castro Eric Rosen
Robotics and AI Institute (RAI)

Video Presentation (๐Ÿ”Š Audio included)

We propose that the performance of a pretrained, frozen diffusion or flow matching policy can be improved by swapping the sampling of initial noise from the prior distribution with a well-chosen, constant initial noise input.

Abstract

What happens when a pretrained generative robot policy is provided a constant initial noise as input, rather than repeatedly sampling it from a Gaussian? We demonstrate that the performance of a pretrained, frozen diffusion or flow matching policy can be improved with respect to a downstream reward by swapping the sampling of initial noise from the prior distribution (typically isotropic Gaussian) with a well-chosen, constant initial noise input---a golden ticket. We propose a search method to find golden tickets using Monte-Carlo policy evaluation that keeps the pretrained policy frozen, does not train any new networks, and is applicable to all diffusion/flow matching policies (and therefore many VLAs). Our approach to policy improvement makes no assumptions beyond being able to inject initial noise into the policy and calculate (sparse) task rewards of episode rollouts, making it deployable with no additional infrastructure or models. Our method improves the performance of policies in 38 out of 43 tasks across simulated and real-world robot manipulation benchmarks, with relative improvements in success rate by up to 58% for some simulated tasks, and 60% within 50 search episodes for real-world tasks. We also show unique benefits of golden tickets for multi-task settings: the diversity of behaviors from different tickets naturally defines a Pareto frontier for balancing different objectives (e.g., speed, success rates); in VLAs, we find that a golden ticket optimized for one task can also boost performance in other related tasks. We release a codebase with pretrained policies and golden tickets for simulation benchmarks using VLAs, diffusion policies, and flow matching policies.

Methodology

The lottery ticket hypothesis for robot control proposes that the performance of a pretrained, frozen diffusion or flow matching policy can be improved by replacing the sampling of initial noise from a prior distribution with a well-chosen, constant initial noise input, called a golden ticket.

Architecture
Overview of standard diffusion policy inference (left) versus our proposed approach of using golden tickets (right). Given a frozen, pretrained diffusion or flow matching policy, rather than sampling from a Gaussian every time an action needs to be computed, we use a constant, well-chosen initial noise vector, called a golden ticket. We find golden tickets improve policy performance across a range of observation inputs, model architectures, and embodiments.
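The swap described above can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration rather than the released implementation: `toy_denoise_step` stands in for a real denoising network, and the only change between standard inference and ticket-based inference is where the initial noise comes from.

```python
import numpy as np

def run_policy(denoise_step, obs, initial_noise=None, num_steps=10,
               action_dim=7, rng=None):
    """Toy diffusion-style inference: iteratively refine an action from noise.

    If `initial_noise` is None, sample from the standard Gaussian prior
    (the usual procedure); otherwise reuse the given constant vector (a ticket).
    """
    rng = rng if rng is not None else np.random.default_rng()
    x = (rng.standard_normal(action_dim) if initial_noise is None
         else np.array(initial_noise, dtype=float))
    for _ in range(num_steps):
        x = denoise_step(x, obs)
    return x

def toy_denoise_step(x, obs):
    # Stand-in denoiser: pull the sample halfway toward an obs-dependent target.
    return x + 0.5 * (obs - x)

obs = np.ones(7)
ticket = np.zeros(7)  # a hypothetical fixed initial-noise vector
a1 = run_policy(toy_denoise_step, obs, initial_noise=ticket)
a2 = run_policy(toy_denoise_step, obs, initial_noise=ticket)
```

With the same ticket and observation, inference is deterministic: `a1` and `a2` are identical, whereas two Gaussian-sampled runs generally differ.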

Given a pretrained policy, a reward function, and an environment for a new downstream task, we pose finding golden tickets as a search problem: candidate initial noise vectors, which we call lottery tickets, are evaluated to find the ticket that maximizes expected cumulative discounted reward on the downstream task.

Our proposed method has several unique benefits: it keeps the pretrained policy frozen, trains no new networks, and applies to any diffusion or flow matching policy (and therefore many VLAs).

Architecture
For each ticket, we run the policy in each environment, fixing the initial noise vector fed into the policy. We then calculate the average cumulative discounted rewards, providing an empirical estimate (i.e., Monte-Carlo estimate) of the value function for the ticket. After all tickets have been evaluated, we select and return the best performing ticket.
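The Monte-Carlo search described above can be sketched as follows. This is a simplified, hypothetical illustration of the procedure, not the released codebase: `rollout_return` stands in for running one episode with the given ticket and summing (discounted) rewards.

```python
import numpy as np

def find_golden_ticket(rollout_return, num_tickets=10, episodes_per_ticket=5,
                       action_dim=7, rng=None):
    """Select a golden ticket via Monte-Carlo policy evaluation.

    `rollout_return(ticket)` runs one episode with the policy's initial noise
    fixed to `ticket` and returns the episode's cumulative discounted reward.
    The pretrained policy stays frozen; we only pick the best noise vector.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    tickets = rng.standard_normal((num_tickets, action_dim))  # candidates from the prior
    values = []
    for ticket in tickets:
        returns = [rollout_return(ticket) for _ in range(episodes_per_ticket)]
        values.append(float(np.mean(returns)))  # Monte-Carlo value estimate
    best = int(np.argmax(values))
    return tickets[best], values[best]

# Toy usage with a deterministic stand-in reward (hypothetical, for illustration):
target = np.full(4, 0.5)
def rollout_return(ticket):
    return -float(np.linalg.norm(ticket - target))

best_ticket, best_value = find_golden_ticket(rollout_return, num_tickets=20,
                                             episodes_per_ticket=1, action_dim=4)
```

With sparse task rewards, `episodes_per_ticket` trades off estimate variance against search budget, which is why the hardware experiments use only a handful of rollouts per ticket.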

Hardware Experiments

We use Franka hardware to conduct all real-world experiments. We perform three tasks: picking a banana, pushing a cup, and picking a cube. The first two tasks (banana and cup) use pointcloud-based diffusion policies, whereas the cube picking task uses an RGB-based diffusion policy. For each task (and corresponding base policy), we first evaluate the base policy using Gaussian noise (🎲). Then we search for golden tickets (🎫), using at most 150 search episodes per task, and report the performance of the best-performing ticket.

๐ŸŒ Banana picking task

For the banana picking pointcloud policy, we initially evaluated the base policy at a single location 10 times, where it successfully picked the banana 30% of the time. After evaluating 10 tickets at the same location for 5 episodes each (50 episode rollouts total), we found a ticket that succeeded 100% of the time at that location. We then evaluated that golden ticket and the base policy at 4 other locations on the table, with 10 episodes each.

๐ŸŽฒ Gaussian Noise

Location 1 (โŒ Grasp too high)
Location 2 (โŒ Wrong orientation)
Location 3 (โŒ Drops banana)

๐ŸŽซ Golden Ticket

Location 1 (โœ… Correct height)
Location 2 (โœ… Correct orientation)
Location 3 (โœ… Holds banana)

๐Ÿฅค Push Cup

The goal of the push cup task is to push a cup until it touches the red x on the table. For the cup pushing pointcloud policy, we evaluated the base policy 10 times and observed an average success rate of 40%. We then evaluated 10 tickets with 5 rollouts each and found a ticket that achieved a 100% success rate. When evaluated an additional 10 times, it continued to achieve a 100% success rate. This constitutes our largest improvement on hardware: the average success rate increased by 60% (from 40% to 100%).

๐ŸŽฒ Gaussian Noise

(โŒ Too short)
(โœ… Touches x)
(โœ… Touches x)
(โŒ Too short)
(โŒ Too short)
(โŒ Too short)
(โŒ Too short)
(โœ… Touches x)
(โŒ Too short)
(โœ… Touches x)

๐ŸŽซ Golden Ticket

(โœ… Touches x)
(โœ… Touches x)
(โœ… Touches x)
(โœ… Touches x)
(โœ… Touches x)
(โœ… Touches x)
(โœ… Touches x)
(โœ… Touches x)
(โœ… Touches x)
(โœ… Touches x)

๐Ÿ“ฆ Pick cube

When using standard Gaussian noise, our block-picking policy successfully picks up the cube 80% of the time (40 out of 50). After searching over 6 tickets, each with 25 episode rollouts (150 episodes in total), we find a golden ticket (Ticket 5) that succeeded 25 out of 25 times, and also a ticket (Ticket 6) that succeeded only 1 out of 25 times. When we then evaluate those two tickets an additional 50 times each, Ticket 5 successfully picked the cube 98% of the time (49 out of 50), whereas Ticket 6 succeeded only 2 out of 50 times. We therefore show we can drive the success rate from 80% to 98%, an 18 percentage point increase, using just 150 search episodes, or conversely steer the policy to miss with extreme reliability.

๐ŸŽฒ Gaussian Noise (40 out of 50, 80% S.R)

๐ŸŽซ Golden Ticket 5 (49 out of 50, 98% S.R)

๐ŸŽซ Golden Ticket 6 (2 out of 50, 4% S.R)

Pareto Frontier for Multiple Reward Functions

We use the frankasim MuJoCo environment to train a single-arm Franka robot to pick a cube, which randomly spawns in a 0.5 square meter region in front of the robot. We train a flow matching policy that takes as input the low-dimensional state of the cube and robot.

pareto
Various tickets (pink) for the frankasim pick policy, evaluated by success rate and speed (measured by the length of successful episodes). Up is a higher success rate; left is a faster time to success. Because lottery tickets exhibit extreme differences in policy performance, a Pareto frontier emerges, defined by the tickets that no other ticket beats on both axes (represented with golden ticket icons).
Ticket A: Accurate, but slower
Ticket B: Fast, but less accurate
We see a large variation in ticket performance on metrics measuring picking accuracy and speed. This diversity results in a small subset of golden tickets that define a Pareto frontier, meaning that they outperform all other regular tickets to the right and below them, and represent a balance between the two objectives. This is a unique property of golden tickets compared to other RL methods: rather than designing multiple reward functions that balance these objectives differently, we naturally find tickets that balance them without any reward tuning. This suggests a simple way to alter a robot's behavior to adapt between objectives online: collect the golden tickets on the Pareto frontier, and switch between them as desired.
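Collecting the frontier tickets amounts to a standard non-dominated filter over (success rate, time-to-success) pairs. The sketch below is an illustrative implementation under that assumption, not code from the paper; the example points are made up.

```python
def pareto_frontier(points):
    """Keep the non-dominated (success_rate, time_to_success) pairs.

    Higher success rate is better; lower time is better. A point is dominated
    if some other point is at least as good on both axes and strictly better
    on at least one.
    """
    frontier = []
    for i, (s_i, t_i) in enumerate(points):
        dominated = any(
            (s_j >= s_i and t_j <= t_i) and (s_j > s_i or t_j < t_i)
            for j, (s_j, t_j) in enumerate(points) if j != i
        )
        if not dominated:
            frontier.append((s_i, t_i))
    return frontier

# Hypothetical ticket metrics: (success rate, mean time to success in seconds).
tickets = [(0.9, 12.0), (0.7, 8.0), (0.6, 10.0), (0.95, 15.0)]
frontier = pareto_frontier(tickets)  # (0.6, 10.0) is dominated by (0.7, 8.0)
```

Switching between the surviving tickets at deployment time then trades speed against reliability without retraining or reward tuning.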

Citation

@article{patil2026goldenticket, title={You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector}, author={Patil, Omkar and Biza, Ondrej and Weng, Thomas and Schmeckpeper, Karl and Thomason, Wil and Zhang, Xiaohan and Walters, Robin and Gopalan, Nakul and Castro, Sebastian and Rosen, Eric}, journal={}, year={2026} }