ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

¹Robotics and AI Institute, ²University of Texas at Austin, ³Sony AI

^* The work was done while Zifan was an intern at RAI. ^† Equal advising.

Abstract

Learning generalizable and robust behavior cloning policies requires large volumes of high-quality robotics data. While human demonstrations (e.g., through teleoperation) serve as the standard source for expert behaviors, acquiring such data at scale in the real world is prohibitively expensive. This paper introduces ExpertGen, a framework that automates expert policy learning in simulation to enable scalable sim-to-real transfer. ExpertGen first initializes a behavior prior using a diffusion policy trained on imperfect demonstrations, which may be synthesized by large language models or provided by humans. Reinforcement learning is then used to steer this prior toward high task success by optimizing the diffusion model’s initial noise while keeping the original policy frozen. Because the pretrained diffusion policy stays frozen, ExpertGen regularizes exploration to remain within safe, human-like behavior manifolds, while also enabling effective learning with only sparse rewards.
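The noise-steering idea described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not the ExpertGen implementation: `frozen_denoise` is a deterministic stand-in for a pretrained diffusion policy, the noise policy is a hypothetical linear map `theta @ obs`, and a simple random-search hill climb is swapped in for the reinforcement learning algorithm. The point is only the structure: the learned component chooses the initial noise, the prior's weights are never touched, and the only feedback is a sparse success signal.

```python
import numpy as np

OBS_DIM, ACT_DIM, DENOISE_STEPS = 3, 2, 8
rng = np.random.default_rng(0)

# Frozen behavior prior (toy stand-in for a pretrained diffusion policy).
PRIOR_W = rng.normal(size=(ACT_DIM, OBS_DIM))
PRIOR_W_SNAPSHOT = PRIOR_W.copy()  # kept only to check the prior is never modified


def frozen_denoise(obs, noise):
    """Deterministically refine `noise` toward the prior's obs-conditioned mode."""
    mode = np.tanh(PRIOR_W @ obs)
    action = noise
    for _ in range(DENOISE_STEPS):
        action = action + 0.25 * (mode - action)
    return action


def steered_action(theta, obs):
    """The learned part only picks the initial noise; the prior maps it to an action."""
    noise = theta @ obs  # hypothetical linear noise policy
    return frozen_denoise(obs, noise)


def success_rate(theta, observations, goal_offset, tol=0.1):
    """Sparse reward: 1 if the action lands within `tol` of the goal, else 0."""
    hits = 0
    for obs in observations:
        goal = np.tanh(PRIOR_W @ obs) + goal_offset  # the prior is close but imperfect
        hits += np.linalg.norm(steered_action(theta, obs) - goal) < tol
    return hits / len(observations)


def hill_climb(observations, goal_offset, iters=300, step=0.5):
    """Random-search stand-in for the RL step: keep a perturbation if it is no worse."""
    theta = np.zeros((ACT_DIM, OBS_DIM))
    best = success_rate(theta, observations, goal_offset)
    history = [best]
    for _ in range(iters):
        candidate = theta + step * rng.normal(size=theta.shape)
        score = success_rate(candidate, observations, goal_offset)
        if score >= best:
            theta, best = candidate, score
        history.append(best)
    return theta, history


observations = rng.normal(size=(32, OBS_DIM))
theta, history = hill_climb(observations, goal_offset=np.array([0.1, -0.1]))
```

Because each denoising step contracts the sample toward the prior's mode, the chosen noise can only shift the final action a limited distance away from what the prior would produce. This is a toy analogue of exploration staying near the behavior prior.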

Empirical evaluations on challenging manipulation benchmarks demonstrate that ExpertGen reliably produces high-quality expert policies with no reward engineering. On industrial assembly tasks, ExpertGen achieves a 90.5% overall success rate, while on long-horizon manipulation tasks it attains 85% overall success, outperforming all baseline methods. The resulting policies exhibit dexterous control and remain robust across diverse initial configurations and failure states. To validate sim-to-real transfer, the learned state-based expert policies are further distilled into visuomotor policies via DAgger and successfully deployed on real robotic hardware.
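For readers unfamiliar with DAgger, the distillation step mentioned above follows a general pattern: roll out the student, have the expert label the states the student actually visits, aggregate those labels, and refit. The sketch below is a minimal toy version of that loop, not the paper's pipeline. The 2-D dynamics, the linear expert, the `render` function standing in for a camera, and the least-squares student are all hypothetical choices made so the example stays self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)


def expert_action(state):
    """State-based expert (toy): drive the 2-D state toward the origin."""
    return -0.5 * state


def render(state):
    """Hypothetical stand-in for a camera: the student only sees these features."""
    return np.concatenate([state, [1.0]])


def rollout(student_W, horizon=20):
    """Run the student and record (observation, expert label) pairs along its path."""
    state = rng.uniform(-1.0, 1.0, size=2)
    obs_list, label_list = [], []
    for _ in range(horizon):
        obs = render(state)
        obs_list.append(obs)
        label_list.append(expert_action(state))  # expert labels student-visited states
        state = state + student_W @ obs + 0.01 * rng.normal(size=2)  # student acts
    return obs_list, label_list


def dagger(iterations=5, rollouts_per_iter=10):
    """Aggregate expert-labelled data over iterations and refit the student each time."""
    student_W = np.zeros((2, 3))
    data_obs, data_act = [], []
    for _ in range(iterations):
        for _ in range(rollouts_per_iter):
            obs_list, label_list = rollout(student_W)
            data_obs += obs_list
            data_act += label_list
        X, Y = np.array(data_obs), np.array(data_act)
        # Supervised refit on the aggregated dataset (ordinary least squares).
        student_W = np.linalg.lstsq(X, Y, rcond=None)[0].T
    return student_W, len(data_obs)


student_W, n_samples = dagger()
```

In this toy setting the expert is exactly linear in the rendered features, so the student can match it closely. With real images and a neural student, the refit would be a gradient-based training step, but the collect, relabel, aggregate, and refit loop has the same shape.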

Zero-shot Sim-to-Real Transfer

Baselines

ExpertGen

Robust recovery from banana dropping

Diffusion Policy

Fails to recover from banana dropping

Residual RL

Fails to recover from banana dropping

ExpertGen

Diffusion Policy

Residual RL

ExpertGen

Closed-loop control that pushes the pear to the exact goal position

Diffusion Policy

Open-loop control with the pear oscillating around the goal position

Residual RL

Fails to push the pear to the goal position

ExpertGen (Simulation) - AnyTask

ExpertGen (Simulation) - AutoMate

BibTeX

@article{xu2026expertgen,
  title={ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors},
  author={Xu, Zifan and Gong, Ran and Minniti, Maria Vittoria and Gundogdu, Ahmet Salih and Rosen, Eric and Sivakumar, Kausik and Yan, Riedana and Wang, Zixing and Deng, Di and Stone, Peter and Zhang, Xiaohan and Schmeckpeper, Karl},
  journal={arXiv preprint arXiv:2603.15956},
  year={2026}
}