
github.com/real-stanford/diffusion_policy @main


1,560 symbols · 5,389 edges · 178 files · 346 documented (22%) · updated 18mo ago · ★ 4,342 · 91 open issues

README

Diffusion Policy

[Project page] [Paper] [Data] [Colab (state)] [Colab (vision)]

Cheng Chi¹, Siyuan Feng², Yilun Du³, Zhenjia Xu¹, Eric Cousineau², Benjamin Burchfiel², Shuran Song¹

¹Columbia University, ²Toyota Research Institute, ³MIT


🛝 Try it out!

Our self-contained Google Colab notebooks are the easiest way to play with Diffusion Policy. We provide separate notebooks for the state-based and vision-based environments.

🧾 Check out our experiment logs!

For each experiment used to generate Tables I, II and IV in the paper, we provide:
1. A config.yaml that contains all parameters needed to reproduce the experiment.
2. Detailed training/eval logs.json.txt for every training step.
3. Checkpoints for the best epoch (epoch=*-test_mean_score=*.ckpt) and the last epoch (latest.ckpt) of each run.

Experiment logs are hosted on our website as nested directories in the format: https://diffusion-policy.cs.columbia.edu/data/experiments/<image|low_dim>/<task>/<method>/
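The URL pattern above can be filled in programmatically. A minimal sketch; the helper name experiment_url is hypothetical, and it only assembles strings, so it does not check that a given task/method folder actually exists on the server.

```python
BASE_URL = "https://diffusion-policy.cs.columbia.edu/data/experiments"
OBS_TYPES = ("image", "low_dim")  # the two top-level folders in the URL pattern


def experiment_url(obs_type: str, task: str, method: str) -> str:
    """Return the directory URL <BASE_URL>/<image|low_dim>/<task>/<method>/."""
    if obs_type not in OBS_TYPES:
        raise ValueError(f"obs_type must be one of {OBS_TYPES}, got {obs_type!r}")
    # Trailing slash: each URL names a directory listing.
    return f"{BASE_URL}/{obs_type}/{task}/{method}/"


# Reproduces the directory URL used by the recursive wget command in this README.
print(experiment_url("low_dim", "square_ph", "diffusion_policy_cnn"))
```

Appending config.yaml to experiment_url("image", "pusht", "diffusion_policy_cnn") gives the config URL used in the simulation benchmark steps.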

Within each experiment directory you may find:

.
├── config.yaml
├── metrics
│   └── logs.json.txt
├── train_0
│   ├── checkpoints
│   │   ├── epoch=0300-test_mean_score=1.000.ckpt
│   │   └── latest.ckpt
│   └── logs.json.txt
├── train_1
│   ├── checkpoints
│   │   ├── epoch=0250-test_mean_score=1.000.ckpt
│   │   └── latest.ckpt
│   └── logs.json.txt
└── train_2
    ├── checkpoints
    │   ├── epoch=0250-test_mean_score=1.000.ckpt
    │   └── latest.ckpt
    └── logs.json.txt

The metrics/logs.json.txt file aggregates evaluation metrics from all 3 training runs every 50 epochs using multirun_metrics.py. The numbers reported in the paper correspond to the max and k_min_train_loss aggregation keys.
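A hypothetical sketch of cross-seed aggregation, for readers who want to post-process the per-run logs themselves. It assumes each train_*/logs.json.txt holds one JSON object per line, with an "epoch" key and, on evaluation epochs, a "test/mean_score" key. It averages that score across seeds at each shared epoch and then takes the maximum over epochs. This is one plausible reading of a "max" aggregation, not a reimplementation of multirun_metrics.py, whose exact definitions of max and k_min_train_loss may differ.

```python
import json
from statistics import mean


def scores_by_epoch(path, key="test/mean_score"):
    """Map epoch -> score from one JSON-lines log (assumed: one dict per line)."""
    scores = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            if key in entry:
                # If an epoch is logged more than once, keep the last value.
                scores[entry["epoch"]] = entry[key]
    return scores


def mean_over_seeds(log_paths, key="test/mean_score"):
    """Average `key` across runs, restricted to epochs that every run evaluated."""
    runs = [scores_by_epoch(p, key) for p in log_paths]
    if not runs:
        raise ValueError("need at least one log file")
    shared = set.intersection(*(set(r) for r in runs))
    return {epoch: mean(r[epoch] for r in runs) for epoch in sorted(shared)}


def max_of_mean(log_paths, key="test/mean_score"):
    """Best cross-seed mean over all shared evaluation epochs."""
    return max(mean_over_seeds(log_paths, key).values())


# Usage against a downloaded experiment directory (path is illustrative):
# from pathlib import Path
# paths = sorted(Path("square_ph/diffusion_policy_cnn").glob("train_*/logs.json.txt"))
# print(max_of_mean(paths))
```

Restricting to epochs that all runs evaluated keeps each average over the same number of seeds.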

To download all files in a subdirectory, use:

$ wget --recursive --no-parent --no-host-directories --relative --reject="index.html*" https://diffusion-policy.cs.columbia.edu/data/experiments/low_dim/square_ph/diffusion_policy_cnn/

🛠️ Installation

🖥️ Simulation

To reproduce our simulation benchmark results, install our conda environment on a Linux machine with an NVIDIA GPU. On Ubuntu 20.04, you need to install the following apt packages for MuJoCo:

$ sudo apt install -y libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf

We recommend Mambaforge instead of the standard Anaconda distribution for faster installation:

$ mamba env create -f conda_environment.yaml

but you can use conda as well:

$ conda env create -f conda_environment.yaml

The conda_environment_macos.yaml file is only for development on macOS and does not fully support the benchmarks.

🦾 Real Robot

Hardware (for Push-T):
* 1x UR5-CB3 or UR5e (RTDE Interface is required)
* 2x RealSense D415
* 1x 3Dconnexion SpaceMouse (for teleop)
* 1x Millibar Robotics Manual Tool Changer (only need robot side)
* 1x 3D printed End effector
* 1x 3D printed T-block
* USB-C cables and screws for RealSense

Software:
* Ubuntu 20.04.3 (tested)
* Mujoco dependencies: sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf
* RealSense SDK
* Spacemouse dependencies: sudo apt install libspnav-dev spacenavd; sudo systemctl start spacenavd
* Conda environment: mamba env create -f conda_environment_real.yaml

🖥️ Reproducing Simulation Benchmark Results

Download Training Data

Under the repo root, create a data subdirectory:

[diffusion_policy]$ mkdir data && cd data

Download the corresponding zip file from https://diffusion-policy.cs.columbia.edu/data/training/

[data]$ wget https://diffusion-policy.cs.columbia.edu/data/training/pusht.zip

Extract training data:

[data]$ unzip pusht.zip && rm -f pusht.zip && cd ..

Grab the config file for the corresponding experiment:

[diffusion_policy]$ wget -O image_pusht_diffusion_policy_cnn.yaml https://diffusion-policy.cs.columbia.edu/data/experiments/image/pusht/diffusion_policy_cnn/config.yaml

Running for a single seed

Activate the conda environment and log in to wandb (if you haven't already).

[diffusion_policy]$ conda activate robodiff
(robodiff)[diffusion_policy]$ wandb login

Launch training with seed 42 on GPU 0.

(robodiff)[diffusion_policy]$ python train.py --config-dir=. --config-name=image_pusht_diffusion_policy_cnn.yaml training.seed=42 training.device=cuda:0 hydra.run.dir='data/outputs/${now:%Y.%m.%d}/${now:%H.%M.%S}_${name}_${task_name}'

This will create a directory in the format data/outputs/yyyy.mm.dd/hh.mm.ss_<method_name>_<task_name>, to which configs, logs and checkpoints are written. The policy will be evaluated every 50 epochs, with the success rate logged as test/mean_score on wandb along with videos of some rollouts.
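The hydra.run.dir override relies on Hydra's ${now:...} resolver, which formats the launch time with strftime-style patterns. A minimal sketch of the resulting path; the helper run_dir is hypothetical, and the name/task_name values below (train_diffusion_unet_hybrid, pusht_image) are inferred from the example directory listing that follows rather than read from the config.

```python
from datetime import datetime


def run_dir(name: str, task_name: str, now: datetime) -> str:
    """Mimic data/outputs/${now:%Y.%m.%d}/${now:%H.%M.%S}_${name}_${task_name}."""
    day = now.strftime("%Y.%m.%d")    # e.g. 2023.03.01
    clock = now.strftime("%H.%M.%S")  # e.g. 20.02.03
    return f"data/outputs/{day}/{clock}_{name}_{task_name}"


print(run_dir("train_diffusion_unet_hybrid", "pusht_image", datetime(2023, 3, 1, 20, 2, 3)))
# data/outputs/2023.03.01/20.02.03_train_diffusion_unet_hybrid_pusht_image
```

Because the timestamp has one-second resolution, two runs with the same name and task launched within the same second would resolve to the same directory.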

(robodiff)[diffusion_policy]$ tree data/outputs/2023.03.01/20.02.03_train_diffusion_unet_hybrid_pusht_image -I wandb
data/outputs/2023.03.01/20.02.03_train_diffusion_unet_hybrid_pusht_image
├── checkpoints
│   ├── epoch=0000-test_mean_score=0.134.ckpt
│   └── latest.ckpt
├── .hydra
│   ├── config.yaml
│   ├── hydra.yaml
│   └── overrides.yaml
├── logs.json.txt
├── media
│   ├── 2k5u6wli.mp4
│   ├── 2kvovxms.mp4
│   ├── 2pxd9f6b.mp4
│   ├── 2q5gjt5f.mp4
│   ├── 2sawbf6m.mp4
│   └── 538ubl79.mp4
└── train.log

3 directories, 13 files
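Scored checkpoint filenames encode the epoch and the test/mean_score at which they were saved, as in the tree above. A minimal sketch for picking the best one from a checkpoints folder; the helper names are hypothetical, and breaking ties in favor of the later epoch is our own choice, not documented behavior of the training code.

```python
import re

# Scored checkpoints look like epoch=0300-test_mean_score=1.000.ckpt;
# latest.ckpt does not match and is ignored.
CKPT_RE = re.compile(r"epoch=(\d+)-test_mean_score=(\d+(?:\.\d+)?)\.ckpt")


def parse_ckpt_name(name):
    """Return (epoch, score) for a scored checkpoint filename, else None."""
    m = CKPT_RE.fullmatch(name)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2))


def best_checkpoint(names):
    """Pick the highest score; on ties, prefer the later epoch."""
    scored = []
    for name in names:
        parsed = parse_ckpt_name(name)
        if parsed is not None:
            epoch, score = parsed
            scored.append((score, epoch, name))
    if not scored:
        return None
    # Tuples compare by score first, then epoch.
    return max(scored)[2]


# Usage (path is illustrative):
# from pathlib import Path
# ckpts = Path("data/outputs/2023.03.01/20.02.03_train_diffusion_unet_hybrid_pusht_image/checkpoints")
# print(best_checkpoint(p.name for p in ckpts.glob("*.ckpt")))
```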

Running for multiple seeds

Launch a local ray cluster. For large-scale experiments, you might want to set up an AWS cluster with autoscaling. All other commands remain the same.

(robodiff)[diffusion_policy]$ export CUDA_VISIBLE_DEVICES=0,1,2  # select GPUs to be managed by the ray cluster
(robodiff)[diffusion_policy]$ ray start --head --num-gpus=3

Launch a ray client which will start 3 training workers (3 seeds) and 1 metrics monitor worker.

(robodiff)[diffusion_policy]$ python ray_train_multirun.py --config-dir=. --config-name=image_pusht_diffusion_policy_cnn.yaml --seeds=42,43,44 --monitor_key=test/mean_score -- multi_run.run_dir='data/outputs/${now:%Y.%m.%d}/${now:%H.%M.%S}_${name}_${task_name}' multi_run.wandb_name_base='${now:%Y.%m.%d-%H.%M.%S}_${name}_${task_name}'

In addition to the wandb log written by each training worker individually, the metrics monitor worker logs the metrics aggregated from all 3 training runs to the wandb project diffusion_policy_metrics. Local config, logs and checkpoints will be written to data/outputs/yyyy.mm.dd/hh.mm.ss_<method_name>_<task_name> in a directory structure identical to our training logs:

(robodiff)[diffusion_policy]$ tree data/outputs/2023.03.01/22.13.58_train_diffusion_unet_hybrid_pusht_image -I 'wandb|media'
data/outputs/2023.03.01/22.13.58_train_diffusion_unet_hybrid_pusht_image
├── config.yaml
├── metrics
│   ├── logs.json.txt
│   ├── metrics.json
│   └── metrics.log
├── train_0
│   ├── checkpoints
│   │   ├── epoch=0000-test_mean_score=0.174.ckpt
│   │   └── latest.ckpt
│   ├── logs.json.txt
│   └── train.log
├── train_1
│   ├── checkpoints
│   │   ├── epoch=0000-test_mean_score=0.131.ckpt
│   │   └── latest.ckpt
│   ├── logs.json.txt
│   └── train.log
└── train_2
    ├── checkpoints
    │   ├── epoch=0000-test_mean_score=0.105.ckpt
    │   └── latest.ckpt
    ├── logs.json.txt
    └── train.log

7 directories, 16 files

🆕 Evaluate Pre-trained Checkpoints

Download a checkpoint from the published training log folders, such as https://diffusion-policy.cs.columbia.edu/data/experiments/low_dim/pusht/diffusion_policy_cnn/train_0/checkpoints/epoch=0550-test_mean_score=0.969.ckpt.

Run the evaluation script:

(robodiff)[diffusion_policy]$ python eval.py --checkpoint data/0550-test_mean_score=0.969.ckpt --output_dir data/pusht_eval_output --device cuda:0

This will generate the following directory structure:

(robodiff)[diffusion_policy]$ tree data/pusht_eval_output
data/pusht_eval_output
├── eval_log.json
└── media
    ├── 1fxtno84.mp4
    ├── 224l7jqd.mp4
    ├── 2fo4btlf.mp4
    ├── 2in4cn7a.mp4
    ├── 34b3o2qq.mp4
    └── 3p7jqn32.mp4

1 directory, 7 files

eval_log.json contains the metrics that are logged to wandb during training:

(robodiff)[diffusion_policy]$ cat data/pusht_eval_output/eval_log.json
{
  "test/mean_score": 0.9150393806777066,
  "test/sim_max_reward_4300000": 1.0,
  "test/sim_max_reward_4300001": 0.9872969750774386,
...
  "train/sim_video_1": "data/pusht_eval_output//media/2fo4btlf.mp4"
}
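To inspect eval_log.json programmatically, a minimal sketch follows. Judging by the key names, test/mean_score appears to summarize the per-episode test/sim_max_reward_<id> entries; the helper below (a hypothetical name, not part of eval.py) recomputes their mean so you can check that reading against your own log. The "test/" prefix comes from the output above; other prefixes such as "train/" are assumed to follow the same pattern.

```python
import json
from statistics import mean


def summarize_eval_log(path, prefix="test/"):
    """Return the logged mean score next to a recomputed mean of per-episode max rewards."""
    with open(path) as f:
        log = json.load(f)
    rewards = [v for k, v in log.items() if k.startswith(prefix + "sim_max_reward_")]
    return {
        "logged_mean_score": log.get(prefix + "mean_score"),
        "n_episodes": len(rewards),
        "recomputed_mean": mean(rewards) if rewards else None,
    }


# Usage:
# print(summarize_eval_log("data/pusht_eval_output/eval_log.json"))
```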

🦾 Demo, Training and Eval on a Real Robot

Make sure your UR5 robot is running and accepting commands from its network interface (keep the emergency stop button within reach at all times), your RealSense cameras are plugged into your workstation (tested with realsense-viewer), and your SpaceMouse is connected with the spacenavd daemon running (verify with systemctl status spacenavd).

Start the demonstration collection script. Press "C" to start recording. Use the SpaceMouse to move the robot. Press "S" to stop recording.

(robodiff)[diffusion_policy]$ python demo_real_robot.py -o data/demo_pusht_real --robot_ip 192.168.0.204

This should result in a demonstration dataset in data/demo_pusht_real with the same structure as our example real Push-T training dataset.

To train a Diffusion Policy, launch training with config:

(robodiff)[diffusion_policy]$ python train.py --config-name=train_diffusion_unet_real_image_workspace task.dataset_path=data/demo_pusht_real

Edit diffusion_policy/config/task/real_pusht_image.yaml if your camera setup is different.

Assuming the training has finished and you have a checkpoint at data/outputs/blah/checkpoints/latest.ckpt, launch the evaluation script with:

(robodiff)[diffusion_policy]$ python eval_real_robot.py -i data/outputs/blah/checkpoints/latest.ckpt -o data/eval_pusht_real --robot_ip 192.168.0.204

Press "C" to start evaluation (handing control over to the policy). Press "S" to stop the current episode.

🗺️ Codebase Tutorial

This codebase is structured around two requirements:
1. implementing N tasks and M methods should only require O(N+M) code instead of O(N*M);
2. it should retain maximum flexibility.

To meet these requirements, we:
1. maintained a simple, unified interface between tasks and methods, and
2. made the implementations of the tasks and the methods independent of each other.

These design decisions come at the cost of code repetition between the tasks and the methods. However, we believe that the benefits of being able to add or modify a task/method without affecting the rest, and of being able to understand a task/method by reading its code linearly, outweigh the cost of copying and pasting 😊.

The Split

On the task side, we have:
* Dataset: adapts a (third-party) dataset to the interface.
* EnvRunner: executes a Policy that accepts the interface and produces logs and metrics.
* config/task/<task_name>.yaml: contains all information needed to construct Dataset and EnvRunner.
* (optional) Env: a gym==0.21.0-compatible class that encapsulates the task environment.

On the policy side, we have:
* Policy: implements inference according to the interface and part of the training process.
* Workspace: manages the life-cycle of training and evaluation (interleaved) of a method.
* config/<workspace_name>.yaml: contains all information needed to construct Policy and Workspace.
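This split is what keeps the code at O(N+M): a runner talks to a policy only through the shared interface, so adding a method never touches a task, and vice versa. A toy sketch of that idea; the class and method names (Policy, EnvRunner, predict_action, ReachTargetRunner and the two policies) are hypothetical stand-ins rather than the repo's base classes, and plain lists replace tensors for brevity.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

ObsDict = Dict[str, List[float]]
ActionDict = Dict[str, List[float]]


class Policy(ABC):
    """Method side: maps an observation dict to an action dict."""

    @abstractmethod
    def predict_action(self, obs_dict: ObsDict) -> ActionDict: ...


class EnvRunner(ABC):
    """Task side: runs any Policy and returns metrics."""

    @abstractmethod
    def run(self, policy: Policy) -> Dict[str, float]: ...


class ReachTargetRunner(EnvRunner):
    """Toy 1-D task: move a point from 0 toward `target` for `steps` steps."""

    def __init__(self, target: float = 1.0, steps: int = 10):
        self.target, self.steps = target, steps

    def run(self, policy: Policy) -> Dict[str, float]:
        pos = 0.0
        for _ in range(self.steps):
            action = policy.predict_action({"obs": [pos, self.target]})["action"]
            pos += action[0]
        # Higher is better; 0.0 means the point ended exactly on the target.
        return {"test/mean_score": -abs(self.target - pos)}


class ZeroPolicy(Policy):
    def predict_action(self, obs_dict: ObsDict) -> ActionDict:
        return {"action": [0.0]}


class ProportionalPolicy(Policy):
    def __init__(self, gain: float = 0.5):
        self.gain = gain

    def predict_action(self, obs_dict: ObsDict) -> ActionDict:
        pos, target = obs_dict["obs"]
        return {"action": [self.gain * (target - pos)]}


# N runners and M policies need only N + M classes; any pairing works.
runner = ReachTargetRunner()
for policy in (ZeroPolicy(), ProportionalPolicy()):
    print(type(policy).__name__, runner.run(policy))
```

Neither policy imports the runner and the runner names no concrete policy, which is the independence the split is meant to guarantee.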

The Interface

Low Dim

A LowdimPolicy takes an observation dictionary:
- "obs": Tensor of shape (B,To,Do)

and predicts an action dictionary:
- "action": Tensor of shape (B,Ta,Da)
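The low-dim contract can be checked mechanically. A minimal sketch, reading B as batch size, To/Ta as the observation/action horizons and Do/Da as the observation/action dimensions (our reading of the notation). check_lowdim_io and DummyLowdimPolicy are hypothetical helpers, NumPy arrays stand in for the Tensors named in the interface, and the sizes in the usage lines are only examples.

```python
import numpy as np


def check_lowdim_io(obs_dict, action_dict, To, Do, Ta, Da):
    """Raise ValueError unless obs is (B, To, Do) and action is (B, Ta, Da) with the same B."""
    obs, action = obs_dict["obs"], action_dict["action"]
    if obs.ndim != 3:
        raise ValueError(f"obs must be 3-D (B, To, Do), got shape {obs.shape}")
    B = obs.shape[0]
    if obs.shape != (B, To, Do):
        raise ValueError(f"obs shape {obs.shape} != {(B, To, Do)}")
    if action.shape != (B, Ta, Da):
        raise ValueError(f"action shape {action.shape} != {(B, Ta, Da)}")


class DummyLowdimPolicy:
    """Ignores the observation and returns zero actions of the right shape."""

    def __init__(self, Ta, Da):
        self.Ta, self.Da = Ta, Da

    def predict_action(self, obs_dict):
        B = obs_dict["obs"].shape[0]
        return {"action": np.zeros((B, self.Ta, self.Da), dtype=np.float32)}


B, To, Do, Ta, Da = 4, 2, 20, 8, 2  # example sizes only
obs_dict = {"obs": np.random.randn(B, To, Do).astype(np.float32)}
action_dict = DummyLowdimPolicy(Ta, Da).predict_action(obs_dict)
check_lowdim_io(obs_dict, action_dict, To, Do, Ta, Da)  # passes silently
```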

A [LowdimDataset](./diffusion_policy/dataset/base_datase

Core symbols most depended-on inside this repo

items · called by 140 · diffusion_policy/common/replay_buffer.py
called by 92 · diffusion_policy/policy/robomimic_image_policy.py
dict_apply · called by 71 · diffusion_policy/common/pytorch_util.py
get · called by 49 · diffusion_policy/real_world/multi_realsense.py
log · called by 45 · diffusion_policy/common/json_logger.py
step · called by 42 · diffusion_policy/env/pusht/pusht_env.py
seed · called by 40 · diffusion_policy/env/pusht/pusht_env.py
load · called by 38 · diffusion_policy/shared_memory/shared_memory_util.py

Shape: Method 1,056 · Function 285 · Class 205 · Route 14

Languages: Python 100%

Modules by API surface

diffusion_policy/codecs/imagecodecs_numcodecs.py · 186 symbols
diffusion_policy/env/block_pushing/block_pushing.py · 60 symbols
diffusion_policy/model/common/tensor_util.py · 44 symbols
diffusion_policy/env/block_pushing/block_pushing_multimodal.py · 44 symbols
diffusion_policy/common/replay_buffer.py · 44 symbols
diffusion_policy/env/block_pushing/utils/utils_pybullet.py · 27 symbols
diffusion_policy/model/common/normalizer.py · 24 symbols
diffusion_policy/env/kitchen/relay_policy_learning/adept_envs/adept_envs/simulation/renderer.py · 23 symbols
diffusion_policy/real_world/single_realsense.py · 22 symbols
diffusion_policy/env/pusht/pusht_env.py · 22 symbols
diffusion_policy/real_world/multi_realsense.py · 21 symbols
diffusion_policy/gym_util/async_vector_env.py · 21 symbols

For agents

$ claude mcp add diffusion_policy \
  -- python -m otcore.mcp_server <graph>
