github.com/mit-han-lab/once-for-all @v0.1

490 symbols · 1,247 edges · 41 files · 20 documented (4%) · updated 2y ago · v0.1 (2020-06-16) · ★ 1,951 · 55 open issues

README

Once for All: Train One Network and Specialize it for Efficient Deployment [arXiv] [Slides] [Video]

@inproceedings{
  cai2020once,
  title={Once for All: Train One Network and Specialize it for Efficient Deployment},
  author={Han Cai and Chuang Gan and Tianzhe Wang and Zhekai Zhang and Song Han},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://arxiv.org/pdf/1908.09791.pdf}
}

[News] The hands-on tutorial of OFA is released!

[News] OFA is available via pip! Run pip install ofa to install the whole OFA codebase.

[News] First place in the 4th Low-Power Computer Vision Challenge, in both the classification and detection tracks.

[News] First place in the 3rd Low-Power Computer Vision Challenge, DSP track at ICCV’19 using the Once-for-all Network.

Train once, specialize for many deployment scenarios

80% top1 ImageNet accuracy under mobile setting

Consistently outperforms MobileNetV3 on diverse hardware platforms

How to use / evaluate OFA Specialized Networks

Use

""" OFA Specialized Networks.
Example: net, image_size = ofa_specialized('flops@595M_top1@80.0_finetune@75', pretrained=True)
""" 
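from ofa.model_zoo import ofa_specialized
# net_id is any ID from the lists below, e.g. the 595M-FLOPs model
net_id = 'flops@595M_top1@80.0_finetune@75'
net, image_size = ofa_specialized(net_id, pretrained=True)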

If the script above fails to download the files, download them manually from Google Drive and put them under $HOME/.torch/ofa_specialized/.

Evaluate

python eval_specialized_net.py --path 'Your path to imagenet' --net flops@595M_top1@80.0_finetune@75

OFA based on FLOPs

flops@595M_top1@80.0_finetune@75
flops@482M_top1@79.6_finetune@75
flops@389M_top1@79.1_finetune@75
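The IDs above appear to follow a `key@value` pattern joined by underscores. As a minimal sketch, assuming that reading of the naming scheme holds for the FLOPs-based IDs, the fields can be split out as follows. The `parse_net_id` helper is hypothetical and is not part of the `ofa` package.

```python
def parse_net_id(net_id):
    """Split an ID such as 'flops@595M_top1@80.0_finetune@75' into fields.

    Assumption: fields are separated by '_' and each field is 'key@value'.
    This is inferred from the listed IDs, not taken from the ofa codebase.
    """
    fields = {}
    for part in net_id.split("_"):
        key, _, value = part.partition("@")
        fields[key] = value
    return fields


info = parse_net_id("flops@595M_top1@80.0_finetune@75")
print(info)
```

The values are kept as strings so that units such as the `M` in `595M` are not silently dropped.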

OFA for Mobile Phones

LG G8
* LG-G8_lat@24ms_top1@76.4_finetune@25
* LG-G8_lat@16ms_top1@74.7_finetune@25
* LG-G8_lat@11ms_top1@73.0_finetune@25
* LG-G8_lat@8ms_top1@71.1_finetune@25

Samsung Note8
* note8_lat@65ms_top1@76.1_finetune@25
* note8_lat@49ms_top1@74.9_finetune@25
* note8_lat@31ms_top1@72.8_finetune@25
* note8_lat@22ms_top1@70.4_finetune@25

Google Pixel1
* pixel1_lat@143ms_top1@80.1_finetune@75
* pixel1_lat@132ms_top1@79.8_finetune@75
* pixel1_lat@79ms_top1@78.7_finetune@75
* pixel1_lat@58ms_top1@76.9_finetune@75
* pixel1_lat@40ms_top1@74.9_finetune@25
* pixel1_lat@28ms_top1@73.3_finetune@25
* pixel1_lat@20ms_top1@71.4_finetune@25

Samsung Note10
* note10_lat@64ms_top1@80.2_finetune@75
* note10_lat@50ms_top1@79.7_finetune@75
* note10_lat@41ms_top1@79.3_finetune@75
* note10_lat@30ms_top1@78.4_finetune@75
* note10_lat@22ms_top1@76.6_finetune@25
* note10_lat@16ms_top1@75.5_finetune@25
* note10_lat@11ms_top1@73.6_finetune@25
* note10_lat@8ms_top1@71.4_finetune@25

Google Pixel2
* pixel2_lat@62ms_top1@75.8_finetune@25
* pixel2_lat@50ms_top1@74.7_finetune@25
* pixel2_lat@35ms_top1@73.4_finetune@25
* pixel2_lat@25ms_top1@71.5_finetune@25

Samsung S7 Edge
* s7edge_lat@88ms_top1@76.3_finetune@25
* s7edge_lat@58ms_top1@74.7_finetune@25
* s7edge_lat@41ms_top1@73.1_finetune@25
* s7edge_lat@29ms_top1@70.5_finetune@25
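Each latency-based ID above carries a latency figure and a top-1 figure, so choosing a model for a latency budget can be done from the IDs alone. The following is a rough sketch under the assumption that every such ID contains `@<N>ms` followed by `top1@<accuracy>`. The `pick_under_budget` helper is hypothetical and not part of the `ofa` package; the IDs are copied from the Samsung Note10 list.

```python
import re

# IDs copied from the Samsung Note10 list above.
NOTE10_IDS = [
    "note10_lat@64ms_top1@80.2_finetune@75",
    "note10_lat@50ms_top1@79.7_finetune@75",
    "note10_lat@41ms_top1@79.3_finetune@75",
    "note10_lat@30ms_top1@78.4_finetune@75",
    "note10_lat@22ms_top1@76.6_finetune@25",
    "note10_lat@16ms_top1@75.5_finetune@25",
    "note10_lat@11ms_top1@73.6_finetune@25",
    "note10_lat@8ms_top1@71.4_finetune@25",
]

# Assumption: the latency and accuracy appear as '@<N>ms_top1@<acc>'.
_PATTERN = re.compile(r"@(?P<ms>\d+)ms_top1@(?P<top1>\d+(?:\.\d+)?)")


def pick_under_budget(net_ids, budget_ms):
    """Return the ID with the highest listed top-1 that fits the budget, or None."""
    best_id, best_top1 = None, -1.0
    for net_id in net_ids:
        match = _PATTERN.search(net_id)
        if match is None:
            continue  # skip IDs that do not follow the assumed pattern
        latency = int(match["ms"])
        top1 = float(match["top1"])
        if latency <= budget_ms and top1 > best_top1:
            best_id, best_top1 = net_id, top1
    return best_id


choice = pick_under_budget(NOTE10_IDS, budget_ms=25)
print(choice)
```

The selected ID can then be passed as `net_id` to `ofa_specialized`. The figures come from the ID strings only; they are not measured by this code.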

OFA for Desktop (CPUs and GPUs)

1080ti GPU (Batch Size 64)
* 1080ti_gpu64@27ms_top1@76.4_finetune@25
* 1080ti_gpu64@22ms_top1@75.3_finetune@25
* 1080ti_gpu64@15ms_top1@73.8_finetune@25
* 1080ti_gpu64@12ms_top1@72.6_finetune@25

V100 GPU (Batch Size 64)
* v100_gpu64@11ms_top1@76.1_finetune@25
* v100_gpu64@9ms_top1@75.3_finetune@25
* v100_gpu64@6ms_top1@73.0_finetune@25
* v100_gpu64@5ms_top1@71.6_finetune@25

Jetson TX2 GPU (Batch Size 16)
* tx2_gpu16@96ms_top1@75.8_finetune@25
* tx2_gpu16@80ms_top1@75.4_finetune@25
* tx2_gpu16@47ms_top1@72.9_finetune@25
* tx2_gpu16@35ms_top1@70.3_finetune@25

Intel Xeon CPU with MKL-DNN (Batch Size 1)
* cpu_lat@17ms_top1@75.7_finetune@25
* cpu_lat@15ms_top1@74.6_finetune@25
* cpu_lat@11ms_top1@72.0_finetune@25
* cpu_lat@10ms_top1@71.1_finetune@25

How to use / evaluate OFA Networks

Use

""" OFA Networks.
    Example: ofa_network = ofa_net('ofa_mbv3_d234_e346_k357_w1.0', pretrained=True)
""" 
from ofa.model_zoo import ofa_net
net_id = 'ofa_mbv3_d234_e346_k357_w1.0'  # ID from the example in the docstring above
ofa_network = ofa_net(net_id, pretrained=True)

# Randomly sample sub-networks from OFA network
ofa_network.sample_active_subnet()
random_subnet = ofa_network.get_active_subnet(preserve_weight=True)

# Manually set the sub-network
ofa_network.set_active_subnet(ks=7, e=6, d=4)
manual_subnet = ofa_network.get_active_subnet(preserve_weight=True)
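The ID `ofa_mbv3_d234_e346_k357_w1.0` appears to encode the choices available for each setting: the values `ks=7, e=6, d=4` passed to `set_active_subnet` above are the largest digits after `k`, `e`, and `d`. As a simplified sketch under that assumption, the ID can be parsed into a search space and one value drawn per setting. The `parse_search_space` and `sample_config` helpers are hypothetical; this illustrates the idea of sampling a configuration and does not reproduce how `sample_active_subnet` works internally.

```python
import random
import re


def parse_search_space(net_id):
    """Read per-setting choices from an ID like 'ofa_mbv3_d234_e346_k357_w1.0'.

    Assumption: each digit after 'd', 'e' and 'k' is one allowed value,
    and the number after 'w' is a width multiplier.
    """
    match = re.search(r"_d(\d+)_e(\d+)_k(\d+)_w([\d.]+)$", net_id)
    if match is None:
        raise ValueError(f"unrecognized net id: {net_id}")
    depth, expand, kernel, width = match.groups()
    return {
        "d": [int(c) for c in depth],
        "e": [int(c) for c in expand],
        "ks": [int(c) for c in kernel],
        "w": float(width),
    }


def sample_config(space, rng=random):
    """Pick one value per setting, uniformly at random."""
    return {key: rng.choice(space[key]) for key in ("ks", "e", "d")}


space = parse_search_space("ofa_mbv3_d234_e346_k357_w1.0")
config = sample_config(space, random.Random(0))
print(space)
print(config)
```

A dictionary of this shape matches the keyword arguments used in the call above, so it could be passed as `ofa_network.set_active_subnet(**config)`.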

If the script above fails to download the files, download them manually from Google Drive and put them under $HOME/.torch/ofa_nets/.

Evaluate

python eval_ofa_net.py --path 'Your path to imagenet' --net ofa_mbv3_d234_e346_k357_w1.0

How to train OFA Networks

mpirun -np 32 -H <server1_ip>:8,<server2_ip>:8,<server3_ip>:8,<server4_ip>:8 \
    -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    python train_ofa_net.py

horovodrun -np 32 -H <server1_ip>:8,<server2_ip>:8,<server3_ip>:8,<server4_ip>:8 \
    python train_ofa_net.py
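In both commands above, `-np 32` equals the four hosts times the 8 slots given after each colon in the `-H` list. The sketch below builds the host list and the matching process count together so the two cannot drift apart. It assumes every host offers the same number of slots; the `host_spec` helper is hypothetical, and the `<serverN_ip>` placeholders are kept exactly as in the commands above.

```python
def host_spec(hosts, slots_per_host):
    """Build the '-H' argument and the matching '-np' process count."""
    spec = ",".join(f"{host}:{slots_per_host}" for host in hosts)
    return spec, len(hosts) * slots_per_host


hosts = ["<server1_ip>", "<server2_ip>", "<server3_ip>", "<server4_ip>"]
spec, num_procs = host_spec(hosts, slots_per_host=8)
command = f"horovodrun -np {num_procs} -H {spec} python train_ofa_net.py"
print(command)
```

Changing the number of hosts or slots then updates `-np` automatically.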

Introduction Video

Hands-on Tutorial Video

Requirement

Python 3.6
PyTorch 1.0.0
ImageNet Dataset
Horovod

Related work on automated and efficient deep learning:

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (ICLR’19)

AutoML for Architecting Efficient and Specialized Neural Networks (IEEE Micro)

AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV’18)

HAQ: Hardware-Aware Automated Quantization (CVPR’19, oral)

Core symbols most depended-on inside this repo

int2list

called by 25

ofa/imagenet_codebase/utils/__init__.py

make_divisible

called by 21

ofa/imagenet_codebase/utils/pytorch_modules.py

ofa/imagenet_codebase/utils/__init__.py

query

called by 13

tutorial/latency_table.py

ofa/imagenet_codebase/utils/__init__.py

write_log

called by 12

ofa/imagenet_codebase/run_manager/run_manager.py

Shape

Method 355

Class 68

Function 67

Languages

Python 100%

Modules by API surface

ofa/layers.py · 54 symbols

ofa/imagenet_codebase/modules/layers.py · 54 symbols

ofa/utils.py · 37 symbols

ofa/imagenet_codebase/data_providers/my_data_loader.py · 28 symbols

ofa/imagenet_codebase/run_manager/run_manager.py · 27 symbols

ofa/model_zoo.py · 22 symbols

ofa/elastic_nn/modules/dynamic_layers.py · 22 symbols

ofa/imagenet_codebase/utils/__init__.py · 19 symbols

ofa/elastic_nn/modules/dynamic_op.py · 17 symbols

ofa/imagenet_codebase/utils/my_modules.py · 16 symbols

ofa/imagenet_codebase/networks/proxyless_nets.py · 16 symbols

ofa/imagenet_codebase/data_providers/imagenet.py · 16 symbols

For agents

$ claude mcp add once-for-all \
  -- python -m otcore.mcp_server <graph>
