
github.com/huggingface/pytorch-image-models @ v1.0.27


6,459 symbols · 20,074 edges · 306 files · 2,597 documented (40%)

README

PyTorch Image Models

What's New
Introduction
Models
Features
Results
Getting Started (Documentation)
Train, Validation, Inference Scripts
Awesome PyTorch Resources
Licenses
Citing

What's New

May 8, 2026

Release 1.0.27

April 23, 2026

Add Gemma4 ViT encoders w/ NaFlex pipeline support (variable aspect/size per image). Thanks Yonghye Kwon
Support DINOv3 weights in NaFlexVit. Thanks Yonghye Kwon
Some improvements to Muon fallback (AdamW/NadamW) lr behavior

March 23, 2026

Improve pickle checkpoint handling security. Default all loading to weights_only=True, add safe_global for ArgParse.
Improve attention mask handling for core ViT/EVA models & layers. Resolve bool masks, pass is_causal through for SSL tasks.
Fix class & register token uses with ViT and no pos embed enabled.
Add Patch Representation Refinement (PRR) as a pooling option in ViT. Thanks Sina (https://github.com/sinahmr).
Improve consistency of output projection / MLP dimensions for attention pooling layers.
Hiera model F.SDPA optimization to allow Flash Attention kernel use.
Caution added to SGDP optimizer.
Release 1.0.26. First maintenance release since my departure from Hugging Face.
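Loading with `weights_only=True` means the unpickler only reconstructs an allowlist of known-safe globals, so any extra type stored in a checkpoint (for example, presumably `argparse.Namespace` for saved training args) has to be added explicitly. The sketch below shows that allowlist idea with the standard-library `pickle` module only. It is an illustration of the technique, not the PyTorch or timm loader, and `SAFE_GLOBALS`, `RestrictedUnpickler`, and `safe_loads` are hypothetical names.

```python
import argparse
import io
import pickle

# Hypothetical allowlist of (module, name) pairs the unpickler may reconstruct.
SAFE_GLOBALS = {
    ("argparse", "Namespace"),
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not present in SAFE_GLOBALS."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowlisted")


def safe_loads(data: bytes):
    """Load a pickle payload, resolving only allowlisted globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    ckpt = {"epoch": 3, "args": argparse.Namespace(lr=0.1, model="resnet50")}
    restored = safe_loads(pickle.dumps(ckpt))
    print(restored["args"].lr)  # 0.1

    # A payload that references an arbitrary callable is rejected.
    try:
        safe_loads(pickle.dumps(print))
    except pickle.UnpicklingError as err:
        print("blocked:", err)
```

Plain containers and numbers need no globals at all, which is why an allowlist can stay small.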

Feb 23, 2026

Add token distillation training support to distillation task wrappers
Remove some torch.jit usage in prep for official deprecation
Caution added to AdamP optimizer
Call reset_parameters() even if meta-device init so that buffers get init w/ hacks like init_empty_weights
Tweak Muon optimizer to work with DTensor/FSDP2 (clamp_ instead of clamp_min_, alternate NS branch for DTensor)
Release 1.0.25

Jan 21, 2026

Compat Break: Fix oversight w/ QKV vs MLP bias in ParallelScalingBlock (& DiffParallelScalingBlock)
Does not impact any trained timm models but could impact downstream use.

Jan 5 & 6, 2026

Release 1.0.24
Add new benchmark result csv files for inference timing on all models w/ RTX Pro 6000, 5090, and 4090 cards w/ PyTorch 2.9.1
Fix moved module error in deprecated timm.models.layers import path that impacts legacy imports
Release 1.0.23

Dec 30, 2025

Add better NAdaMuon trained dpwee, dwee, dlittle (differential) ViTs with a small boost over previous runs
https://huggingface.co/timm/vit_dlittle_patch16_reg1_gap_256.sbb_nadamuon_in1k (83.24% top-1)
https://huggingface.co/timm/vit_dwee_patch16_reg1_gap_256.sbb_nadamuon_in1k (81.80% top-1)
https://huggingface.co/timm/vit_dpwee_patch16_reg1_gap_256.sbb_nadamuon_in1k (81.67% top-1)
Add a ~21M param timm variant of the CSATv2 model at 512x512 & 640x640
https://huggingface.co/timm/csatv2_21m.sw_r640_in1k (83.13% top-1)
https://huggingface.co/timm/csatv2_21m.sw_r512_in1k (82.58% top-1)
Factor non-persistent param init out of __init__ into a common method that can be externally called via init_non_persistent_buffers() after meta-device init.

Dec 12, 2025

Add CSATv2 model (thanks https://github.com/gusdlf93) -- a lightweight but high res model with DCT stem & spatial attention. https://huggingface.co/Hyunil/CSATv2
Add AdaMuon and NAdaMuon optimizer support to existing timm Muon impl. Appears more competitive vs AdamW with familiar hparams for image tasks.
End of year PR cleanup, merge aspects of several long-open PRs
Merge differential attention (DiffAttention), add corresponding DiffParallelScalingBlock (for ViT), train some wee ViTs
- https://huggingface.co/timm/vit_dwee_patch16_reg1_gap_256.sbb_in1k
- https://huggingface.co/timm/vit_dpwee_patch16_reg1_gap_256.sbb_in1k
Add a few pooling modules, LsePlus and SimPool
Cleanup, optimize DropBlock2d (also add support to ByobNet based models)
Bump unit tests to PyTorch 2.9.1 + Python 3.13 on upper end, lower still PyTorch 1.13 + Python 3.10
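Differential attention, as described in the Differential Transformer paper, computes two softmax attention maps from split query/key projections and subtracts the second, scaled by λ, from the first: (softmax(Q1K1ᵀ/√d) − λ·softmax(Q2K2ᵀ/√d))V. The pure-Python sketch below shows that arithmetic for one query row. It leaves out the learnable λ reparameterization, per-head normalization, and everything else in timm's `DiffAttention`, and `diff_attention_row` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def diff_attention_row(q1, q2, k1, k2, v, lam):
    """Differential attention output for a single query.

    q1, q2: the two query halves (length d each)
    k1, k2: key halves for n keys (n x d each)
    v: value vectors (n x dv); lam: scalar lambda
    """
    scale = 1.0 / math.sqrt(len(q1))
    a1 = softmax([dot(q1, k) * scale for k in k1])
    a2 = softmax([dot(q2, k) * scale for k in k2])
    # Difference of the two attention maps; individual weights may be negative.
    weights = [w1 - lam * w2 for w1, w2 in zip(a1, a2)]
    dv = len(v[0])
    return [sum(w * vec[j] for w, vec in zip(weights, v)) for j in range(dv)]
```

With λ = 0 this reduces to ordinary softmax attention; with identical halves and λ = 1 the two maps cancel completely, which is the "common-mode" noise the subtraction is meant to remove.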

Dec 1, 2025

Add lightweight task abstraction, add logits and feature distillation support to train script via new tasks.
Remove old APEX AMP support
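Logit distillation in the classic Hinton et al. formulation trains the student to match the teacher's temperature-softened class distribution with a KL-divergence term, scaled by T² so gradient magnitudes stay comparable across temperatures. A minimal pure-Python sketch for one example follows. The temperature, loss weighting, and any feature-distillation terms used by timm's distillation tasks may differ, and `kd_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def kd_loss(student_logits, teacher_logits, temperature: float = 4.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, times T^2."""
    log_p_s = log_softmax(student_logits, temperature)
    log_p_t = log_softmax(teacher_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    return kl * temperature ** 2
```

In practice this term is usually mixed with the normal cross-entropy on hard labels; the mixing weight is a training hyperparameter.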

Nov 4, 2025

Fix LayerScale / LayerScale2d init bug (init values ignored), introduced in 1.0.21. Thanks https://github.com/Ilya-Fradlin
Release 1.0.22
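LayerScale multiplies a residual branch's output by a learnable per-channel vector that should start at a small constant (`init_values`); the bug fixed above meant that constant was not applied at init. A minimal pure-Python sketch of the intended behaviour follows; it is not timm's `LayerScale` module, and `LayerScaleSketch` is a hypothetical name.

```python
from typing import List, Sequence


class LayerScaleSketch:
    """Per-channel scaling with gamma initialised to init_values."""

    def __init__(self, dim: int, init_values: float = 1e-5):
        # The point of the fix: gamma must actually start at init_values.
        self.gamma: List[float] = [init_values] * dim

    def __call__(self, x: Sequence[float]) -> List[float]:
        return [g * xi for g, xi in zip(self.gamma, x)]
```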

Oct 31, 2025 🎃

Update imagenet & OOD variant result csv files to include a few new models and verify correctness over several torch & timm versions
EfficientNet-X and EfficientNet-H B5 model weights added as part of a hparam search for AdamW vs Muon (still iterating on Muon runs)

Oct 16-20, 2025

Add an impl of the Muon optimizer (based on https://github.com/KellerJordan/Muon) with customizations
- extra flexibility and improved handling for conv weights and fallbacks for weight shapes not suited for orthogonalization
- small speedup for NS iterations by reducing allocs and using fused (b)add(b)mm ops
- by default uses AdamW (or NAdamW if nesterov=True) updates if Muon is not suitable for the parameter shape (or the parameter is excluded via a param group flag)
- like the torch impl, select from several LR scale adjustment fns via adjust_lr_fn
- select from several NS coefficient presets or specify your own via ns_coefficients
First 2 steps of 'meta' device model initialization supported
- Fix several ops that were breaking creation under 'meta' device context
- Add device & dtype factory kwarg support to all models and modules (anything inheriting from nn.Module) in timm
License fields added to pretrained cfgs in code
Release 1.0.21
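Muon's core step orthogonalizes each 2D update with a few Newton-Schulz (NS) iterations. The sketch below follows the quintic iteration and the (3.4445, -4.7750, 2.0315) coefficients from Keller Jordan's reference implementation, written in pure Python on small list-of-lists matrices. It leaves out momentum, the LR adjustment (`adjust_lr_fn`), conv-weight reshaping, and the AdamW/NAdamW fallback, and all helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

NS_COEFFS = (3.4445, -4.7750, 2.0315)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def newton_schulz(g: Matrix, steps: int = 5, eps: float = 1e-7) -> Matrix:
    """Approximately orthogonalize g, pushing its singular values toward ~1."""
    a, b, c = NS_COEFFS
    # Work on the wide orientation so X @ X^T is the smaller Gram matrix.
    transposed = len(g) > len(g[0])
    x = transpose(g) if transposed else [row[:] for row in g]
    # Frobenius norm >= spectral norm, so every singular value starts <= 1.
    norm = math.sqrt(sum(v * v for row in x for v in row)) + eps
    x = [[v / norm for v in row] for row in x]
    for _ in range(steps):
        gram = matmul(x, transpose(x))    # A = X X^T
        gram2 = matmul(gram, gram)        # A @ A
        poly = [[b * p + c * q for p, q in zip(r1, r2)] for r1, r2 in zip(gram, gram2)]
        px = matmul(poly, x)              # (bA + cA^2) X
        x = [[a * v + w for v, w in zip(r1, r2)] for r1, r2 in zip(x, px)]
    return transpose(x) if transposed else x
```

The coefficients are tuned for speed rather than exact convergence, so after five steps the singular values land near 1 rather than exactly on it; that approximation is part of the reference design.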

Sept 21, 2025

Remap DINOv3 ViT weight tags from lvd_1689m -> lvd1689m to match (same for sat_493m -> sat493m)
Release 1.0.20

Sept 17, 2025

DINOv3 (https://arxiv.org/abs/2508.10104) ConvNeXt and ViT models added. ConvNeXt models were mapped to existing timm model. ViT support done via the EVA base model w/ a new RotaryEmbeddingDinoV3 to match the DINOv3 specific RoPE impl
HuggingFace Hub: https://huggingface.co/collections/timm/timm-dinov3-68cb08bb0bee365973d52a4d
MobileCLIP-2 (https://arxiv.org/abs/2508.20691) vision encoders. New MCI3/MCI4 FastViT variants added and weights mapped to existing FastViT and B, L/14 ViTs.
MetaCLIP-2 Worldwide (https://arxiv.org/abs/2507.22062) ViT encoder weights added.
SigLIP-2 (https://arxiv.org/abs/2502.14786) NaFlex ViT encoder weights added via timm NaFlexViT model.
Misc fixes and contributions

July 23, 2025

Add set_input_size() method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
Fix small typing issue that broke Python 3.9 compat. 1.0.19 patch release.

July 21, 2025

ROPE support added to NaFlexViT. All models covered by the EVA base (eva.py) including EVA, EVA02, Meta PE ViT, timm SBB ViT w/ ROPE, and Naver ROPE-ViT can now be loaded in NaFlexViT when use_naflex=True is passed at model creation time
More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
Fix XY order with grid_indexing='xy', impacted non-square image use in 'xy' mode (only ROPE-ViT and PE impacted).
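The `grid_indexing='xy'` fix concerns how 2D patch coordinates are enumerated. With `torch.meshgrid` (and NumPy's `meshgrid`), `indexing='ij'` returns grids shaped (len(a), len(b)), while `indexing='xy'` swaps the first two output dimensions, so the inputs must be passed as (x, y) rather than (y, x). On a square grid a swapped order can go unnoticed; on a non-square grid it changes the layout. The pure-Python sketch below mimics the two conventions; it is not timm's ROPE code, and `meshgrid2d` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def meshgrid2d(a: Sequence[float], b: Sequence[float], indexing: str = "ij") -> Tuple[Grid, Grid]:
    """Mimic torch/numpy meshgrid for two 1D inputs."""
    if indexing == "ij":
        # Shape (len(a), len(b)): first grid varies along rows.
        g1 = [[a[i] for _ in b] for i in range(len(a))]
        g2 = [[b[j] for j in range(len(b))] for _ in a]
    elif indexing == "xy":
        # Shape (len(b), len(a)): first grid varies along columns.
        g1 = [[a[j] for j in range(len(a))] for _ in b]
        g2 = [[b[i] for _ in a] for i in range(len(b))]
    else:
        raise ValueError(f"unknown indexing: {indexing}")
    return g1, g2


# A 2-row x 3-col patch grid.
ys, xs = [0, 1], [0, 1, 2]
gy, gx = meshgrid2d(ys, xs, "ij")    # pass (y, x) for 'ij'
gx2, gy2 = meshgrid2d(xs, ys, "xy")  # pass (x, y) for 'xy'
```

Both calls produce the same (2, 3) coordinate grids; passing (y, x) with `'xy'` instead yields (3, 2) grids, the kind of mismatch that only shows up on non-square inputs.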

July 7, 2025

MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights)
- Add stem bias (zero'd in updated weights, compat break with old weights)
- GELU -> GELU (tanh approx). A minor change to be closer to JAX
Add two arguments to layer-decay support, a min scale clamp and 'no optimization' scale threshold
Add 'Fp32' LayerNorm, RMSNorm, SimpleNorm variants that can be enabled to force computation of norm in float32
Some typing, argument cleanup for norm, norm+act layers done with above
Support Naver ROPE-ViT (https://github.com/naver-ai/rope-vit) in eva.py, add RotaryEmbeddingMixed module for mixed mode, weights on HuggingFace Hub

model	img_size	top-1 (%)	top-5 (%)	params (M)
vit_large_patch16_rope_mixed_ape_224.naver_in1k	224	84.84	97.122	304.4
vit_large_patch16_rope_mixed_224.naver_in1k	224	84.828	97.116	304.2
vit_large_patch16_rope_ape_224.naver_in1k	224	84.65	97.154	304.37
vit_large_patch16_rope_224.naver_in1k	224	84.648	97.122	304.17
vit_base_patch16_rope_mixed_ape_224.naver_in1k	224	83.894	96.754	86.59
vit_base_patch16_rope_mixed_224.naver_in1k	224	83.804	96.712	86.44
vit_base_patch16_rope_ape_224.naver_in1k	224	83.782	96.61	86.59
vit_base_patch16_rope_224.naver_in1k	224	83.718	96.672	86.43
vit_small_patch16_rope_224.naver_in1k	224	81.23	95.022	21.98
vit_small_patch16_rope_mixed_224.naver_in1k	224	81.216	95.022	21.99
vit_small_patch16_rope_ape_224.naver_in1k	224	81.004	95.016	22.06
vit_small_patch16_rope_mixed_ape_224.naver_in1k	224	80.986	94.976	22.06
* Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
* Preparing version 1.0.17 release
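Layer-wise LR decay scales each layer's learning rate by `decay ** (num_layers - layer_id)`, so layers nearer the stem train more slowly. The two layer-decay arguments added on July 7 clamp that scale from below and skip optimization for layers whose scale falls under a threshold. The sketch below shows that logic; the names `min_scale` and `no_opt_scale`, and the choice to test the threshold on the unclamped scale, are assumptions, not timm's signature.

```python
from typing import List, Optional


def layer_lr_scales(
    num_layers: int,
    decay: float = 0.75,
    min_scale: float = 0.0,
    no_opt_scale: Optional[float] = None,
) -> List[Optional[float]]:
    """Per-layer LR multipliers for layer-wise decay (hypothetical helper).

    Layer num_layers (head side) gets 1.0, layer 0 (stem side) the smallest.
    None marks layers that are excluded from optimization.
    """
    scales: List[Optional[float]] = []
    for layer_id in range(num_layers + 1):
        scale = decay ** (num_layers - layer_id)
        if no_opt_scale is not None and scale < no_opt_scale:
            scales.append(None)  # assumed: threshold checked before clamping
        else:
            scales.append(max(scale, min_scale))  # assumed: clamp from below
    return scales
```

Checking the threshold before clamping keeps the two options independent; with the opposite order, a `min_scale` above the threshold would make the threshold unreachable.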

June 26, 2025

MobileNetV5 backbone (w/ encoder only variant) for Gemma 3n image encoder
Version 1.0.16 released

June 23, 2025

Add F.grid_sample based 2D and factorized pos embed resize to NaFlexViT. Faster when many different sizes are in use (based on example by https://github.com/stas-sl).
Further speed up patch embed resample by replacing vmap with matmul (based on snippet by https://github.com/stas-sl).
Add 3 initial native aspect NaFlexViT checkpoints created while testing, ImageNet-1k and 3 different pos embed configs w/ same hparams.

Model	Top-1 Acc	Top-5 Acc	Params (M)	Eval Seq Len
naflexvit_base_patch16_par_gap.e300_s576_in1k	83.67	96.45	86.63	576
naflexvit_base_patch16_parfac_gap.e300_s576_in1k	83.63	96.41	86.46	576
naflexvit_base_patch16_gap.e300_s576_in1k	83.50	96.46	86.63	576
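Resizing a learned 2D position embedding to a new patch grid is usually bilinear interpolation applied per channel, which is what an `F.grid_sample` or `F.interpolate` based resize computes. The pure-Python sketch below resamples one channel of an H×W grid with half-pixel (align_corners=False style) sampling and edge clamping. It illustrates the interpolation only, not timm's implementation, and `resize_bilinear` is a hypothetical name.

```python
import math
from typing import List

Grid = List[List[float]]


def _src_coord(dst: int, in_size: int, out_size: int) -> float:
    # Half-pixel mapping (align_corners=False), clamped to the valid range.
    src = (dst + 0.5) * in_size / out_size - 0.5
    return min(max(src, 0.0), in_size - 1.0)


def resize_bilinear(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Bilinearly resample a single-channel grid to out_h x out_w."""
    in_h, in_w = len(grid), len(grid[0])
    out: Grid = []
    for oy in range(out_h):
        sy = _src_coord(oy, in_h, out_h)
        y0 = int(math.floor(sy))
        y1 = min(y0 + 1, in_h - 1)
        wy = sy - y0
        row = []
        for ox in range(out_w):
            sx = _src_coord(ox, in_w, out_w)
            x0 = int(math.floor(sx))
            x1 = min(x0 + 1, in_w - 1)
            wx = sx - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

For example, upsampling the row `[0.0, 1.0]` to width 4 gives `[0.0, 0.25, 0.75, 1.0]`. A real pos embed resize runs this for every embedding channel, which is why batched tensor ops are much faster than a loop like this one.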
* Support gradient checkpointing for `forward_intermediates` and fix some checkpointing bugs. Thanks https://github.com/brianhou0208
* Add 'corrected weight decay' (https://arxiv.org/abs/2506.02285) as option to AdamW (legacy), Adopt, Kron, Adafactor (BV), Lamb, LaProp, Lion, NadamW, RmsPropTF, SGDW optimizers
* Switch PE (perception encoder) ViT models to use native timm weights instead of remapping on the fly
* Fix cuda stream bug in prefetch loader
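Corrected weight decay, as proposed in the linked paper, makes the decoupled decay coefficient follow the LR schedule: λ_t = λ · γ_t / γ_max, so the per-step shrink factor becomes γ_t · λ_t = λ · γ_t² / γ_max. The sketch below applies one decoupled (AdamW-style) decay step with and without that correction. It assumes this reading of the paper; the actual option name in timm's optimizers and where the max LR is tracked are not shown, and `decoupled_decay` is a hypothetical helper.

```python
from typing import List, Sequence


def decoupled_decay(
    weights: Sequence[float],
    lr: float,
    weight_decay: float,
    max_lr: float,
    corrected: bool = False,
) -> List[float]:
    """Apply one decoupled weight-decay step (the gradient update is omitted)."""
    if corrected:
        # Assumed correction: decay coefficient scaled by lr / max_lr.
        factor = lr * weight_decay * (lr / max_lr)
    else:
        factor = lr * weight_decay
    return [w * (1.0 - factor) for w in weights]
```

At peak LR the two variants match; as the schedule decays, the corrected version shrinks weights proportionally less, which is the behaviour the paper links to steadier late-training gradient norms.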

June 5, 2025

Initial NaFlexVit model code. NaFlexVit is a Vision Transformer with:
- Encapsulated embedding and position encoding in a single module
- Support for nn.Linear patch embedding on pre-patchified (dictionary) inputs
- Support for NaFlex variable aspect, variable resolution (SigLip-2: https://arxiv.org/abs/2502.14786)
- Support for FlexiViT variable patch size (https://arxiv.org/abs/2212.08013)
- Support for NaViT fractional/factorized position embedding (https://arxiv.org/abs/2307.06304)
Existing vit models in vision_transformer.py can be loaded into the NaFlexVit model by adding the use_naflex=True flag to create_model
- Some native weights coming soon
A full NaFlex data pipeline is available that allows training / fine-tuning / evaluating with variable aspect / size images
- To enable in train.py and validate.py add the --naflex-loader arg, must be used with a NaFlexVit
To evaluate an existing (classic) ViT loaded in NaFlexVit model w/ NaFlex data pipe:
- python validate.py /imagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256
Training has some extra arg features worth noting
- The --naflex-train-seq-lens argument specifies which sequence lengths to randomly pick from per batch during training
- The --naflex-max-seq-len argument sets the target sequence length for validation
- Adding --model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24 will enable random patch size selection per-batch w/ interpolation
- The --naflex-loss-scale arg changes the per-batch loss scaling mode relative to the batch size; this matters because timm NaFlex loading changes the batch size for each seq len
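Under a NaFlex-style pipeline each image keeps roughly its native aspect ratio and is resized to a patch grid whose token count fits the sequence-length budget (e.g. `--naflex-max-seq-len 256` with 16-pixel patches). The sketch below shows one way to pick such a grid, plus a per-sequence-length batch size that holds tokens per batch roughly constant. Both helpers are hypothetical: timm's exact rounding rules may differ, and the constant-token batch rule is an assumption about how batch size varies with sequence length.

```python
import math
from typing import Tuple


def naflex_grid(height: int, width: int, patch_size: int, max_seq_len: int) -> Tuple[int, int]:
    """Aspect-preserving (rows, cols) patch grid with rows * cols <= max_seq_len."""
    # Uniform scale that would make the patch count exactly max_seq_len.
    scale = math.sqrt(max_seq_len * patch_size ** 2 / (height * width))
    rows = max(1, math.floor(height * scale / patch_size))
    cols = max(1, math.floor(width * scale / patch_size))
    # Extreme aspect ratios can break the budget once a side is forced to 1.
    if rows * cols > max_seq_len:
        if rows >= cols:
            rows = max(1, max_seq_len // cols)
        else:
            cols = max(1, max_seq_len // rows)
    return rows, cols


def batch_size_for_seq_len(seq_len: int, max_tokens: int) -> int:
    """Assumed rule: keep roughly max_tokens tokens per batch."""
    return max(1, max_tokens // seq_len)
```

A 224×448 image with 16-pixel patches and a 256-token budget maps to an 11×22 grid (242 tokens). Because shorter sequences then get larger batches, a per-batch loss scaling option like `--naflex-loss-scale` becomes relevant.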

May 28, 2025

Add a number of small/fast models thanks to https://github.com/brianhou0208
- SwiftFormer - (ICCV2023) SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
- FasterNet - (CVPR2023) Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks
- SHViT - (CVPR2024) SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
- StarNet - (CVPR2024) Rewrite the Stars
- GhostNet-V3 - GhostNetV3: Exploring the Training Strategies for Compact Models
Update EVA ViT (closest match) to support Perception Encoder models (https://arxiv.org/abs/2504.13181) from Meta, loading Hub weights but I still need to push dedicated timm weights
Add some flexibility to ROPE impl
Big increase in number of models supporting forward_intermediates() and some additional fixes thanks to https://github.com/brianh
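`forward_intermediates()` takes an `indices` argument: an int takes the last n blocks, a sequence selects specific block indices (negative values count from the end), and `None` takes all. The sketch below resolves indices that way. It mirrors that documented behaviour rather than reproducing timm's `feature_take_indices` helper, and `resolve_take_indices` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple, Union


def resolve_take_indices(
    num_blocks: int,
    indices: Optional[Union[int, Sequence[int]]] = None,
) -> Tuple[List[int], int]:
    """Return (absolute block indices to take, highest index needed)."""
    if indices is None:
        take = list(range(num_blocks))
    elif isinstance(indices, int):
        if not 0 < indices <= num_blocks:
            raise ValueError(f"last-n must be in 1..{num_blocks}, got {indices}")
        take = list(range(num_blocks - indices, num_blocks))
    else:
        # Negative indices count back from the final block.
        take = [num_blocks + i if i < 0 else i for i in indices]
        for i in take:
            if not 0 <= i < num_blocks:
                raise ValueError(f"block index {i} out of range for {num_blocks} blocks")
    # Blocks after max(take) never need to run, so callers can stop early.
    return take, max(take)
```

Returning the highest needed index alongside the list is what allows a model to skip or prune trailing blocks when only early features are requested.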

Core symbols most depended-on inside this repo

_cfg

called by 302

timm/models/vision_transformer.py

timm/models/_features.py

_cfg

called by 153

timm/models/efficientnet.py

_create_vision_transformer

called by 153

timm/models/vision_transformer.py

feature_take_indices

called by 135

timm/models/_features.py

trunc_normal_

called by 96

timm/layers/weight_init.py

build_model_with_cfg

called by 94

timm/models/_builder.py

Shape

Method 3,171 · Function 2,395 · Class 884 · Route 9

Languages

Python 100%

Modules by API surface

timm/models/vision_transformer.py: 210 symbols
timm/models/efficientnet.py: 156 symbols
timm/models/maxxvit.py: 142 symbols
timm/models/byobnet.py: 129 symbols
timm/models/resnet.py: 113 symbols
timm/models/levit.py: 92 symbols
timm/models/eva.py: 92 symbols
timm/models/gemma4_vit.py: 74 symbols
timm/models/efficientvit_mit.py: 73 symbols
timm/models/regnet.py: 72 symbols
timm/models/metaformer.py: 72 symbols
timm/models/fastvit.py: 72 symbols

Dependencies from manifests, versioned

huggingface_hub 0.17.0 · 1×
safetensors 0.2 · 1×
torch 1.7 · 1×
