github.com/huggingface/pytorch-image-models @v1.0.27 sqlite

6,459 symbols · 20,074 edges · 306 files · 2,597 documented (40%)
README

PyTorch Image Models

What's New

May 8, 2026

  • Release 1.0.27

April 23, 2026

  • Add Gemma4 ViT encoders w/ NaFlex pipeline support (variable aspect/size per image). Thanks Yonghye Kwon
  • Support DINOv3 weights in NaFlexVit. Thanks Yonghye Kwon
  • Some improvements to Muon fallback (AdamW/NadamW) lr behavior

March 23, 2026

  • Improve pickle checkpoint handling security. Default all loading to weights_only=True, add safe_global for ArgParse.
  • Improve attention mask handling for core ViT/EVA models & layers. Resolve bool masks, pass is_causal through for SSL tasks.
  • Fix class & register token uses with ViT and no pos embed enabled.
  • Add Patch Representation Refinement (PRR) as a pooling option in ViT. Thanks Sina (https://github.com/sinahmr).
  • Improve consistency of output projection / MLP dimensions for attention pooling layers.
  • Hiera model F.SDPA optimization to allow Flash Attention kernel use.
  • Caution added to SGDP optimizer.
  • Release 1.0.26. First maintenance release since my departure from Hugging Face.
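
The checkpoint-loading change above relies on refusing to resolve arbitrary globals while unpickling. As a rough illustration of that idea only, and not timm's or PyTorch's actual code, the sketch below uses the standard-library restricted-unpickler pattern. The names `SAFE_GLOBALS`, `RestrictedUnpickler`, and `safe_loads` are hypothetical; in PyTorch itself the corresponding entry points are `torch.load(..., weights_only=True)` and `torch.serialization.add_safe_globals`.

```python
import argparse
import io
import pickle

# Hypothetical allow-list: only these (module, name) globals may be resolved.
SAFE_GLOBALS = {("argparse", "Namespace")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that rejects every global not on the allow-list."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is not allow-listed"
        )


def safe_loads(data: bytes):
    """Load pickled bytes, resolving only allow-listed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    # A checkpoint-like dict holding plain containers plus an argparse.Namespace.
    ckpt = {"state_dict": {"w": [1.0, 2.0]}, "args": argparse.Namespace(lr=0.1)}
    print(safe_loads(pickle.dumps(ckpt))["args"].lr)
```

Plain containers and numbers never go through `find_class`, so only objects that need a class or function lookup are affected by the allow-list.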

Feb 23, 2026

  • Add token distillation training support to distillation task wrappers
  • Remove some torch.jit usage in prep for official deprecation
  • Caution added to AdamP optimizer
  • Call reset_parameters() even if meta-device init so that buffers get init w/ hacks like init_empty_weights
  • Tweak Muon optimizer to work with DTensor/FSDP2 (clamp_ instead of clamp_min_, alternate NS branch for DTensor)
  • Release 1.0.25

Jan 21, 2026

  • Compat Break: Fix oversight w/ QKV vs MLP bias in ParallelScalingBlock (& DiffParallelScalingBlock)
    • Does not impact any trained timm models but could impact downstream use.

Jan 5 & 6, 2026

  • Release 1.0.24
  • Add new benchmark result csv files for inference timing on all models w/ RTX Pro 6000, 5090, and 4090 cards w/ PyTorch 2.9.1
  • Fix moved module error in deprecated timm.models.layers import path that impacts legacy imports
  • Release 1.0.23

Dec 30, 2025

  • Add better NAdaMuon trained dpwee, dwee, dlittle (differential) ViTs with a small boost over previous runs
    • https://huggingface.co/timm/vit_dlittle_patch16_reg1_gap_256.sbb_nadamuon_in1k (83.24% top-1)
    • https://huggingface.co/timm/vit_dwee_patch16_reg1_gap_256.sbb_nadamuon_in1k (81.80% top-1)
    • https://huggingface.co/timm/vit_dpwee_patch16_reg1_gap_256.sbb_nadamuon_in1k (81.67% top-1)
  • Add a ~21M param timm variant of the CSATv2 model at 512x512 & 640x640
    • https://huggingface.co/timm/csatv2_21m.sw_r640_in1k (83.13% top-1)
    • https://huggingface.co/timm/csatv2_21m.sw_r512_in1k (82.58% top-1)
  • Factor non-persistent param init out of __init__ into a common method that can be externally called via init_non_persistent_buffers() after meta-device init.

Dec 12, 2025

  • Add CSATV2 model (thanks https://github.com/gusdlf93) -- a lightweight but high res model with DCT stem & spatial attention. https://huggingface.co/Hyunil/CSATv2
  • Add AdaMuon and NAdaMuon optimizer support to existing timm Muon impl. Appears more competitive vs AdamW with familiar hparams for image tasks.
  • End of year PR cleanup, merge aspects of several long open PRs
  • Merge differential attention (DiffAttention), add corresponding DiffParallelScalingBlock (for ViT), train some wee vits
    • https://huggingface.co/timm/vit_dwee_patch16_reg1_gap_256.sbb_in1k
    • https://huggingface.co/timm/vit_dpwee_patch16_reg1_gap_256.sbb_in1k
  • Add a few pooling modules, LsePlus and SimPool
  • Cleanup, optimize DropBlock2d (also add support to ByobNet based models)
  • Bump unit tests to PyTorch 2.9.1 + Python 3.13 on upper end, lower still PyTorch 1.13 + Python 3.10

Dec 1, 2025

  • Add lightweight task abstraction, add logits and feature distillation support to train script via new tasks.
  • Remove old APEX AMP support
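
For readers unfamiliar with logits distillation, the sketch below shows the standard temperature-scaled KL formulation in plain Python. It is a generic textbook form under stated assumptions, not the loss used by timm's task wrappers; the function names and the default temperature are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T**2 factor keeps gradient magnitudes comparable across temperatures.
    Assumes moderate logits, so no student probability underflows to zero.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    return kl * temperature ** 2


if __name__ == "__main__":
    print(logit_distillation_loss([1.0, 0.5, -0.2], [2.0, 0.1, -1.0]))
```

The loss is zero when student and teacher logits match and grows as the softened distributions diverge.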

Nov 4, 2025

  • Fix LayerScale / LayerScale2d init bug (init values ignored), introduced in 1.0.21. Thanks https://github.com/Ilya-Fradlin
  • Release 1.0.22

Oct 31, 2025 🎃

  • Update imagenet & OOD variant result csv files to include a few new models and verify correctness over several torch & timm versions
  • EfficientNet-X and EfficientNet-H B5 model weights added as part of a hparam search for AdamW vs Muon (still iterating on Muon runs)

Oct 16-20, 2025

  • Add an impl of the Muon optimizer (based on https://github.com/KellerJordan/Muon) with customizations
    • extra flexibility and improved handling for conv weights and fallbacks for weight shapes not suited for orthogonalization
    • small speedup for NS iterations by reducing allocs and using fused (b)add(b)mm ops
    • by default uses AdamW (or NAdamW if nesterov=True) updates if muon not suitable for parameter shape (or excluded via param group flag)
    • like torch impl, select from several LR scale adjustment fns via adjust_lr_fn
    • select from several NS coefficient presets or specify your own via ns_coefficients
  • First 2 steps of 'meta' device model initialization supported
    • Fix several ops that were breaking creation under 'meta' device context
    • Add device & dtype factory kwarg support to all models and modules (anything inheriting from nn.Module) in timm
  • License fields added to pretrained cfgs in code
  • Release 1.0.21
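
The "NS iterations" in the Muon notes above refer to Newton-Schulz orthogonalization of the update matrix. The sketch below is a minimal pure-Python illustration using the classic cubic coefficients (1.5, -0.5). It is an assumption-laden teaching example: Muon implementations typically use tuned higher-order coefficients (selectable in timm via ns_coefficients), which this sketch does not reproduce, and all helper names here are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def newton_schulz_orthogonalize(g, steps=30):
    """Approximate the orthogonal polar factor of a full-rank matrix g.

    Uses the cubic iteration X <- 1.5 * X - 0.5 * (X X^T) X, which drives
    every singular value of X toward 1.
    """
    # Frobenius normalization bounds all singular values by 1,
    # which is inside the iteration's convergence region.
    fro = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / fro for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * a - 0.5 * b for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(x, xxtx)
        ]
    return x


if __name__ == "__main__":
    q = newton_schulz_orthogonalize([[2.0, 0.0], [1.0, 1.0]])
    print(matmul(q, transpose(q)))  # close to the identity matrix
```

Small singular values grow by roughly 1.5x per step, so badly conditioned inputs need more iterations than the well-conditioned matrix used here.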

Sept 21, 2025

  • Remap DINOv3 ViT weight tags from lvd_1689m -> lvd1689m to match (same for sat_493m -> sat493m)
  • Release 1.0.20

Sept 17, 2025

  • DINOv3 (https://arxiv.org/abs/2508.10104) ConvNeXt and ViT models added. ConvNeXt models were mapped to existing timm model. ViT support done via the EVA base model w/ a new RotaryEmbeddingDinoV3 to match the DINOv3 specific RoPE impl
    • HuggingFace Hub: https://huggingface.co/collections/timm/timm-dinov3-68cb08bb0bee365973d52a4d
  • MobileCLIP-2 (https://arxiv.org/abs/2508.20691) vision encoders. New MCI3/MCI4 FastViT variants added and weights mapped to existing FastViT and B, L/14 ViTs.
  • MetaCLIP-2 Worldwide (https://arxiv.org/abs/2507.22062) ViT encoder weights added.
  • SigLIP-2 (https://arxiv.org/abs/2502.14786) NaFlex ViT encoder weights added via timm NaFlexViT model.
  • Misc fixes and contributions

July 23, 2025

  • Add set_input_size() method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
  • Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
  • Fix small typing issue that broke Python 3.9 compat. 1.0.19 patch release.

July 21, 2025

  • ROPE support added to NaFlexViT. All models covered by the EVA base (eva.py) including EVA, EVA02, Meta PE ViT, timm SBB ViT w/ ROPE, and Naver ROPE-ViT can now be loaded in NaFlexViT when use_naflex=True is passed at model creation time
  • More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
  • PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
  • Fix XY order with grid_indexing='xy', impacted non-square image use in 'xy' mode (only ROPE-ViT and PE impacted).
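
The 'xy' indexing fix above only shows up on non-square grids. The sketch below mimics the two meshgrid conventions in plain Python to show why argument order matters. It is an illustration of the general convention, not timm's code, and the `meshgrid` helper is a hypothetical stand-in for a tensor-library call.

```python
def meshgrid(a, b, indexing="ij"):
    """Two-input meshgrid on plain lists, following the usual 'ij'/'xy' rules.

    'ij': outputs have shape (len(a), len(b)).
    'xy': outputs have shape (len(b), len(a)).
    """
    if indexing == "ij":
        first = [[a[i] for _ in b] for i in range(len(a))]
        second = [[b[j] for j in range(len(b))] for _ in a]
    elif indexing == "xy":
        first = [[a[j] for j in range(len(a))] for _ in b]
        second = [[b[i] for _ in a] for i in range(len(b))]
    else:
        raise ValueError(f"unknown indexing: {indexing!r}")
    return first, second


if __name__ == "__main__":
    ys, xs = [0, 1], [0, 1, 2]  # a 2 x 3 (height x width) grid

    grid_y, grid_x = meshgrid(ys, xs, indexing="ij")  # pass (y, x), get (Y, X)
    x_xy, y_xy = meshgrid(xs, ys, indexing="xy")      # pass (x, y), get (X, Y)
    print(grid_y == y_xy and grid_x == x_xy)

    # Passing (y, x) in 'xy' mode silently yields a 3 x 2 grid instead.
    wrong, _ = meshgrid(ys, xs, indexing="xy")
    print(len(wrong), len(wrong[0]))
```

On a square grid both calls produce the same shape, which is why an ordering mistake of this kind can go unnoticed until non-square inputs are used.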

July 7, 2025

  • MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights)
    • Add stem bias (zero'd in updated weights, compat break with old weights)
    • GELU -> GELU (tanh approx). A minor change to be closer to JAX
  • Add two arguments to layer-decay support, a min scale clamp and 'no optimization' scale threshold
  • Add 'Fp32' LayerNorm, RMSNorm, SimpleNorm variants that can be enabled to force computation of norm in float32
  • Some typing, argument cleanup for norm, norm+act layers done with above
  • Support Naver ROPE-ViT (https://github.com/naver-ai/rope-vit) in eva.py, add RotaryEmbeddingMixed module for mixed mode, weights on HuggingFace Hub

| model | img_size | top1 | top5 | param_count (M) |
|---|---|---|---|---|
| vit_large_patch16_rope_mixed_ape_224.naver_in1k | 224 | 84.84 | 97.122 | 304.4 |
| vit_large_patch16_rope_mixed_224.naver_in1k | 224 | 84.828 | 97.116 | 304.2 |
| vit_large_patch16_rope_ape_224.naver_in1k | 224 | 84.65 | 97.154 | 304.37 |
| vit_large_patch16_rope_224.naver_in1k | 224 | 84.648 | 97.122 | 304.17 |
| vit_base_patch16_rope_mixed_ape_224.naver_in1k | 224 | 83.894 | 96.754 | 86.59 |
| vit_base_patch16_rope_mixed_224.naver_in1k | 224 | 83.804 | 96.712 | 86.44 |
| vit_base_patch16_rope_ape_224.naver_in1k | 224 | 83.782 | 96.61 | 86.59 |
| vit_base_patch16_rope_224.naver_in1k | 224 | 83.718 | 96.672 | 86.43 |
| vit_small_patch16_rope_224.naver_in1k | 224 | 81.23 | 95.022 | 21.98 |
| vit_small_patch16_rope_mixed_224.naver_in1k | 224 | 81.216 | 95.022 | 21.99 |
| vit_small_patch16_rope_ape_224.naver_in1k | 224 | 81.004 | 95.016 | 22.06 |
| vit_small_patch16_rope_mixed_ape_224.naver_in1k | 224 | 80.986 | 94.976 | 22.06 |

  • Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
  • Preparing version 1.0.17 release
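
The layer-decay entry above mentions a min scale clamp and a 'no optimization' scale threshold. The sketch below shows one plausible reading of how those two controls could interact. The function name, parameter names, and the order of clamping versus thresholding are all assumptions made for illustration and may not match timm's implementation.

```python
def layer_decay_scales(num_layers, layer_decay, min_scale=0.0, no_opt_scale=None):
    """Return one LR scale per layer; None marks a layer left unoptimized.

    Assumed behaviour: scales decay geometrically from the last layer (1.0)
    toward the first, are clamped from below by min_scale, and any layer whose
    resulting scale is still under no_opt_scale is excluded from optimization.
    """
    scales = []
    for i in range(num_layers):
        scale = max(min_scale, layer_decay ** (num_layers - 1 - i))
        if no_opt_scale is not None and scale < no_opt_scale:
            scales.append(None)  # e.g. freeze this layer's parameters
        else:
            scales.append(scale)
    return scales


if __name__ == "__main__":
    print(layer_decay_scales(4, 0.5))
    print(layer_decay_scales(4, 0.5, min_scale=0.2))
    print(layer_decay_scales(4, 0.5, no_opt_scale=0.2))
```

With this ordering, setting both values only freezes a layer when the clamp itself sits below the threshold, so in practice one would likely use one control or the other.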

June 26, 2025

  • MobileNetV5 backbone (w/ encoder only variant) for Gemma 3n image encoder
  • Version 1.0.16 released

June 23, 2025

  • Add F.grid_sample based 2D and factorized pos embed resize to NaFlexViT. Faster when lots of different sizes (based on example by https://github.com/stas-sl).
  • Further speed up patch embed resample by replacing vmap with matmul (based on snippet by https://github.com/stas-sl).
  • Add 3 initial native aspect NaFlexViT checkpoints created while testing, ImageNet-1k and 3 different pos embed configs w/ same hparams.

| Model | Top-1 Acc | Top-5 Acc | Params (M) | Eval Seq Len |
|---|---|---|---|---|
| naflexvit_base_patch16_par_gap.e300_s576_in1k | 83.67 | 96.45 | 86.63 | 576 |
| naflexvit_base_patch16_parfac_gap.e300_s576_in1k | 83.63 | 96.41 | 86.46 | 576 |
| naflexvit_base_patch16_gap.e300_s576_in1k | 83.50 | 96.46 | 86.63 | 576 |

  • Support gradient checkpointing for forward_intermediates and fix some checkpointing bugs. Thanks https://github.com/brianhou0208
  • Add 'corrected weight decay' (https://arxiv.org/abs/2506.02285) as option to AdamW (legacy), Adopt, Kron, Adafactor (BV), Lamb, LaProp, Lion, NadamW, RmsPropTF, SGDW optimizers
  • Switch PE (perception encoder) ViT models to use native timm weights instead of remapping on the fly
  • Fix cuda stream bug in prefetch loader

June 5, 2025

  • Initial NaFlexVit model code. NaFlexVit is a Vision Transformer with:
    • Encapsulated embedding and position encoding in a single module
    • Support for nn.Linear patch embedding on pre-patchified (dictionary) inputs
    • Support for NaFlex variable aspect, variable resolution (SigLip-2: https://arxiv.org/abs/2502.14786)
    • Support for FlexiViT variable patch size (https://arxiv.org/abs/2212.08013)
    • Support for NaViT fractional/factorized position embedding (https://arxiv.org/abs/2307.06304)
  • Existing vit models in vision_transformer.py can be loaded into the NaFlexVit model by adding the use_naflex=True flag to create_model
    • Some native weights coming soon
  • A full NaFlex data pipeline is available that allows training / fine-tuning / evaluating with variable aspect / size images
    • To enable in train.py and validate.py add the --naflex-loader arg, must be used with a NaFlexVit
  • To evaluate an existing (classic) ViT loaded in NaFlexVit model w/ NaFlex data pipe:
    • python validate.py /imagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256
  • Training has some extra args worth noting
    • The --naflex-train-seq-lens argument specifies which sequence lengths to randomly pick from per batch during training
    • The --naflex-max-seq-len argument sets the target sequence length for validation
    • Adding --model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24 will enable random patch size selection per-batch w/ interpolation
    • The --naflex-loss-scale arg changes loss scaling mode per batch relative to the batch size, timm NaFlex loading changes the batch size for each seq len
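
To make the sequence-length arguments above more concrete, the sketch below works through the arithmetic of fitting a variable-aspect image into a token budget and of varying batch size with sequence length. It is a back-of-envelope illustration only: both helper functions and the `max_tokens_per_batch` parameter are hypothetical, and timm's NaFlex loader may size images and batches differently.

```python
import math


def naflex_patch_grid(height, width, patch_size=16, max_seq_len=256):
    """Pick a (rows, cols) patch grid that roughly preserves aspect ratio
    while keeping rows * cols <= max_seq_len."""
    # Uniform resize factor such that the patch count lands near max_seq_len.
    scale = math.sqrt(max_seq_len * patch_size ** 2 / (height * width))
    # The small epsilon guards against float error just below a whole number.
    rows = max(1, math.floor(height * scale / patch_size + 1e-9))
    cols = max(1, math.floor(width * scale / patch_size + 1e-9))
    # Extreme aspect ratios can force rows up to 1, so re-check the budget.
    cols = max(1, min(cols, max_seq_len // rows))
    return rows, cols


def batch_size_for_seq_len(seq_len, max_tokens_per_batch):
    """Keep tokens per batch roughly constant: longer sequences, fewer images."""
    return max(1, max_tokens_per_batch // seq_len)


if __name__ == "__main__":
    print(naflex_patch_grid(224, 224))  # square image
    print(naflex_patch_grid(480, 640))  # 3:4 image, same token budget
    for seq_len in (128, 256, 576):
        print(seq_len, batch_size_for_seq_len(seq_len, 65536))
```

Because the batch size changes with the chosen sequence length, a per-batch loss scaling option such as the one described above becomes relevant.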

May 28, 2025

Core symbols most depended-on inside this repo

_cfg · called by 302 · timm/models/vision_transformer.py
to · called by 273 · timm/task/task.py
get · called by 238 · timm/models/_features.py
_cfg · called by 153 · timm/models/efficientnet.py
_create_vision_transformer · called by 153 · timm/models/vision_transformer.py
feature_take_indices · called by 135 · timm/models/_features.py
trunc_normal_ · called by 96 · timm/layers/weight_init.py
build_model_with_cfg · called by 94 · timm/models/_builder.py

Shape

Method 3,171
Function 2,395
Class 884
Route 9

Languages

Python 100%

Modules by API surface

timm/models/vision_transformer.py · 210 symbols
timm/models/efficientnet.py · 156 symbols
timm/models/maxxvit.py · 142 symbols
timm/models/byobnet.py · 129 symbols
timm/models/resnet.py · 113 symbols
timm/models/levit.py · 92 symbols
timm/models/eva.py · 92 symbols
timm/models/gemma4_vit.py · 74 symbols
timm/models/efficientvit_mit.py · 73 symbols
timm/models/regnet.py · 72 symbols
timm/models/metaformer.py · 72 symbols
timm/models/fastvit.py · 72 symbols

Dependencies from manifests, versioned

huggingface_hub 0.17.0 · 1×
safetensors 0.2 · 1×
torch 1.7 · 1×

For agents

$ claude mcp add pytorch-image-models \
  -- python -m otcore.mcp_server <graph>
