* `weights_only=True`, add safe_global for ArgParse.
* `is_causal` through for SSL tasks.
* `ParallelScalingBlock` (& `DiffParallelScalingBlock`)
* `timm` models but could impact downstream use.
* `dpwee`, `dwee`, `dlittle` (differential) ViTs with a small boost over previous runs
* `timm` variant of the CSATv2 model at 512x512 & 640x640
* `__init__` into a common method that can be externally called via `init_non_persistent_buffers()` after meta-device init.
* `timm` Muon impl. Appears more competitive vs AdamW with familiar hparams for image tasks.
* `DiffAttention`), add corresponding `DiffParallelScalingBlock` (for ViT), train some wee vits
* `LsePlus` and `SimPool`
* `DropBlock2d` (also add support to ByobNet based models)
* `nesterov=True`) updates if muon not suitable for parameter shape (or excluded via param group flag)
* `adjust_lr_fn`
* `ns_coefficients`
* `timm`
* `lvd_1689m` -> `lvd1689m` to match (same for `sat_493m` -> `sat493m`)
* `timm` model. ViT support done via the EVA base model w/ a new `RotaryEmbeddingDinoV3` to match the DINOv3 specific RoPE impl
* `set_input_size()` method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for `timm` based encoder models.
* `eva.py`) including EVA, EVA02, Meta PE ViT, `timm` SBB ViT w/ ROPE, and Naver ROPE-ViT can now be loaded in NaFlexViT when `use_naflex=True` is passed at model creation time
* `eva.py`, add `RotaryEmbeddingMixed` module for mixed mode, weights on HuggingFace Hub

| model | img_size | top1 | top5 | param_count |
|---|---|---|---|---|
| vit_large_patch16_rope_mixed_ape_224.naver_in1k | 224 | 84.84 | 97.122 | 304.4 |
| vit_large_patch16_rope_mixed_224.naver_in1k | 224 | 84.828 | 97.116 | 304.2 |
| vit_large_patch16_rope_ape_224.naver_in1k | 224 | 84.65 | 97.154 | 304.37 |
| vit_large_patch16_rope_224.naver_in1k | 224 | 84.648 | 97.122 | 304.17 |
| vit_base_patch16_rope_mixed_ape_224.naver_in1k | 224 | 83.894 | 96.754 | 86.59 |
| vit_base_patch16_rope_mixed_224.naver_in1k | 224 | 83.804 | 96.712 | 86.44 |
| vit_base_patch16_rope_ape_224.naver_in1k | 224 | 83.782 | 96.61 | 86.59 |
| vit_base_patch16_rope_224.naver_in1k | 224 | 83.718 | 96.672 | 86.43 |
| vit_small_patch16_rope_224.naver_in1k | 224 | 81.23 | 95.022 | 21.98 |
| vit_small_patch16_rope_mixed_224.naver_in1k | 224 | 81.216 | 95.022 | 21.99 |
| vit_small_patch16_rope_ape_224.naver_in1k | 224 | 81.004 | 95.016 | 22.06 |
| vit_small_patch16_rope_mixed_ape_224.naver_in1k | 224 | 80.986 | 94.976 | 22.06 |

* Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
* Preparing version 1.0.17 release

| Model | Top-1 Acc | Top-5 Acc | Params (M) | Eval Seq Len |
|---|---|---|---|---|
| naflexvit_base_patch16_par_gap.e300_s576_in1k | 83.67 | 96.45 | 86.63 | 576 |
| naflexvit_base_patch16_parfac_gap.e300_s576_in1k | 83.63 | 96.41 | 86.46 | 576 |
| naflexvit_base_patch16_gap.e300_s576_in1k | 83.50 | 96.46 | 86.63 | 576 |

* Support gradient checkpointing for `forward_intermediates` and fix some checkpointing bugs. Thanks https://github.com/brianhou0208
* Add 'corrected weight decay' (https://arxiv.org/abs/2506.02285) as option to AdamW (legacy), Adopt, Kron, Adafactor (BV), Lamb, LaProp, Lion, NadamW, RmsPropTF, SGDW optimizers
* Switch PE (perception encoder) ViT models to use native timm weights instead of remapping on the fly
* Fix cuda stream bug in prefetch loader
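
As a rough illustration of the 'corrected weight decay' option mentioned above, here is a minimal sketch. It assumes one reading of the linked paper: that the decoupled decay applied at each step is additionally scaled by `lr / max_lr`, so decay weakens as the schedule anneals. The function name and arguments are hypothetical and are not the `timm` optimizer API; check the optimizer source for the actual behaviour.

```python
import math


def decay_multiplier(lr, weight_decay, max_lr=None, corrected=False):
    """Return the factor a parameter is multiplied by in one decoupled decay step.

    Hypothetical helper for illustration only (not part of timm).
    Standard decoupled decay:  p <- p * (1 - lr * wd)
    Assumed corrected variant: p <- p * (1 - (lr**2 / max_lr) * wd)
    """
    if corrected:
        if max_lr is None:
            raise ValueError("max_lr is required when corrected=True")
        scale = lr * lr / max_lr  # assumption: decay scaled by lr / max_lr
    else:
        scale = lr
    return 1.0 - scale * weight_decay


peak_lr = 1e-3
wd = 0.05

# At the peak learning rate the two variants coincide.
at_peak_standard = decay_multiplier(peak_lr, wd)
at_peak_corrected = decay_multiplier(peak_lr, wd, max_lr=peak_lr, corrected=True)

# Late in a decaying schedule the corrected variant decays less strongly.
late_lr = peak_lr / 10
late_standard = decay_multiplier(late_lr, wd)
late_corrected = decay_multiplier(late_lr, wd, max_lr=peak_lr, corrected=True)

print(at_peak_standard, at_peak_corrected)
print(late_standard, late_corrected)
```

Under this assumption the two variants only differ away from the peak learning rate, which is why the option needs to know the schedule's maximum.
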
* `vision_transformer.py` can be loaded into the `NaFlexVit` model by adding the `use_naflex=True` flag to `create_model`
* `train.py` and `validate.py` add the `--naflex-loader` arg, must be used with a `NaFlexVit`
* `python validate.py /imagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256`
* `--naflex-train-seq-lens` argument specifies which sequence lengths to randomly pick from per batch during training
* `--naflex-max-seq-len` argument sets the target sequence length for validation
* `--model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24` will enable random patch size selection per-batch w/ interpolation
* `--naflex-loss-scale` arg changes loss scaling mode per batch relative to the batch size; `timm` NaFlex loading changes the batch size for each seq len
* `timm` weights
* `forward_intermediates()` and some additional fixes thanks to https://github.com/brianh
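
The per-batch sequence length selection and the varying batch size described above can be pictured with a small, library-free sketch. It assumes each batch is sized to a fixed token budget, which is one plausible way to vary batch size with sequence length. The function name and the budget rule are hypothetical; the actual `timm` NaFlex loader may size batches differently.

```python
import random


def plan_batches(seq_lens, max_tokens_per_batch, num_batches, seed=0):
    """Pick a random sequence length per batch and size the batch to a token budget.

    Hypothetical helper for illustration only (not the timm loader).
    """
    rng = random.Random(seed)
    plan = []
    for _ in range(num_batches):
        seq_len = rng.choice(seq_lens)  # one sequence length for the whole batch
        batch_size = max(1, max_tokens_per_batch // seq_len)  # shorter seqs -> larger batches
        plan.append((seq_len, batch_size))
    return plan


train_seq_lens = [128, 256, 576]
token_budget = 256 * 64  # assumed budget: 64 samples at sequence length 256

plan = plan_batches(train_seq_lens, token_budget, num_batches=5)
for seq_len, batch_size in plan:
    print(f"seq_len={seq_len:4d}  batch_size={batch_size}")
```

Because the batch size changes from step to step in a scheme like this, a per-batch loss scaling mode such as the one selected by `--naflex-loss-scale` becomes relevant.
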