github.com/arcee-ai/mergekit @v0.1.4 sqlite

781 symbols 3,222 edges 91 files 126 documented · 16%
README

mergekit

mergekit is a toolkit for merging pre-trained language models. mergekit uses an out-of-core approach to perform unreasonably elaborate merges in resource-constrained situations. Merges can be run entirely on CPU or accelerated with as little as 8 GB of VRAM. Many merging algorithms are supported, with more coming as they catch my attention.

Why Merge Models?

Model merging is a powerful technique that allows combining the strengths of different models without the computational overhead of ensembling or the need for additional training. By operating directly in the weight space of models, merging can:

  • Combine multiple specialized models into a single versatile model
  • Transfer capabilities between models without access to training data
  • Find optimal trade-offs between different model behaviors
  • Improve performance while maintaining inference costs
  • Create new capabilities through creative model combinations

Unlike traditional ensembling which requires running multiple models, merged models maintain the same inference cost as a single model while often achieving comparable or superior performance.

Features

Key features of mergekit include:

Installation

git clone https://github.com/arcee-ai/mergekit.git
cd mergekit

pip install -e .  # install the package and make scripts available

If the above fails with the error:

ERROR: File "setup.py" or "setup.cfg" not found. Directory cannot be installed in editable mode:
(A "pyproject.toml" file was found, but editable mode currently requires a setuptools-based build.)

You may need to upgrade pip to a version newer than 21.3 with the command python3 -m pip install --upgrade pip.

Contributing

We welcome contributions to mergekit! If you have ideas for new merge methods, features, or other improvements, please check out our contributing guide for details on how to get started.

Usage

The script mergekit-yaml is the main entry point for mergekit. It takes a YAML configuration file and an output path, like so:

mergekit-yaml path/to/your/config.yml ./output-model-directory [--cuda] [--lazy-unpickle] [--allow-crimes] [... other options]

This will run the merge and write your merged model to ./output-model-directory.

For more information on the arguments accepted by mergekit-yaml run the command mergekit-yaml --help.
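If you drive merges from a script, the invocation above can be assembled programmatically. The following is a minimal sketch: the helper names build_merge_command and run_merge_cli are hypothetical (not part of mergekit), and only the flags shown in the usage line above are covered.

```python
import subprocess
from typing import List


def build_merge_command(
    config_path: str,
    output_dir: str,
    *,
    cuda: bool = False,
    lazy_unpickle: bool = False,
    allow_crimes: bool = False,
) -> List[str]:
    """Assemble the argv list for a mergekit-yaml run (hypothetical helper)."""
    command = ["mergekit-yaml", config_path, output_dir]
    # Optional flags taken from the usage line; other options are not modeled.
    if cuda:
        command.append("--cuda")
    if lazy_unpickle:
        command.append("--lazy-unpickle")
    if allow_crimes:
        command.append("--allow-crimes")
    return command


def run_merge_cli(command: List[str]) -> None:
    """Run the merge; raises CalledProcessError if mergekit-yaml exits non-zero."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_merge_cli(...) to actually execute it.
    print(" ".join(build_merge_command("path/to/your/config.yml", "./output-model-directory", cuda=True)))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.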

Uploading to Hugging Face

When you have a merged model you're happy with, you may want to share it on the Hugging Face Hub. mergekit generates a README.md for your merge with some basic information for a model card. You can edit it to include more details about your merge, like giving it a good name or explaining what it's good at; rewrite it entirely; or use the generated README.md as-is. It is also possible to edit your README.md online once it has been uploaded to the Hub.

Once you're happy with your model card and merged model, you can upload it to the Hugging Face Hub using the huggingface_hub Python library.

# log in to huggingface with an access token (must have write permission)
huggingface-cli login
# upload your model
huggingface-cli upload your_hf_username/my-cool-model ./output-model-directory .

The documentation for huggingface_hub goes into more detail about other options for uploading.
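The same upload can be done from Python. The sketch below assumes you pass in a huggingface_hub.HfApi instance that is already authenticated; the wrapper name upload_merged_model is hypothetical, and the repository name is the same placeholder used in the commands above.

```python
def upload_merged_model(api, repo_id: str, folder_path: str):
    """Create the model repo if needed, then upload the merge output folder.

    `api` is expected to be a huggingface_hub.HfApi instance (an assumption of
    this sketch); it is injected so the function can be exercised without
    network access.
    """
    # exist_ok=True makes this a no-op when the repository already exists.
    api.create_repo(repo_id=repo_id, exist_ok=True)
    # Uploads every file in folder_path to the root of the repository.
    return api.upload_folder(repo_id=repo_id, folder_path=folder_path)


# Example usage (requires `pip install huggingface_hub` and a prior login):
#   from huggingface_hub import HfApi
#   upload_merged_model(HfApi(), "your_hf_username/my-cool-model", "./output-model-directory")
```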

Merge Configuration

Merge configurations are YAML documents specifying the operations to perform in order to produce your merged model. Below are the primary elements of a configuration file:

  • merge_method: Specifies the method to use for merging models. See Merge Methods for a list.
  • slices: Defines slices of layers from different models to be used. This field is mutually exclusive with models.
  • models: Defines entire models to be used for merging. This field is mutually exclusive with slices.
  • base_model: Specifies the base model used in some merging methods.
  • parameters: Holds various parameters such as weights and densities, which can also be specified at different levels of the configuration.
  • dtype: Specifies the data type used for the merging operation.
  • tokenizer or tokenizer_source: Determines how to construct a tokenizer for the merged model.
  • chat_template: Specifies a chat template for the merged model.
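To make the constraints between these fields concrete, here is a small sketch that checks a configuration held as a Python dictionary. The function check_merge_config is hypothetical and is not mergekit's own validation; it assumes exactly one of slices or models must be present, and the model paths are placeholders.

```python
from typing import Any, Dict, List


def check_merge_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a merge configuration (illustrative only)."""
    problems = []
    if "merge_method" not in config:
        problems.append("merge_method is missing")
    # slices and models are mutually exclusive; this sketch also assumes one is required.
    if ("slices" in config) == ("models" in config):
        problems.append("specify exactly one of slices or models")
    # tokenizer and tokenizer_source are alternatives, not companions.
    if "tokenizer" in config and "tokenizer_source" in config:
        problems.append("use tokenizer or tokenizer_source, not both")
    return problems


# A linear merge of two placeholder models, equivalent to a short YAML document.
example_config = {
    "merge_method": "linear",
    "models": [
        {"model": "path/to/model_a", "parameters": {"weight": 0.5}},
        {"model": "path/to/model_b", "parameters": {"weight": 0.5}},
    ],
    "dtype": "float16",
}

if __name__ == "__main__":
    print(check_merge_config(example_config))
```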

Parameter Specification

Parameters are flexible and can be set with varying precedence. They can be specified conditionally using tensor name filters, which allows finer control such as differentiating between attention heads and fully connected layers.

Parameters can be specified as:

  • Scalars: Single floating-point values.
  • Gradients: List of floating-point values, specifying an interpolated gradient.
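As an illustration of how a gradient might expand into per-layer values, the sketch below interpolates linearly between the listed values. It assumes the values are anchor points spaced evenly across the layer range; the function names are hypothetical and this is not taken from mergekit's source.

```python
from typing import List, Sequence


def evaluate_gradient(values: Sequence[float], t: float) -> float:
    """Linearly interpolate `values` at position t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    if len(values) == 1:
        return values[0]
    # Map t onto the segments between consecutive anchor values.
    position = t * (len(values) - 1)
    lower = min(int(position), len(values) - 2)
    fraction = position - lower
    return values[lower] * (1.0 - fraction) + values[lower + 1] * fraction


def per_layer_values(values: Sequence[float], num_layers: int) -> List[float]:
    """Expand a gradient into one value per layer, first layer at t=0, last at t=1."""
    if num_layers == 1:
        return [evaluate_gradient(values, 0.0)]
    return [evaluate_gradient(values, i / (num_layers - 1)) for i in range(num_layers)]


if __name__ == "__main__":
    print(per_layer_values([0.0, 1.0], 5))
```

A scalar behaves like a gradient with a single value: every layer receives the same number.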

The parameters can be set at different levels, with decreasing precedence as follows:

  1. slices.*.sources.parameters - applying to a specific input slice
  2. slices.*.parameters - applying to a specific output slice
  3. models.*.parameters or input_model_parameters - applying to any tensors coming from specific input models
  4. parameters - catchall
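The precedence order above can be pictured as a lookup that walks the four levels from most to least specific. This is a simplified sketch with a hypothetical function name; each level is represented as a plain dictionary, and tensor name filters are not modeled.

```python
from typing import Any, Dict, Optional


def resolve_parameter(
    name: str,
    *,
    source_slice: Optional[Dict[str, Any]] = None,  # slices.*.sources.parameters
    output_slice: Optional[Dict[str, Any]] = None,  # slices.*.parameters
    model: Optional[Dict[str, Any]] = None,         # models.*.parameters
    top_level: Optional[Dict[str, Any]] = None,     # parameters (catchall)
    default: Any = None,
) -> Any:
    """Return the first value found for `name`, checking the most specific level first."""
    for scope in (source_slice, output_slice, model, top_level):
        if scope is not None and name in scope:
            return scope[name]
    return default


if __name__ == "__main__":
    # The output-slice setting wins over the model-level and catchall settings.
    print(resolve_parameter(
        "weight",
        output_slice={"weight": 0.3},
        model={"weight": 0.5},
        top_level={"weight": 1.0},
    ))
```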

Tokenizer Configuration

The tokenizer behavior can be configured in two ways: using the new tokenizer field (recommended) or the legacy tokenizer_source field (maintained for backward compatibility). These fields are mutually exclusive - you should use one or the other, not both.

Modern Configuration (tokenizer)

The tokenizer field provides fine-grained control over vocabulary and embeddings:

tokenizer:
  source: "union"  # or "base" or a specific model path
  tokens:          # Optional: configure specific tokens
    <token_name>:
      source: ...  # Specify embedding source
      force: false # Optional: force this embedding for all models
  pad_to_multiple_of: null  # Optional: pad vocabulary size

Tokenizer Source

The source field determines the vocabulary of the output model:

  • union: Combine vocabularies from all input models (default)
  • base: Use vocabulary from the base model
  • "path/to/model": Use vocabulary from a specific model

Token Embedding Handling

When merging models with different vocabularies, mergekit uses smart defaults to handle token embeddings:

  • If a token exists in the base model, its embedding is used as the default
  • If only one model has the token, that model's embedding is used
  • Otherwise, an average of all available embeddings is used
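The three default rules can be sketched as follows. This is an illustration only, with a hypothetical function name: embeddings are plain lists of floats, each model is a dictionary from token to vector, and mergekit's actual implementation works on tensors.

```python
from typing import Dict, List, Optional

Embedding = List[float]


def default_embedding(
    token: str,
    embeddings: Dict[str, Dict[str, Embedding]],
    base_model: Optional[str] = None,
) -> Embedding:
    """Pick the default embedding for `token` following the three rules above."""
    available = {name: vocab[token] for name, vocab in embeddings.items() if token in vocab}
    if not available:
        raise KeyError(f"no input model contains token {token!r}")
    # Rule 1: the base model's embedding is preferred when it has the token.
    if base_model is not None and base_model in available:
        return list(available[base_model])
    # Rule 2: a single owner supplies the embedding directly.
    if len(available) == 1:
        return list(next(iter(available.values())))
    # Rule 3: otherwise average all available embeddings element-wise.
    vectors = list(available.values())
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```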

You can override these defaults for specific tokens:

tokenizer:
  source: union
  tokens:
    # Use embedding from a specific model
    <|im_start|>:
      source: "path/to/chatml/model"

    # Force a specific embedding for all models
    <|special|>:
      source: "path/to/model"
      force: true

    # Map a token to another model's token embedding
    <|renamed_token|>:
      source:
        kind: "model_token"
        model: "path/to/model"
        token: "<|original_token|>"  # or use token_id: 1234

Practical Example

Here's how you might preserve both Llama 3 Instruct and ChatML prompt formats when merging models:

tokenizer:
  source: union
  tokens:
    # ChatML tokens
    <|im_start|>:
      source: "chatml_model"
    <|im_end|>:
      source: "chatml_model"

    # Llama 3 tokens - force original embeddings
    <|start_header_id|>:
      source: "llama3_model"
      force: true
    <|end_header_id|>:
      source: "llama3_model"
      force: true
    <|eot_id|>:
      source: "llama3_model"
      force: true

Legacy Configuration (tokenizer_source)

For backward compatibility, the tokenizer_source field is still supported:

tokenizer_source: "union"  # or "base" or a model path

This provides basic tokenizer selection but lacks the fine-grained control of the modern tokenizer field.

Chat Template Configuration

The optional chat_template field allows overriding the chat template used for the merged model.

chat_template: "auto"  # or a template name or Jinja2 template

Options include:

  • "auto": Automatically select the most common template among input models
  • Built-in templates: "alpaca", "chatml", "llama3", "mistral", "exaone"
  • A Jinja2 template string for custom formatting
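One way to picture the "auto" option is a majority vote over the input models' templates. The sketch below uses a hypothetical function name and assumes that models without a template are ignored and that ties go to the template seen first; mergekit's own tie-breaking may differ.

```python
from collections import Counter
from typing import Optional, Sequence


def pick_auto_template(templates: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most common non-empty template, or None if no model has one."""
    candidates = [template for template in templates if template]
    if not candidates:
        return None
    # most_common orders equal counts by first appearance.
    return Counter(candidates).most_common(1)[0][0]


if __name__ == "__main__":
    print(pick_auto_template(["chatml", "llama3", "chatml", None]))
```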

Examples

Several examples of merge configurations are available in examples/.

Merge Methods

mergekit offers many methods for merging models, each with its own strengths and weaknesses. Choosing the right method depends on your specific goals, the relationship between the models you're merging, and the desired characteristics of the final model.

For detailed explanations, parameter descriptions, and use cases for each method, please see our Merge Method Guide.

Method Overview

| Method (value) | Core Idea | # Models | Base Model | Key Strengths / Use Cases |
|---|---|---|---|---|
| Linear (linear) | Simple weighted average of model parameters. | ≥2 | - | Averaging similar checkpoints, model soups. |
| SLERP (slerp) | Spherical linear interpolation between two models. | 2 | ✓ | Smoothly transitioning between two models. |
| NuSLERP (nuslerp) | Enhanced SLERP with flexible weighting. | 2 | * | More intuitive SLERP; task vector SLERP. |
| Multi-SLERP (multislerp) | Barycentric SLERP for multiple models. | ≥2 | * | Spherical interpolation for >2 models. |
| Karcher Mean (karcher) | Riemannian barycenter of model parameters. | ≥2 | - | Geometrically sound averaging on manifolds. |
| Task Arithmetic (task_arithmetic) | Linearly combine "task vectors" (differences from a base). | ≥2 | ✓ | Transferring/combining fine-tuned skills. |
| TIES (ties) | Task arithmetic + sparsification & sign consensus. | ≥2 | ✓ | Merging many models, reducing interference. |
| DARE (dare_linear, dare_ties) | Task arithmetic + random pruning & rescaling. | ≥2 | ✓ | Robust skill retention, similar to TIES. |
| DELLA (della, della_linear) | Task arithmetic + adaptive magnitude-based pruning. | ≥2 | ✓ | Prioritizing important changes, reducing interference. |
| Model Breadcrumbs (breadcrumbs, breadcrumbs_ties) | Task arithmetic + outlier removal (small & large diffs). | ≥2 | ✓ | Refining task vectors by removing extreme changes. |
| SCE (sce) | | | | |
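To give a feel for the core ideas in the first rows of the table, here is a minimal sketch of linear averaging, SLERP, and task arithmetic. Flat lists of floats stand in for weight tensors, the function names are hypothetical, and normalizing the linear weights is an assumption of this sketch; mergekit's implementations operate on real tensors and expose more options.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def linear_merge(tensors: Sequence[Vector], weights: Sequence[float]) -> List[float]:
    """Weighted average of parameters, normalized by the sum of the weights."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [sum(w * t[i] for w, t in zip(weights, tensors)) / total
            for i in range(len(tensors[0]))]


def slerp(t: float, v0: Vector, v1: Vector, eps: float = 1e-8) -> List[float]:
    """Spherical linear interpolation from v0 (t=0) to v1 (t=1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    lerp = [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    if norm0 < eps or norm1 < eps:
        return lerp  # the angle is undefined for zero vectors
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    if abs(cos_omega) > 1.0 - eps:
        return lerp  # nearly colinear: plain interpolation is numerically safer
    omega = math.acos(cos_omega)
    scale0 = math.sin((1.0 - t) * omega) / math.sin(omega)
    scale1 = math.sin(t * omega) / math.sin(omega)
    return [scale0 * a + scale1 * b for a, b in zip(v0, v1)]


def task_arithmetic(base: Vector, models: Sequence[Vector], weights: Sequence[float]) -> List[float]:
    """Add weighted task vectors (model minus base) back onto the base parameters."""
    return [b + sum(w * (m[i] - b) for w, m in zip(weights, models))
            for i, b in enumerate(base)]
```

The methods further down the table build on the task-vector idea and differ in how they prune or reweight the differences before adding them back to the base.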

Core symbols most depended-on inside this repo

  • items (mergekit/common.py): called by 44
  • run_and_check_merge (tests/common.py): called by 40
  • get (mergekit/io/tasks.py): called by 39
  • values (mergekit/common.py): called by 27
  • from_pretrained (mergekit/common.py): called by 27
  • config (mergekit/common.py): called by 26
  • keys (mergekit/io/loader.py): called by 25
  • save_tensor (mergekit/io/tensor_writer.py): called by 18

Shape

Method 435
Function 199
Class 141
Route 6

Languages

Python 100%

Modules by API surface

  • tests/test_basic_merges.py: 39 symbols
  • mergekit/io/tasks.py: 38 symbols
  • mergekit/common.py: 34 symbols
  • mergekit/graph.py: 29 symbols
  • tests/test_graph.py: 28 symbols
  • mergekit/scripts/extract_lora.py: 26 symbols
  • mergekit/evo/strategy.py: 22 symbols
  • tests/test_tokenizer.py: 21 symbols
  • mergekit/config.py: 21 symbols
  • mergekit/architecture/base.py: 21 symbols
  • mergekit/scripts/tokensurgeon.py: 17 symbols
  • mergekit/scripts/fill_missing_params.py: 16 symbols

Dependencies from manifests, versioned

  • accelerate 1.6.0 · 1×
  • click 8.2.1 · 1×
  • huggingface_hub
  • immutables 0.21 · 1×
  • peft
  • protobuf
  • pydantic 2.10.6 · 1×
  • safetensors 0.5.2 · 1×
  • sentencepiece
  • tokenizers 0.20.1 · 1×

For agents

$ claude mcp add mergekit \
  -- python -m otcore.mcp_server <graph>
