MCPcopy
hub / github.com/MiniMax-AI/MiniMax-01

github.com/MiniMax-AI/MiniMax-01 @main sqlite

repository ↗ · DeepWiki ↗
12 symbols 49 edges 3 files 0 documented · 0%
README
  <img src="https://github.com/MiniMax-AI/MiniMax-01/raw/main/figures/MiniMaxLogo-Light.png" width="60%" alt="MiniMax">

Homepage Paper Chat API MCP

Hugging Face WeChat

Model License Code License

MiniMax-01

1. Introduction

We are delighted to introduce two remarkable models, MiniMax-Text-01 and MiniMax-VL-01. MiniMax-Text-01 is a powerful language model boasting 456 billion total parameters, with 45.9 billion activated per token. To unlock its long-context capabilities, it adopts a hybrid architecture integrating Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). Leveraging advanced parallel strategies like Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, and Expert Tensor Parallel (ETP), its training context length extends to 1 million tokens, and it can handle up to 4 million tokens during inference. Consequently, MiniMax-Text-01 showcases top-tier performance on various academic benchmarks. Building on MiniMax-Text-01's prowess, we developed MiniMax-VL-01 for enhanced visual capabilities. It uses the "ViT-MLP-LLM" framework common in multimodal LLMs. It is initialized and trained using three key components: a 303-million-parameter Vision Transformer (ViT) for visual encoding, a randomly initialized two-layer MLP projector for image adaptation, and MiniMax-Text-01 as the base LLM. This model features a dynamic resolution mechanism. Input images are resized according to a pre-set grid, with resolutions ranging from 336×336 to 2016×2016, while maintaining a 336×336 thumbnail. The resized images are split into non - overlapping patches of the same size. These patches and the thumbnail are encoded separately and then combined to form a full image representation. As a result, MiniMax-VL-01 has achieved top-level performance on multimodal leaderboards, demonstrating its edge in complex multimodal tasks.

2. Model Architecture

The architecture of MiniMax-Text-01 is briefly described as follows: - Total Parameters: 456B - Activated Parameters per Token: 45.9B - Number Layers: 80 - Hybrid Attention: a softmax attention is positioned after every 7 lightning attention. - Number of attention heads: 64 - Attention head dimension: 128 - Mixture of Experts: - Number of experts: 32 - Expert hidden dimension: 9216 - Top-2 routing strategy - Positional Encoding: Rotary Position Embedding (RoPE) applied to half of the attention head dimension with a base frequency of 10,000,000 - Hidden Size: 6144 - Vocab Size: 200,064

For MiniMax-VL-01, the additional ViT architecture details is as follows: - Total Parameters: 303M - Number of layers: 24 - Patch size: 14 - Hidden size: 1024 - FFN hidden size: 4096 - Number of heads: 16 - Attention head dimension: 64

3. Evaluation

Text Benchmarks

Core Academic Benchmarks

Tasks GPT-4o (11-20) Claude-3.5-Sonnet (10-22) Gemini-1.5-Pro (002) Gemini-2.0-Flash (exp) Qwen2.5-72B-Inst. DeepSeek-V3 Llama-3.1-405B-Inst. MiniMax-Text-01
General
MMLU* 85.7 88.3 86.8 86.5 86.1 88.5 88.6 88.5
MMLU-Pro* 74.4 78.0 75.8 76.4 71.1 75.9 73.3 75.7
SimpleQA 39.0 28.1 23.4 26.6 10.3 24.9 23.2 23.7
C-SimpleQA 64.6 56.8 59.4 63.3 52.2 64.8 54.7 67.4
IFEval (avg) 84.1 90.1 89.4 88.4 87.2 87.3 86.4 89.1
Arena-Hard 92.4 87.6 85.3 72.7 81.2 91.4 63.5 89.1
Reasoning
GPQA* (diamond) 46.0 65.0 59.1 62.1 49.0 59.1 50.7 54.4
DROP* (F1) 89.2 88.8 89.2 89.3 85.0 91.0 92.5 87.8
Mathematics
GSM8k* 95.6 96.9 95.2 95.4 95.8 96.7 96.7 94.8
MATH* 76.6 74.1 84.6 83.9 81.8 84.6 73.8 77.4
Coding
MBPP + 76.2 75.1 75.4 75.9 77.0 78.8 73.0 71.7
HumanEval 90.2 93.7 86.6 89.6 86.6 92.1 89.0 86.9

* Evaluated following a 0-shot CoT setting.

Long Benchmarks

4M Needle In A Haystack Test

Ruler | Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M | |-------|----|----|-----|-----|-----|------|------|------|----| | GPT-4o (11-20) | 0.970 | 0.921 | 0.890 | 0.888 | 0.884 | - | - | - | - | | Claude-3.5-Sonnet (10-22) | 0.965 | 0.960 | 0.957 | 0.950 | 0.952 | 0.938 | - | - | - | | Gemini-1.5-Pro (002) | 0.962 | 0.960 | 0.960 | 0.958 | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 | | Gemini-2.0-Flash (exp) | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | - | | MiniMax-Text-01 | 0.963 | 0.961 | 0.953 | 0.954 | 0.943 | 0.947 | 0.945 | 0.928 | 0.910 |

LongBench v2 | Model | overall | easy | hard | short | medium | long | |----------------------------|-------------|----------|----------|------------|------------|----------| | Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 | | w/ CoT | | | | | | | | GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 | | Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 | | Deepseek-V3 | - | - | - | - | - | - | | Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 |

Core symbols most depended-on inside this repo

generate_quanto_config
called by 1
inference/minimax-vl-01.py
parse_args
called by 1
inference/minimax-vl-01.py
check_params
called by 1
inference/minimax-vl-01.py
main
called by 1
inference/minimax-vl-01.py
generate_quanto_config
called by 1
inference/minimax-text-01.py
parse_args
called by 1
inference/minimax-text-01.py
check_params
called by 1
inference/minimax-text-01.py
main
called by 1
inference/minimax-text-01.py

Shape

Function 12

Languages

Python100%

Modules by API surface

inference/minimax-vl-01.py5 symbols
inference/minimax-text-01.py4 symbols
evaluation/MR-NIAH/score.py3 symbols

Dependencies from manifests, versioned

accelerate1.2.1 · 1×
optimum-quanto0.2.1 · 1×
quanto0.2.0 · 1×
transformers4.47.1 · 1×

For agents

$ claude mcp add MiniMax-01 \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact