hub / github.com/mudler/LocalAI

github.com/mudler/LocalAI @v4.5.6 sqlite

repository ↗ · DeepWiki ↗ · release v4.5.6 ↗

17,413 symbols 59,790 edges 1,626 files 4,554 documented · 26%

README

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

A small core, not a bundle. Each backend wraps a best-in-class engine (llama.cpp, vLLM, whisper.cpp, stable-diffusion, MLX...) in its own image, pulled only when a model needs it. You install nothing you don't use.

Composable by design: backends are separate and pulled on demand, so you install only what your model needs
Open and extensible: load any model, or build your own backend in any language against an open interface
Drop-in API compatibility: OpenAI, Anthropic, and ElevenLabs APIs across every backend
Any model, any modality: LLMs, vision, voice, image, and video behind one API
Any hardware: NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only
Multi-user ready: API key auth, user quotas, role-based access
Built-in AI agents: autonomous agents with tool use, RAG, MCP, and skills
Privacy-first: your data never leaves your infrastructure

A small LocalAI core with backends (llama.cpp, vLLM, MLX, whisper.cpp, stable-diffusion, kokoro, parakeet.cpp...) plugged in as separate on-demand images

Created by Ettore Di Giacinto and maintained by the LocalAI team.

:book: Documentation | :speech_balloon: Discord | 💻 Quickstart | 🖼️ Models | ❓FAQ

Guided tour

https://github.com/user-attachments/assets/08cbb692-57da-48f7-963d-2e7b43883c18

Click to see more!

User and auth

https://github.com/user-attachments/assets/228fa9ad-81a3-4d43-bfb9-31557e14a36c

Agents

https://github.com/user-attachments/assets/6270b331-e21d-4087-a540-6290006b381a

Usage metrics per user

https://github.com/user-attachments/assets/cbb03379-23b4-4e3d-bd26-d152f057007f

Fine-tuning and Quantization

https://github.com/user-attachments/assets/5ba4ace9-d3df-4795-b7d4-b0b404ea71ee

WebRTC

https://github.com/user-attachments/assets/ed88e34c-fed3-4b83-8a67-4716a9feeb7b

Quickstart

macOS

Note: The DMG is not signed by Apple. After installing, run: sudo xattr -d com.apple.quarantine /Applications/LocalAI.app. See #6268 for details.

Containers (Docker, podman, ...)

Already ran LocalAI before? Use docker start -i local-ai to restart an existing container.

CPU only:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest

NVIDIA GPU:

# CUDA 13
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13

# CUDA 12
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12

# NVIDIA Jetson ARM64 (CUDA 12, for AGX Orin and similar)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64

# NVIDIA Jetson ARM64 (CUDA 13, for DGX Spark)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13

AMD GPU (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas

Intel GPU (oneAPI):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel

Vulkan GPU:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

Loading models

# From the model gallery (see available models with `local-ai models list` or at https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
# From Huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# From the Ollama OCI registry
local-ai run ollama://gemma:2b
# From a YAML config
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# From a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest

To test a running LocalAI server from the terminal, open an interactive chat session from another shell. Inside the prompt, /models lists installed models and /model <name> switches between them.

# Terminal 1
local-ai run llama-3.2-1b-instruct:q4_k_m

# Terminal 2
local-ai chat --model llama-3.2-1b-instruct:q4_k_m

Automatic Backend Detection: LocalAI automatically detects your GPU capabilities and downloads the appropriate backend. For advanced options, see GPU Acceleration.

For more details, see the Getting Started guide.

Latest News

June 2026: New native biometric backends from the LocalAI team: voice-detect.cpp for speaker recognition and voice analysis (ECAPA-TDNN, WeSpeaker, ERes2Net, CAM++, wav2vec2 age/gender/emotion) and face-detect.cpp for face detection, recognition, demographics and anti-spoofing (SCRFD/ArcFace, YuNet/SFace). Both are from-scratch C++/ggml engines with no Python or onnxruntime at inference, self-contained GGUF weights, bit-exact parity with the reference, and GPU cuDNN parity, replacing the heavier Python insightface and speaker-recognition backends (PR #10441).
June 2026: New realtime voice assistant demo (a tiny Go client for the Realtime API with a full talk-back voice loop and tool calling), plus streaming of the realtime LLM / TTS / transcription pipeline stages and configurable WebRTC ICE candidates.
June 2026: Big speech push: the parakeet.cpp ASR engine gains NeMo-faithful segment timestamps, a multilingual streaming Nemotron-3.5 model, dynamic batching for concurrent transcription and CUDA graphs; the new CrispASR backend adds multi-architecture ASR + TTS, and 60 Piper TTS voices across 42 languages land in the gallery (plus per-request TTS instructions and params).
June 2026: New backends and models: locate-anything.cpp for open-vocabulary object detection via ggml, Ideogram4 image generation in stablediffusion-ggml, llama.cpp video input, and the Gemma 4 QAT family with MTP speculative-decoding pairs. Plus an interactive CLI chat mode and RAG source citations in agent responses.
June 2026: Distributed mode hardening: prefix-cache-aware routing, a production-ready request router with auto-sized embedding/rerank batches, ds4 layer-split distributed inference, NATS JWT auth + TLS/mTLS, and resumable file uploads.
May 2026: LocalAI 4.3.0 - llama.cpp prompt cache on by default (repeated system prompts collapse from minutes to seconds), keyless cosign signing of backend OCI images, per-API-key + per-user usage attribution, Distributed v3 with per-request replica routing. Release notes
May 2026: LocalAI 4.2.0 - LocalAI sees and hears: voice recognition, face recognition + antispoofing liveness, speaker diarization. Plus drop-in Ollama API, video generation, redesigned UI with i18n + admin-configurable branding, vLLM at feature parity with llama.cpp, and 11 new backends. Release notes
April 2026: LocalAI 4.1.0 - LocalAI becomes a control tower: distributed cluster mode with VRAM-aware smart routing + autoscaling, multi-user platform with OIDC and API keys, per-user quotas with predictive analytics, in-UI fine-tuning with TRL (auto-export to GGUF), on-the-fly quantization backend, visual pipeline editor. Release notes
March 2026: LocalAI 4.0.0 - native agentic orchestration with the new Agenthub community hub, full React UI rewrite with Canvas mode, MCP Apps + client-side with tool streaming, WebRTC realtime audio, MLX-distributed. Release notes
February 2026: Realtime API for audio-to-audio with tool calling, ACE-Step 1.5 support
January 2026: LocalAI 3.10.0 — Anthropic API support, Open Responses API, video & image generation (LTX-2), unified GPU backends, tool streaming, Moonshine, Pocket-TTS. Release notes
December 2025: Dynamic Memory Resource reclaimer, Automatic multi-GPU model fitting (llama.cpp), Vibevoice backend
November 2025: Import models via URL, Multiple chats and history
October 2025: Model Context Protocol (MCP) support for agentic capabilities
September 2025: New Launcher for macOS and Linux, extended backend support for Mac and Nvidia L4T, MLX-Audio, WAN 2.2
August 2025: MLX, MLX-VLM, Diffusers, llama.cpp now supported on Apple Silicon
July 2025: All backends migrated outside the main binary — lightweight, modular architecture

For older news and full release notes, see GitHub Releases and the News page.

Features

Text generation (llama.cpp, transformers, vllm ... and more)
Text to Audio
Audio to Text
Image generation
OpenAI-compatible tools API

Extension points exported contracts — how you extend this code

State (Interface)

State is the sealed sum type of turn-detection states. The only implementations are the marker-method structs in this fi [17 …

core/http/endpoints/openai/turncoord/turncoord.go

Scorer (Interface)

Scorer evaluates a model's joint log-probability of each candidate continuation given a shared prompt. Implemented by Ne [13 …

core/backend/score.go

Publisher (Interface)

Publisher publishes JSON-encoded messages to NATS subjects. [10 implementers]

core/services/messaging/interfaces.go

LocalAIRequest (Interface)

This file and type represent a generic request to LocalAI - as opposed to requests to LocalAI-specific endpoints, which [9 …

core/schema/request.go

ForwardClient (Interface)

ForwardClient is the duplex interface returned by (*Client).Forward. First Send carries path/method/headers/body, subseq [6 …

pkg/grpc/client.go

Parser (Interface)

Parser is the interface all parser types implement. [46 implementers]

pkg/functions/peg/parser.go

Importer (Interface)

(no doc) [35 implementers]

core/gallery/importers/importers.go

LocalAIClient (Interface)

LocalAIClient is the surface tools depend on. It has two implementations: - inproc.Client (in-process; calls LocalAI se [4 …

pkg/mcp/localaitools/client.go

Core symbols most depended-on inside this repo

called by 4294

core/http/static/assets/htmx.js

JSON

called by 878

pkg/functions/peg/builder.go

called by 793

core/http/static/assets/htmx.js

Unlock

called by 735

pkg/grpc/interface.go

push

called by 699

core/http/static/assets/tailwindcss.js

pkg/grpc/interface.go

map

called by 624

core/http/static/assets/tailwindcss.js

Shape

Method 8,508

Function 5,700

Class 1,530

Struct 1,445

Interface 102

TypeAlias 88

FuncType 39

Route 1

Languages

TypeScript47%

Go47%

Python6%

Modules by API surface

core/http/static/assets/pdf.worker.min.js3,500 symbols

core/http/static/assets/pdf.min.js1,372 symbols

core/http/static/assets/tailwindcss.js752 symbols

core/http/static/assets/tw-elements.js496 symbols

core/http/static/assets/codemirror.min.js335 symbols

core/http/static/assets/alpine.js244 symbols

core/http/static/assets/htmx.js189 symbols

pkg/grpc/embed.go139 symbols

core/http/endpoints/openai/types/server_events.go137 symbols

core/http/endpoints/openai/types/types.go126 symbols

core/config/application_config.go103 symbols

core/http/static/assets/marked.js102 symbols

Dependencies from manifests, versioned

dario.cat/mergov1.0.2 · 1×

filippo.io/bigmodv0.1.1-0.20260103110 · 1×

filippo.io/keygenv0.0.0-2026011415190 · 1×

fyne.io/fyne/v2v2.7.3 · 1×

fyne.io/systrayv1.12.0 · 1×

github.com/Azure/go-ansitermv0.0.0-2025010203350 · 1×

github.com/BurntSushi/tomlv1.5.0 · 1×

github.com/JohannesKaufmann/domv0.2.0 · 1×

github.com/JohannesKaufmann/html-to-markdown/v2v2.4.0 · 1×

github.com/KyleBanks/depthv1.2.1 · 1×

github.com/Masterminds/goutilsv1.1.1 · 1×

github.com/Masterminds/semver/v3v3.4.0 · 1×

Datastores touched

dbnameDatabase · 1 repos

localaiDatabase · 1 repos

localrecallDatabase · 1 repos

dbDatabase · 1 repos

mydbDatabase · 1 repos

For agents

$ claude mcp add LocalAI \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact