
github.com/krillinai/KrillinAI @v2.1.0 sqlite

1,077 symbols · 3,304 edges · 139 files · 264 documented (25%)
README

KrillinAI

Video Translation & Dubbing Tool for Humans / Agents (Skills Included)


Project Introduction (v2.0 with Agent support — now released)


KrillinAI is a versatile audio and video localization and enhancement solution developed by the Krillin AI team, designed for both human users and AI Agents. The tool covers the complete pipeline including video download, speech transcription, subtitle translation, TTS dubbing, portrait conversion, and cover generation, supporting both landscape and portrait formats to ensure perfect presentation on all major platforms (Bilibili, Xiaohongshu, Douyin, WeChat Video, Kuaishou, YouTube, TikTok, etc.). Human users can complete end-to-end content localization with one click via the client; each capability can also be invoked independently via CLI, and AI Agents can orchestrate single or multiple stages on demand to flexibly compose automated workflows.

New Features

🤖 CLI Support: Provides a phased command-line interface where each stage executes independently and outputs structured results, supporting cross-stage artifact reuse.

🧩 Skills Collection: The skills/ directory provides per-stage Skills that AI Agents can invoke directly under a stable contract, with no need to parse CLI documentation.

🔗 Pipeline Orchestration: Chain multiple stages in one command, enabling full automation from download to rendering.

🖼️ Cover Generation: Automatically generate platform cover images from the original video thumbnail and a prompt template.

Key Features and Functions:

📥 Video Acquisition: Supports yt-dlp downloads or local file uploads

📜 Accurate Recognition: High-accuracy speech recognition based on Whisper

🧠 Intelligent Segmentation: Subtitle segmentation and alignment using LLM

🔄 Terminology Replacement: One-click replacement of professional vocabulary

🌍 Professional Translation: LLM translation with context to maintain natural semantics

🎙️ Voice Cloning: Offers selected voice tones from CosyVoice or custom voice cloning

🎬 Video Composition: Automatically processes landscape and portrait videos and subtitle layout

💻 Cross-Platform: Supports Windows, Linux, macOS, providing desktop, server, and CLI modes

Effect Demonstration

The image below shows the subtitle file generated from a 46-minute local video with a single click and no manual adjustment. There are no omissions or overlaps, the segmentation is natural, and the translation quality is very high.

Subtitle Translation
https://github.com/user-attachments/assets/bba1ac0a-fe6b-4947-b58d-ba99306d0339

Dubbing
https://github.com/user-attachments/assets/0b32fad3-c3ad-4b6a-abf0-0865f0dd2385

Portrait Mode
https://github.com/user-attachments/assets/c2c7b528-0ef8-4ba9-b8ac-f9f92f6d4e71

🔍 Supported Speech Recognition Services

Every local model in the table below supports automatic installation of both the executable and the model files; you only need to choose one, and KrillinAI prepares everything for you.

| Service Source | Supported Platforms | Model Options | Local/Cloud | Remarks |
|---|---|---|---|---|
| OpenAI Whisper | All Platforms | - | Cloud | Fast speed and good effect |
| FasterWhisper | Windows/Linux | tiny/medium/large-v2 (recommended medium+) | Local | Faster speed, no cloud service cost |
| WhisperKit | macOS (M-series only) | large-v2 | Local | Native optimization for Apple chips |
| WhisperCpp | All Platforms | large-v2 | Local | Supports all platforms |
| Alibaba Cloud ASR | All Platforms | - | Cloud | Avoids network issues in mainland China |
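
As a rough illustration of the platform column above, the following Go sketch suggests a local recognition service for a given OS and CPU architecture. The function name `suggestLocalASR` and the preference order are assumptions made for this example; they are not part of KrillinAI.

```go
package main

import (
	"fmt"
	"runtime"
)

// suggestLocalASR returns one local service from the table for a
// GOOS/GOARCH pair. Hypothetical helper: the name and the preference
// order are assumptions, not KrillinAI behavior.
func suggestLocalASR(goos, goarch string) string {
	switch {
	case goos == "darwin" && goarch == "arm64":
		return "WhisperKit" // macOS, M-series only
	case goos == "windows" || goos == "linux":
		return "FasterWhisper" // Windows/Linux
	default:
		return "WhisperCpp" // listed as supporting all platforms
	}
}

func main() {
	fmt.Println(suggestLocalASR(runtime.GOOS, runtime.GOARCH))
}
```

The cloud services in the table are independent of the local platform, so they are left out of the sketch.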

🚀 Large Language Model Support

✅ Compatible with all cloud/local large language model services that comply with OpenAI API specifications, including but not limited to:

  • OpenAI
  • Gemini
  • DeepSeek
  • Tongyi Qianwen
  • Locally deployed open-source models
  • Other API services compatible with OpenAI format
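
To make "OpenAI API specifications" concrete, here is a minimal Go sketch of the JSON body such services accept on their chat completions endpoint: a `model` name plus a list of `messages` with `role` and `content`. The helper name, the prompt text, and the model name are placeholders chosen for illustration; KrillinAI's actual prompts and request code are not shown in this README.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatMessage and chatRequest cover only the minimal fields of an
// OpenAI-style chat completions request.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// buildTranslationRequest returns a JSON request body. The system prompt
// is a hypothetical example, not the prompt KrillinAI uses.
func buildTranslationRequest(model, targetLang, text string) ([]byte, error) {
	req := chatRequest{
		Model: model,
		Messages: []chatMessage{
			{Role: "system", Content: "Translate each subtitle line into " + targetLang + "."},
			{Role: "user", Content: text},
		},
	}
	return json.Marshal(req)
}

func main() {
	body, err := buildTranslationRequest("example-model", "zh_cn", "Hello, world.")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Because only the base URL, API key, and model name differ between providers, the same body can in principle be sent to any of the services listed above.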

🎤 TTS Text-to-Speech Support

  • Alibaba Cloud Voice Service
  • OpenAI TTS

Language Support

Input languages supported: Chinese, English, Japanese, German, Turkish, Korean, Russian, Malay (more are added continuously)

Translation languages supported: English, Chinese, Russian, Spanish, French, and 101 other languages


🚀 Quick Start

You can ask questions about KrillinAI on its DeepWiki page, which indexes the files in the repository so you can find answers quickly.

Basic Steps

First, download the executable that matches your operating system from the Releases page, then follow the tutorial below for either the desktop or the non-desktop version. Place the downloaded file in an empty folder: running it creates several directories, and an empty folder keeps them easy to manage.

【Desktop version, i.e. the release file with "desktop" in its name】 The desktop version was released recently for new users who struggle to edit configuration files correctly. It still has some bugs, and updates are ongoing.

  1. Double-click the file to start using it (the desktop version also requires configuration within the software)

【Non-desktop version, i.e. the release file without "desktop" in its name】 The non-desktop version is the original release. Its configuration is more complex, but it is functionally stable and, because it serves its UI as a web page, suitable for server deployment.

  1. Create a config folder within the folder, then create a config.toml file in the config folder. Copy the contents of the config-example.toml file from the source code's config directory into config.toml, and fill in your configuration information according to the comments.
  2. Double-click or execute the executable file in the terminal to start the service
  3. Open your browser and enter http://127.0.0.1:8888 to start using it (replace 8888 with the port you specified in the configuration file)

To: macOS Users

【Desktop version, i.e. the release file with "desktop" in its name】 Because of signing issues, the desktop version currently cannot be launched by double-clicking or installed via dmg; you need to trust the application manually, as follows:

  1. Open the terminal in the directory where the executable file (assuming the file name is KrillinAI_1.0.0_desktop_macOS_arm64) is located
  2. Execute the following commands in order:
sudo xattr -cr ./KrillinAI_1.0.0_desktop_macOS_arm64
sudo chmod +x ./KrillinAI_1.0.0_desktop_macOS_arm64
./KrillinAI_1.0.0_desktop_macOS_arm64

【Non-desktop version, i.e. the release file without "desktop" in its name】 This software is not signed, so on macOS, after completing the file configuration in "Basic Steps", you also need to trust the application manually, as follows:

  1. Open the terminal in the directory where the executable file (assuming the file name is KrillinAI_1.0.0_macOS_arm64) is located
  2. Execute the following commands in order:
sudo xattr -rd com.apple.quarantine ./KrillinAI_1.0.0_macOS_arm64
sudo chmod +x ./KrillinAI_1.0.0_macOS_arm64
./KrillinAI_1.0.0_macOS_arm64

This will start the service.

Docker Deployment

This project supports Docker deployment; please refer to the Docker Deployment Instructions.

CLI Usage

KrillinAI provides a staged CLI suitable for scripting, automation pipelines, and AI Agent invocation. The CLI executes synchronously by default, outputs a single JSON line to stdout upon completion, and writes krillinai_manifest.json to the working directory for subsequent stages to reuse prior artifacts.
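
A minimal Go sketch of how a caller might pick the result out of captured stdout: it keeps the last line that is a valid JSON object and ignores everything else. The function name `lastJSONLine` and the sample text are assumptions for illustration; the real field names come from the CLI itself.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// lastJSONLine returns the last line of stdout that is a valid JSON
// object, decoded into a generic map. Non-JSON lines are skipped.
func lastJSONLine(stdout string) (map[string]any, error) {
	var last string
	sc := bufio.NewScanner(strings.NewReader(stdout))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long result lines
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "{") && json.Valid([]byte(line)) {
			last = line
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	if last == "" {
		return nil, fmt.Errorf("no JSON line found in output")
	}
	var result map[string]any
	if err := json.Unmarshal([]byte(last), &result); err != nil {
		return nil, err
	}
	return result, nil
}

func main() {
	// Sample text only; it does not reproduce real CLI output.
	out := "some log line\n{\"outputs\":{\"example\":\"tasks/demo/example.srt\"}}\n"
	res, err := lastJSONLine(out)
	if err != nil {
		panic(err)
	}
	fmt.Println(res["outputs"])
}
```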

Build from source:

go build -o build/krillinai-cli ./cmd/cli

Command overview:

| Command | Purpose | Typical Outputs |
|---|---|---|
| subtitle | Generate subtitles from YouTube / Bilibili links or local videos; tries platform captions first, falls back to Whisper transcription | origin_language_srt.srt, target_language_srt.srt, bilingual_srt.srt, short_origin_mixed_srt.srt |
| tts | Generate target-language dubbing from target subtitles | tts_final_audio.wav, video_with_tts.mp4 |
| render-horizontal | Produce horizontal video: original + bilingual subtitles, or dubbed video + target subtitles | horizontal_bilingual.mp4 |
| render-vertical | Produce vertical video: original converted to vertical + short subtitles, or dubbed video + target subtitles | transferred_vertical_video.mp4, vertical_bilingual.mp4 |
| pipeline | Orchestrate multiple stages via --outputs | Determined by selected stages |
| cover | Generate a cover image from the original cover and prompt templates | generated_cover.png |

Typical workflow:

# 1. Generate subtitles: original, target, bilingual, and vertical short subtitles
./build/krillinai-cli subtitle "https://www.youtube.com/watch?v=dQw4w9WgXcQ" \
  --origin-lang en \
  --target-lang zh_cn \
  --workdir tasks/demo \
  --caption-source any

# 2. Generate dubbing from target-language subtitles
./build/krillinai-cli tts \
  --workdir tasks/demo \
  --input-srt tasks/demo/target_language_srt.srt \
  --line-mode target-only \
  --video tasks/demo/origin_video.mp4

# 3. Produce horizontal bilingual-subtitle video
./build/krillinai-cli render-horizontal \
  --workdir tasks/demo \
  --video tasks/demo/origin_video.mp4 \
  --subtitle tasks/demo/bilingual_srt.srt

# 4. Produce vertical short-subtitle video
./build/krillinai-cli render-vertical \
  --workdir tasks/demo \
  --video tasks/demo/origin_video.mp4 \
  --subtitle tasks/demo/short_origin_mixed_srt.srt \
  --major-title "今日话题" \
  --minor-title "AI Video"
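
For callers that drive the CLI from Go rather than a shell, steps 1 to 3 above could be assembled as argument lists along the following lines. This is a sketch under assumptions: the `stage` type and `workflow` function are hypothetical names, and only flags and file names that appear in the workflow above are used.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// stage is one CLI invocation: a subcommand plus its arguments.
// Hypothetical type, introduced only for this example.
type stage struct {
	name string
	args []string
}

// workflow builds steps 1-3 of the typical workflow for one work directory.
func workflow(input, originLang, targetLang, workdir string) []stage {
	video := filepath.Join(workdir, "origin_video.mp4")
	return []stage{
		{"subtitle", []string{input,
			"--origin-lang", originLang,
			"--target-lang", targetLang,
			"--workdir", workdir,
			"--caption-source", "any"}},
		{"tts", []string{
			"--workdir", workdir,
			"--input-srt", filepath.Join(workdir, "target_language_srt.srt"),
			"--line-mode", "target-only",
			"--video", video}},
		{"render-horizontal", []string{
			"--workdir", workdir,
			"--video", video,
			"--subtitle", filepath.Join(workdir, "bilingual_srt.srt")}},
	}
}

func main() {
	for _, s := range workflow("https://www.youtube.com/watch?v=dQw4w9WgXcQ", "en", "zh_cn", "tasks/demo") {
		// To execute a stage, pass s.name and s.args to exec.Command (os/exec).
		fmt.Println("./build/krillinai-cli", s.name, strings.Join(s.args, " "))
	}
}
```

Keeping each stage as data makes it straightforward to run the stages in order and stop at the first failure.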

Agent integration conventions:

  • Parse the last JSON line on stdout and krillinai_manifest.json — do not parse plain-text logs.
  • The outputs field records stage artifact paths; subsequent commands can pass only --workdir to reuse the manifest.
  • Supports --dry-run to validate parameters and generate a manifest without downloading video or calling external AI services.
  • Handle errors by error.kind: usage → fix parameters, retryable → retry, dependency → install ffmpeg / ffprobe / yt-dlp.
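
The error-handling convention above can be sketched in Go as a small dispatcher. The struct below models only the `outputs` and `error.kind` fields named in the list; that `error` is a JSON object, and every other detail of the result shape, is an assumption. `nextAction` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cliResult models only the fields named in the conventions above.
// The exact shape of "outputs" is not assumed, so it is kept raw.
type cliResult struct {
	Outputs json.RawMessage `json:"outputs"`
	Error   *struct {
		Kind string `json:"kind"`
	} `json:"error"`
}

// nextAction maps one JSON result line to what the caller should do next.
func nextAction(line []byte) (string, error) {
	var r cliResult
	if err := json.Unmarshal(line, &r); err != nil {
		return "", err
	}
	if r.Error == nil {
		return "continue", nil
	}
	switch r.Error.Kind {
	case "usage":
		return "fix parameters", nil
	case "retryable":
		return "retry", nil
	case "dependency":
		return "install ffmpeg / ffprobe / yt-dlp", nil
	default:
		return "report error", nil // kinds not listed above
	}
}

func main() {
	// Sample line for illustration; real lines come from the CLI.
	action, err := nextAction([]byte(`{"error":{"kind":"dependency"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(action)
}
```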

For a complete parameter reference, see CLI Capability Summary.

Agent Skills

The repository also includes ready-to-use Agent Skills under skills/ so coding agents can call the CLI with stable conventions:


Configuration Help (Must Read)

The configuration file is divided into several sections: [app], [server], [llm], [transcribe], and [tts]. A task is composed of speech recognition (transcribe) + large model translation (llm) + optional voice services (tts). Understanding this will help you better grasp the configuration file.

Easiest and Quickest Configuration:

For Subtitle Translation Only:

  • In the [transcribe] section, set provider.name to openai.

Extension points
Exported contracts: how you extend this code.

| Interface | Doc | Implementers | File |
|---|---|---|---|
| Transcriber | (no doc) | 6 | internal/types/interface.go |
| TimestampMatcher | TimestampMatcher defines the interface for different language timestamp matching algorithms | 1 | internal/service/timestamps.go |
| StageService | (no doc) | 3 | internal/pipeline/service_adapter.go |
| Runner | (no doc) | 2 | internal/updater/updater.go |
| Queue | (no doc) | - | pkg/util/queue.go |
| Ttser | (no doc) | 4 | internal/types/interface.go |
| DurationEstimator | (no doc) | 2 | internal/service/dubbing/estimator.go |
| ChatCompleter | (no doc) | 3 | internal/types/interface.go |

Core symbols
Most depended-on inside this repo.

| Symbol | Called by | File |
|---|---|---|
| GetLogger | 443 | log/zap.go |
| Error | 417 | internal/pipeline/types.go |
| Close | 91 | pkg/aliyun/tts.go |
| createSentenceFromWords | 41 | internal/service/youtube_subtitle.go |
| StyledEntry | 29 | internal/desktop/components.go |
| CleanPunction | 28 | pkg/util/base.go |
| Parse | 22 | internal/cli/commands.go |
| CountEffectiveChars | 20 | pkg/util/subtitle.go |

Shape

Function 605
Method 304
Struct 144
Interface 11
TypeAlias 8
FuncType 5

Languages

Go 100%

Modules by API surface

internal/service/youtube_subtitle.go · 84 symbols
internal/service/srt_embed.go · 49 symbols
internal/subtitle_style/style.go · 37 symbols
internal/desktop/subtitle.go · 34 symbols
internal/desktop/components.go · 33 symbols
internal/cli/commands.go · 33 symbols
internal/service/audio2subtitle.go · 27 symbols
pkg/util/subtitle.go · 25 symbols
internal/cli/commands_test.go · 25 symbols
internal/updater/updater.go · 19 symbols
internal/pipeline/service_adapter.go · 19 symbols
internal/desktop/ui.go · 19 symbols

Dependencies from manifests, versioned

fyne.io/fyne/v2 v2.5.4 · 1×
fyne.io/systray v1.11.0 · 1×
github.com/aliyun/alibaba-cloud-sdk-go v1.63.72 · 1×
github.com/aliyun/alibabacloud-oss-go-sdk-v2 v1.1.3 · 1×
github.com/cloudwego/base64x v0.1.4 · 1×
github.com/cloudwego/iasm v0.2.0 · 1×
github.com/fredbi/uri v1.1.0 · 1×

For agents

$ claude mcp add KrillinAI \
  -- python -m otcore.mcp_server <graph>
