The open-source alternative to Opus Clip, Vidyo.ai, Klap, SubMagic, 2short.ai, and other AI clipping tools. Drop in any long-form YouTube video and get back ranked, viral-ready 9:16 shorts — for free, with no per-clip credits, no watermarks, and full control over the highlight algorithm.
Built for creators, agencies, and developers who don't want to pay $20–$300/month or be capped on minutes processed. It uses GPT-class LLM highlight detection and Whisper transcription to extract the most viral-worthy moments, then auto-crops them vertically for TikTok, Reels, and Shorts.
Building your own Opus Clip–style SaaS? Skip the infra and ship on the same APIs that power this repo:
- AI Clipping API: end-to-end clip selection + render
- Auto-Crop API: vertical reframing only
| | This repo | Opus Clip / Vidyo.ai / Klap / SubMagic |
|---|---|---|
| Price | Free + open source (pay only for API usage) | $20–$300/month subscriptions |
| Per-clip credits | None — process unlimited videos | Monthly minute caps, overage fees |
| Watermarks | Never | On free tiers |
| Highlight algorithm | Fully editable virality framework | Black box |
| Output format | Any aspect ratio, any resolution | Locked presets |
| Batch processing | xargs an entire URL list | Manual upload one-by-one |
| JSON / API output | Built-in (--output-json) | Limited or paid tier only |
| Self-hostable | Yes — runs on your machine or server | SaaS only, your videos sit on their servers |
| White-label / embeddable | Yes — MIT licensed, import as Python lib | No |
- --mode api uses MuAPI for download, transcription, and cropping; --mode local runs on your machine with yt-dlp, faster-whisper, and ffmpeg/opencv, and lets you pick OpenAI or Gemini for highlight ranking (the LLM call is the only remote step)
- Transcription runs through the API (/openai-whisper via MuAPI) or locally (faster-whisper, CPU or CUDA), with the same downstream output shape
- Import generate_shorts(...) into your own pipeline
- --output-json dumps the full result (transcript + every candidate highlight + final clip URLs/paths) for downstream automation

Don't want to self-host? The AI Clipping API gives you the same Opus Clip–style pipeline as a single HTTP call: no Python, no dependencies, and pay-per-clip pricing instead of monthly subscriptions.
- Local mode (--mode local): ffmpeg on your PATH and an LLM API key (OPENAI_API_KEY or GEMINI_API_KEY; only the LLM step is remote)

Clone the repository:
```bash
git clone https://github.com/SamurAIGPT/AI-Youtube-Shorts-Generator.git
cd AI-Youtube-Shorts-Generator
```
Create and activate a virtual environment:
```bash
python3.10 -m venv venv
source venv/bin/activate
```
Install Python dependencies:
```bash
pip install -r requirements.txt
# Only if you plan to use --mode local:
pip install -r requirements-local.txt
```
Set up environment variables:
Create a .env file in the project root:
```bash
# API mode (default)
MUAPI_API_KEY=your_muapi_key_here

# Local mode (--mode local)
LLM_PROVIDER=openai                  # openai or gemini
OPENAI_API_KEY=your_openai_key_here
OPENAI_MODEL=gpt-4o-mini             # optional, default gpt-4o-mini
GEMINI_API_KEY=your_gemini_key_here
GEMINI_MODEL=gemini-2.5-flash        # optional, default gemini-2.5-flash
LOCAL_WHISPER_MODEL=base             # tiny / base / small / medium / large-v3
LOCAL_WHISPER_DEVICE=auto            # auto / cpu / cuda
LOCAL_OUTPUT_DIR=output              # where local mp4s land
```
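The local-mode variables above could be read along the following lines. This is a minimal sketch, not the repo's config.py: `load_local_settings` and the `LLM_API_KEY` result field are hypothetical names, and treating the example values for `LLM_PROVIDER`, `LOCAL_WHISPER_MODEL`, and `LOCAL_WHISPER_DEVICE` as defaults is an assumption (only the two model names are documented as defaults).

```python
import os
from typing import Mapping

# Values taken from the .env example; using them as fallbacks is an assumption.
DEFAULTS = {
    "LLM_PROVIDER": "openai",
    "OPENAI_MODEL": "gpt-4o-mini",
    "GEMINI_MODEL": "gemini-2.5-flash",
    "LOCAL_WHISPER_MODEL": "base",
    "LOCAL_WHISPER_DEVICE": "auto",
    "LOCAL_OUTPUT_DIR": "output",
}


def load_local_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Merge environment values over the fallbacks and check the LLM key."""
    settings = {name: env.get(name) or default for name, default in DEFAULTS.items()}
    provider = settings["LLM_PROVIDER"].lower()
    if provider not in ("openai", "gemini"):
        raise ValueError(f"LLM_PROVIDER must be 'openai' or 'gemini', got {provider!r}")
    # Only the key for the selected provider is required.
    key_name = "OPENAI_API_KEY" if provider == "openai" else "GEMINI_API_KEY"
    if not env.get(key_name):
        raise ValueError(f"{key_name} is required when LLM_PROVIDER={provider}")
    settings["LLM_PROVIDER"] = provider
    settings["LLM_API_KEY"] = env[key_name]
    return settings
```

Failing early on a missing key avoids discovering the problem only after download and transcription have already run.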
```bash
# API mode (default)
python main.py "https://www.youtube.com/watch?v=VIDEO_ID"

# Local mode
python main.py "https://www.youtube.com/watch?v=VIDEO_ID" --mode local
```
Local mode writes the rendered shorts to ./output/short_01.mp4, short_02.mp4, … (override with LOCAL_OUTPUT_DIR).
```bash
python main.py "https://www.youtube.com/watch?v=VIDEO_ID" \
  --mode api \
  --num-clips 5 \
  --aspect-ratio 9:16 \
  --output-json result.json
```
In --mode local, you can pass a file:// URL or a direct filesystem path and skip YouTube entirely:
```bash
python main.py "/Users/you/Videos/input.mp4" --mode local
python main.py "file:///Users/you/Videos/input.mp4" --mode local
```
The Python API works the same way:
```python
from shorts_generator import generate_shorts

result = generate_shorts(
    "/Users/you/Videos/input.mp4",
    num_clips=5,
    aspect_ratio="9:16",
    mode="local",
)

for short in result["shorts"]:
    print(short["score"], short["title"], short["clip_url"])
```
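Since local mode accepts remote URLs, file:// URLs, and plain paths, the input has to be classified before anything is downloaded. One way that could be done is sketched below; `resolve_input` is a hypothetical helper, not a function from the repo.

```python
import os
from urllib.parse import unquote, urlparse


def resolve_input(source: str) -> tuple[str, str]:
    """Classify an input as ("remote", url) or ("local", filesystem_path)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        return "remote", source
    if parsed.scheme == "file":
        # file:///Users/you/a%20b.mp4 -> /Users/you/a b.mp4
        return "local", unquote(parsed.path)
    # Anything else is treated as a plain filesystem path.
    return "local", os.path.expanduser(source)
```

Only the "remote" case would need yt-dlp; both local forms resolve to a path that can be handed straight to transcription.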
Local transcription is cached as an .srt file in LOCAL_OUTPUT_DIR using the
video's base name. If the cache already exists and is newer than the source
file, the app reuses it instead of running Whisper again.
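The freshness rule for the transcript cache can be expressed as a modification-time comparison. The sketch below assumes the cache file is named `<video base name>.srt`, as described above; `cached_transcript` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def cached_transcript(video: Path, output_dir: Path) -> Optional[Path]:
    """Return the cached .srt if it exists and is newer than the video, else None."""
    srt = output_dir / f"{video.stem}.srt"
    if srt.exists() and srt.stat().st_mtime > video.stat().st_mtime:
        return srt
    return None
```

Comparing timestamps means that replacing or re-encoding the source file invalidates the cache without any manual cleanup.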
Local downloads are also cached in LOCAL_OUTPUT_DIR as
source_<youtube_id>.mp4 when the input is a YouTube URL. If that file already
exists, the app skips yt-dlp and reuses the cached video.
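A lookup for the cached download might look like the sketch below. Both helper names are hypothetical, and the ID extraction only covers `watch?v=` and `youtu.be/` links, which is an assumption about the URL forms that matter.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_id(url: str) -> Optional[str]:
    """Extract the video ID from watch?v=... and youtu.be/... URLs."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def cached_download(url: str, output_dir: Path) -> Optional[Path]:
    """Return output_dir/source_<youtube_id>.mp4 if that file already exists."""
    video_id = youtube_id(url)
    if video_id is None:
        return None
    candidate = output_dir / f"source_{video_id}.mp4"
    return candidate if candidate.exists() else None
```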
Create a urls.txt file with one URL per line, then:
```bash
xargs -a urls.txt -I{} python main.py "{}"
```
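Note that `-a` is a GNU xargs option, so the command above may not work with the BSD xargs shipped on macOS. A Python loop is one portable alternative. The sketch below is an illustration only: `read_urls` and `run_batch` are hypothetical helpers, and passing `generate_shorts` with only a URL assumes its other parameters have defaults.

```python
from typing import Callable, Iterable


def read_urls(lines: Iterable[str]) -> list[str]:
    """Strip whitespace and drop blank lines and '#' comment lines."""
    stripped = (line.strip() for line in lines)
    return [s for s in stripped if s and not s.startswith("#")]


def run_batch(urls: Iterable[str], process: Callable[[str], dict]) -> dict:
    """Call process(url) for each URL, collecting results and errors separately."""
    results, errors = {}, {}
    for url in urls:
        try:
            results[url] = process(url)
        except Exception as exc:  # keep going so one bad URL does not stop the batch
            errors[url] = str(exc)
    return {"results": results, "errors": errors}


# Usage (assumes the package is importable and urls.txt exists):
#   from pathlib import Path
#   from shorts_generator import generate_shorts
#   urls = read_urls(Path("urls.txt").read_text().splitlines())
#   report = run_batch(urls, generate_shorts)
```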
| Flag | Default | Notes |
|---|---|---|
| --mode | api | api (MuAPI, fast, no setup) or local (remote URL, file://, or local path + faster-whisper + LLM provider + ffmpeg) |
| --num-clips | 3 | How many shorts to render |
| --aspect-ratio | 9:16 | Any ratio; 9:16 for TikTok/Reels, 1:1 for square |
| --format | 720 | Source download resolution: 360 / 480 / 720 / 1080 |
| --language | auto | Force Whisper language code (e.g. en) |
| --output-json | (none) | Dump the full result (transcript + all candidates) to a file |
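For readers wiring these flags into their own wrapper, the table maps onto argparse roughly as follows. This is a sketch built from the documented defaults, not a copy of the repo's main.py; the positional argument name `source` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags with their documented defaults."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("source", help="YouTube URL, file:// URL, or local path")
    parser.add_argument("--mode", choices=["api", "local"], default="api")
    parser.add_argument("--num-clips", type=int, default=3)
    parser.add_argument("--aspect-ratio", default="9:16")
    parser.add_argument("--format", choices=["360", "480", "720", "1080"], default="720")
    parser.add_argument("--language", default="auto", help="Whisper language code, e.g. en")
    parser.add_argument("--output-json", default=None, metavar="PATH")
    return parser
```

argparse converts dashes to underscores, so the parsed values are available as `args.num_clips`, `args.aspect_ratio`, and `args.output_json`.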
| Step | API mode (--mode api) | Local mode (--mode local) |
|---|---|---|
| Download | MuAPI /youtube-download | yt-dlp for remote URLs, direct file path for local inputs |
| Transcription | MuAPI /openai-whisper | faster-whisper (CPU or CUDA) |
| Highlight LLM | MuAPI gpt-5-mini | LLM_PROVIDER=openai uses OpenAI (gpt-4o-mini by default), LLM_PROVIDER=gemini uses Gemini (gemini-2.5-flash by default) |
| Vertical crop | MuAPI /autocrop | ffmpeg + OpenCV face tracking |
| Output | hosted URLs | local mp4 paths |
| Required keys | MUAPI_API_KEY | OPENAI_API_KEY or GEMINI_API_KEY (+ ffmpeg on PATH) |
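The geometry behind the local vertical crop can be illustrated independently of face tracking. The sketch below assumes a horizontal focus point `center_x` (for example, the middle of a detected face) is already known. Both function names are hypothetical, and this is not necessarily how the repo's clipper computes its window.

```python
def crop_window(src_w: int, src_h: int, aspect_ratio: str, center_x: float) -> tuple[int, int, int, int]:
    """Return (width, height, x, y) of the largest crop with the target ratio.

    The crop is centered horizontally on center_x and clamped to the frame.
    """
    ratio_w, ratio_h = (int(part) for part in aspect_ratio.split(":"))
    target = ratio_w / ratio_h
    if src_w / src_h >= target:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = round(src_h * target)
    else:
        # Source is narrower than the target: keep full width, trim top and bottom.
        crop_w = src_w
        crop_h = round(src_w / target)
    crop_w -= crop_w % 2  # many encoders expect even dimensions
    crop_h -= crop_h % 2
    x = max(0, min(round(center_x - crop_w / 2), src_w - crop_w))
    y = (src_h - crop_h) // 2
    return crop_w, crop_h, x, y


def ffmpeg_crop_filter(window: tuple[int, int, int, int]) -> str:
    """Format the window as an ffmpeg crop filter (crop=w:h:x:y)."""
    w, h, x, y = window
    return f"crop={w}:{h}:{x}:{y}"
```

Clamping `x` keeps the window inside the frame when the subject stands near an edge.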
- /openai-whisper produces a timestamped transcript (verbose_json segments)
- The top --num-clips candidates are selected

Output: a list of mp4 URLs plus, for each clip, its title, viral score, hook sentence, and a one-line reason explaining why it should perform.
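The selection step amounts to ranking candidates by score and keeping the first `--num-clips`. A minimal sketch, assuming each candidate is a dict with a numeric `score` field as in the JSON output (`select_top` is a hypothetical name):

```python
def select_top(candidates: list[dict], num_clips: int) -> list[dict]:
    """Keep the num_clips highest-scoring candidates, best first."""
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return ranked[:num_clips]
```

Because `sorted` is stable, candidates with equal scores keep the order in which the LLM returned them.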
Console output looks like:
```
========================================================================
 Highlights: 7 candidates → kept top 3
========================================================================
#1  score=92  124.3s → 187.6s
    title: The one mistake that cost me $50K
    hook:  "Nobody talks about this, but it killed my first startup..."
    clip:  https://.../short_1.mp4
#2  score=88  ...
```
--output-json result.json produces:
```json
{
  "source_video_url": "...",
  "transcript": { "duration": 1873.4, "segments": [...] },
  "highlights": [ {...}, {...}, ... ],
  "shorts": [
    {
      "title": "...",
      "start_time": 124.3,
      "end_time": 187.6,
      "score": 92,
      "hook_sentence": "...",
      "virality_reason": "...",
      "clip_url": "https://.../short_1.mp4"
    }
  ]
}
```
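Downstream automation can read this file with the standard library. The sketch below relies only on the fields shown in the sample output; `load_result`, `summarize`, and the `min_score` filter are hypothetical additions.

```python
import json


def load_result(path: str) -> dict:
    """Read a file written by --output-json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize(result: dict, min_score: int = 0) -> list[str]:
    """One line per short at or above min_score, highest score first."""
    shorts = [s for s in result.get("shorts", []) if s["score"] >= min_score]
    shorts.sort(key=lambda s: s["score"], reverse=True)
    return [
        f'{s["score"]:>3}  {s["start_time"]:.1f}s-{s["end_time"]:.1f}s  {s["title"]}  {s["clip_url"]}'
        for s in shorts
    ]
```

For example, `summarize(load_result("result.json"), min_score=80)` would list only the clips worth reviewing first.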
Edit shorts_generator/highlights.py:
- Virality framework: VIRALITY_CRITERIA — the ranked list of signals the LLM optimizes for
- System prompt: HIGHLIGHT_SYSTEM_PROMPT — duration sweet spot, hook rules, JSON schema
- Chunk size: CHUNK_SIZE_SECONDS (default 1200) — chunk length for long videos
- Long-video threshold: LONG_VIDEO_THRESHOLD (default 1800) — videos longer than this are chunked
- Chunk overlap: CHUNK_OVERLAP_SECONDS (default 60) — overlap between chunks so cross-boundary clips aren't missed
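The three chunking constants interact as in the sketch below, which computes the time windows for a given duration. Advancing by chunk size minus overlap is an assumption about how the overlap is applied, and `chunk_windows` is a hypothetical name rather than the function in highlights.py.

```python
CHUNK_SIZE_SECONDS = 1200
LONG_VIDEO_THRESHOLD = 1800
CHUNK_OVERLAP_SECONDS = 60


def chunk_windows(duration: float) -> list[tuple[float, float]]:
    """Return (start, end) windows in seconds that cover the whole video."""
    if duration <= LONG_VIDEO_THRESHOLD:
        return [(0.0, duration)]  # short videos are processed in one pass
    step = CHUNK_SIZE_SECONDS - CHUNK_OVERLAP_SECONDS
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + CHUNK_SIZE_SECONDS, duration)
        windows.append((start, end))
        if end >= duration:
            break
        start += step
    return windows
```

Under these assumptions, a 3000-second video yields the windows 0 to 1200, 1140 to 2340, and 2280 to 3000, so a highlight that straddles the 1200-second mark still falls entirely inside the second window.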
Edit shorts_generator/config.py (or set env vars):
- MUAPI_POLL_INTERVAL (default 5s) — seconds between job-status polls
- MUAPI_POLL_TIMEOUT (default 1800s) — give up after this long
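The two polling settings describe a loop like the one sketched below. The `status` values ("completed", "failed") and the `error` field are placeholders rather than MuAPI's documented response format, and `poll_until_done` is a hypothetical name. The `sleep` and `clock` parameters exist so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() every `interval` seconds until the job ends or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        if job.get("status") == "completed":  # placeholder status value
            return job
        if job.get("status") == "failed":  # placeholder status value
            raise RuntimeError(f"job failed: {job.get('error')}")
        if clock() + interval > deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        sleep(interval)
```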
In API mode, audio is transcribed by MuAPI's /openai-whisper endpoint (server-side whisper-1). Pass --language <code> to lock recognition to a specific language; otherwise the language is auto-detected.
```
AI-Youtube-Shorts-Generator/
├── main.py                  CLI entry point
├── requirements.txt         core deps (api mode)
├── requirements-local.txt   optional deps for --mode local
├── .env.example
└── shorts_generator/
    ├── config.py            env / settings (MuAPI + local LLM + Whisper)
    ├── muapi.py             generic submit + poll wrapper
    ├── downloader.py        API mode: YouTube download via MuAPI
    ├── transcriber.py       API mode: MuAPI /openai-whisper client
    ├── highlights.py        shared LLM virality ranking (pluggable backend)
    ├── clipper.py           API mode: MuAPI /autocrop
    ├── pipeline.py          mode dispatcher (api ↔ local)
    └── local/               --mode local backends (offline)
        ├── downloader.py    yt-dlp download
        ├── transcriber.py   faster-whisper transcription
        ├── llm.py           OpenAI or Gemini client selector
        └── clipper.py       ffmpeg cut + OpenCV vertical crop
```
The video may have no detectable speech, or it may be in a language Whisper struggles with. Try passing --language en (or the correct ISO-639-1 code) to skip auto-detection.
The AI Clipping API uses an improved algorithm that produces higher-quality clips with better highlight detection.
Contributions are welcome! Please fork the repository and submit a pull request.
This project is licensed under the MIT License.