github.com/steipete/summarize @v0.21.2 sqlite

4,378 symbols · 14,471 edges · 1,242 files · 3 documented (0%)

README

Summarize 📝 — Chrome Side Panel + CLI

Fast summaries from URLs, files, and media. Works in the terminal, the Chrome Side Panel, and the Firefox Sidebar.

Highlights

Chrome Side Panel chat (streaming agent + history) inside the sidebar.
Video slides: screenshots + OCR + transcript cards for YouTube, direct video URLs, and local video files.
Media-aware summaries: auto‑detect video/audio vs page content.
Coding CLI backends: Codex, Claude, Gemini, Cursor Agent, OpenClaw, OpenCode.
Streaming Markdown + metrics + cache‑aware status.
CLI supports URLs, files, podcasts, YouTube, audio/video, PDFs.

Feature overview

URLs, files, and media: web pages, PDFs, images, audio/video, YouTube, podcasts, RSS.
Slide extraction for video sources (YouTube, direct video URLs, local video files) with OCR + timestamped cards.
Transcript-first media flow: published transcripts when available, then Groq/ONNX/whisper.cpp/AssemblyAI/Gemini/OpenAI/FAL/Deepgram transcription fallback when not.
Coding CLI providers: Claude, Codex, Gemini, Cursor Agent, OpenClaw, OpenCode, GitHub Copilot, Antigravity, pi.
Streaming output with Markdown rendering, metrics, and cache-aware status.
Local, paid, and free models: OpenAI‑compatible local endpoints, paid providers, plus an OpenRouter free preset.
Output modes: Markdown/text, JSON diagnostics, extract-only, metrics, timing, and cost estimates.
Smart default: if content is shorter than the requested length, we return it as-is (use --force-summary to override).

Get the extension (recommended)

One‑click summarizer for the current tab. Chrome Side Panel + Firefox Sidebar + local daemon for streaming Markdown.

Chrome Web Store: Summarize Side Panel

Beginner quickstart (extension)

Install the extension (Chrome Web Store link above) and open the Side Panel.
Choose Direct or Daemon. Direct uses Gemini Nano by default when no provider key is configured, or calls your selected provider from Chrome.
Choose Browser media for daemonless transcription/slides. Optional: install the CLI and pair the daemon for native tools, CLI model fallbacks, OCR, and broader media support:
npm (cross-platform): npm i -g @steipete/summarize
Homebrew (Homebrew/core): brew install summarize
Pair the daemon: summarize daemon install --token <TOKEN> --port 8787

Why a daemon/service?

Direct mode works without the daemon. Auto uses a configured OpenAI, OpenRouter, Anthropic, Gemini, xAI, Z.AI, NVIDIA, MiniMax, GitHub Models, or Ollama provider, otherwise Gemini Nano on-device; keys remain in extension-local storage.
The optional daemon adds CLI model fallbacks, shared caches/diagnostics, native ffmpeg, configurable transcription providers, OCR, and broader media support. Chrome reaches it through an explicitly enabled Native Messaging host; retained loopback network access is used only by configured Direct local providers.
The service autostarts (launchd/systemd/Scheduled Task) so the Side Panel is always ready.

If you only want the CLI, you can skip the daemon install entirely.

Notes:

Summarization only runs when the Side Panel is open.
Auto mode summarizes on navigation (incl. SPAs); otherwise use the button.
Daemon is localhost-only and requires a shared token; rerunning summarize daemon install --token <TOKEN> adds another paired browser token instead of invalidating the old one.
Chrome local-companion access is optional and browser-policy enforceable; Direct and Browser modes do not require it.
Non-default port: install with summarize daemon install --token <TOKEN> --port <PORT>, then set the same value in Options → Runtime → Daemon → Port.
Autostart: macOS (launchd), Linux (systemd user), Windows (Scheduled Task).
Windows containers: summarize daemon install starts the daemon for the current container session but does not register a Scheduled Task. Chrome Daemon mode also needs the pending packaged Windows native-host executable; Direct and Browser modes remain available.
Tip: configure the free preset via summarize refresh-free (needs OPENROUTER_API_KEY). Add --set-default to set model=free.

Step-by-step install: apps/chrome-extension/README.md
Architecture + troubleshooting: docs/chrome-extension.md
Firefox compatibility notes: apps/chrome-extension/docs/firefox.md

Slides (extension)

Select Video + Slides in the Summarize picker.
Slides render at the top; expand to full‑width cards with timestamps.
Click a slide to seek the video; toggle Transcript/OCR when OCR is significant.
Browser mode uses MediaBunny with native WebCodecs and ranged network reads for fetchable videos, then falls back to visible-tab capture when the source or codec is unavailable.
Daemon mode adds yt-dlp, native ffmpeg, and optional tesseract OCR.

Advanced (unpacked / dev)

Build + load the extension (unpacked):
Chrome: pnpm -C apps/chrome-extension build
- chrome://extensions → Developer mode → Load unpacked
- Pick: apps/chrome-extension/.output/chrome-mv3
Firefox: pnpm -C apps/chrome-extension build:firefox
- about:debugging#/runtime/this-firefox → Load Temporary Add-on
- Pick: apps/chrome-extension/.output/firefox-mv3/manifest.json
Open Side Panel/Sidebar → copy token.
Install daemon in dev mode:
pnpm summarize daemon install --token <TOKEN> --dev --extension-id <UNPACKED_ID>

CLI

Install

Requires Node 24+.

npx (no install):

npx -y @steipete/summarize "https://example.com"

npm (global):

npm i -g @steipete/summarize

npm (library / minimal deps):

npm i @steipete/summarize-core

import { createLinkPreviewClient } from "@steipete/summarize-core/content";

Homebrew:

brew install summarize

Homebrew ships from homebrew/core via brew install summarize. If Homebrew is unavailable in your environment, use the npm global install above.

Optional local dependencies

Install these if you want media-heavy features:

ffmpeg: optional native accelerator with broader codec support; bundled WebAssembly is the fallback
yt-dlp: required for YouTube slide extraction and some remote media flows
tesseract: optional OCR for --slides-ocr
Optional cloud transcription providers:
GROQ_API_KEY
ASSEMBLYAI_API_KEY
ELEVENLABS_API_KEY (speaker diarization)
GEMINI_API_KEY / GOOGLE_GENERATIVE_AI_API_KEY / GOOGLE_API_KEY
OPENAI_API_KEY
FAL_KEY
DEEPGRAM_API_KEY

macOS (Homebrew):

brew install ffmpeg yt-dlp
brew install tesseract # optional, for --slides-ocr

If native ffmpeg/ffprobe are unavailable, Summarize uses the bundled WebAssembly build. Native ffmpeg remains recommended for speed and broader codec/filter support.

CLI vs extension

CLI only: just install via npm/Homebrew and run summarize ... (no daemon needed).
Chrome extension: Direct mode defaults to Gemini Nano without a key and supports provider-backed summaries/chat/automation/hover when configured; Browser media provides daemonless transcription and slides. Install the daemon for CLI fallbacks and native media tools.
Firefox extension: install the CLI and daemon for media extraction.

Quickstart

summarize "https://example.com"

Inspect the effective model setup. Status only lists configured or usable providers; it never prints keys or missing-provider noise.

summarize status
summarize status --verbose
summarize status --probe
summarize status --json

--probe checks supported model-list endpoints without running paid inference. CLI providers are reported as available when their enabled executable is present; API providers are reported as configured when an effective key is present.

Inputs

URLs or local paths:

summarize "/path/to/file.pdf" --model google/gemini-3-flash
summarize "https://example.com/report.pdf" --model google/gemini-3-flash
summarize "/path/to/audio.mp3"
summarize "/path/to/video.mp4"

Stdin (pipe content using -):

echo "content" | summarize -
pbpaste | summarize -
# binary stdin also works (PDF/image/audio/video bytes)
cat /path/to/file.pdf | summarize -

Notes:

Stdin has a 50 MB size limit
The - argument tells summarize to read from standard input
Text stdin is treated as UTF-8 text (whitespace-only input is rejected as empty)
Binary stdin is preserved as raw bytes and file type is auto-detected when possible
Useful for piping clipboard content or command output
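
The text-stdin rules above can be sketched as a small check. This is a minimal illustration, not the CLI's actual implementation: `checkTextStdin`, `StdinVerdict`, and `MAX_STDIN_BYTES` are hypothetical names, and reading the 50 MB limit as 50 × 1024 × 1024 bytes is an assumption.

```typescript
// Hypothetical helper illustrating the documented stdin rules.
// Assumption: the 50 MB limit is 50 * 1024 * 1024 bytes; the real CLI may define it differently.
const MAX_STDIN_BYTES = 50 * 1024 * 1024;

type StdinVerdict = "ok" | "too-large" | "empty";

// `byteLength` is the raw size read from stdin; `text` is that input decoded as UTF-8.
function checkTextStdin(text: string, byteLength: number): StdinVerdict {
  if (byteLength > MAX_STDIN_BYTES) {
    return "too-large";
  }
  // Whitespace-only text is rejected as empty.
  if (text.trim().length === 0) {
    return "empty";
  }
  return "ok";
}
```

Binary stdin (PDF, image, audio, or video bytes) is outside this sketch, since the README only says the bytes are preserved and the type is auto-detected when possible.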

YouTube (supports youtube.com and youtu.be):

summarize "https://youtu.be/dQw4w9WgXcQ" --youtube auto

Podcast RSS (transcribes latest enclosure):

summarize "https://feeds.npr.org/500005/podcast.xml"

Apple Podcasts episode page:

summarize "https://podcasts.apple.com/us/podcast/2424-jelly-roll/id360084272?i=1000740717432"

Spotify episode page (best-effort; may fail for exclusives):

summarize "https://open.spotify.com/episode/5auotqWAXhhKyb9ymCuBJY"

HLS playlist:

summarize "https://example.com/master.m3u8"

Output length

--length controls how much output we ask for (guideline), not a hard cap. The built-in default is long.

Set a default in ~/.summarize/config.json with output.length.

summarize "https://example.com" --length long
summarize "https://example.com" --length 20k

Presets: short|medium|long|xl|xxl
Character targets: 1500, 20k, 20000
Optional hard cap: --max-output-tokens <count> (e.g. 2000, 2k)
Provider/model APIs still enforce their own maximum output limits.
If omitted, no max token parameter is sent (provider default).
Prefer --length unless you need a hard cap.
Short content: when extracted content is shorter than the requested length, the CLI returns the content as-is.
Override with --force-summary to always run the LLM.
Minimums: --length numeric values must be >= 10 chars; --max-output-tokens must be >= 16.
Preset targets (source of truth: packages/core/src/prompts/summary-lengths.ts):
short: target ~900 chars (range 600-1,200)
medium: target ~1,800 chars (range 1,200-2,500)
long: target ~4,200 chars (range 2,500-6,000)
xl: target ~9,000 chars (range 6,000-14,000)
xxl: target ~17,000 chars (range 14,000-22,000)
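
To make the `--length` rules concrete, here is a rough sketch of how a value could be resolved to a character target. The targets and ranges are copied from the list above; `PRESET_TARGETS` and `parseLengthTarget` are hypothetical names, and accepting only whole numbers with an optional case-insensitive `k` suffix is an assumption. The source of truth remains packages/core/src/prompts/summary-lengths.ts.

```typescript
// Illustrative only: names are hypothetical, numbers are copied from the preset list above.
type LengthPreset = "short" | "medium" | "long" | "xl" | "xxl";

const PRESET_TARGETS: Record<LengthPreset, { target: number; min: number; max: number }> = {
  short: { target: 900, min: 600, max: 1200 },
  medium: { target: 1800, min: 1200, max: 2500 },
  long: { target: 4200, min: 2500, max: 6000 },
  xl: { target: 9000, min: 6000, max: 14000 },
  xxl: { target: 17000, min: 14000, max: 22000 },
};

// Resolves a --length value ("long", "1500", "20k") to a character target.
// Assumption: numeric values are whole numbers with an optional "k" suffix (x1000).
function parseLengthTarget(raw: string): number {
  const value = raw.trim().toLowerCase();
  if (Object.prototype.hasOwnProperty.call(PRESET_TARGETS, value)) {
    return PRESET_TARGETS[value as LengthPreset].target;
  }
  const match = /^(\d+)(k)?$/.exec(value);
  if (!match) {
    throw new Error(`Unsupported --length value: ${raw}`);
  }
  const chars = Number(match[1]) * (match[2] ? 1000 : 1);
  // Documented minimum for numeric values.
  if (chars < 10) {
    throw new Error("--length numeric values must be >= 10 chars");
  }
  return chars;
}
```

Under these assumptions, `parseLengthTarget("20k")` and `parseLengthTarget("20000")` resolve to the same target, matching the character-target examples above.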

What file types work?

Best effort and provider-dependent. These usually work well:

text/* and common structured text (.txt, .md, .json, .yaml, .xml, ...)
Text-like files are inlined into the prompt for better provider compatibility.
PDFs: application/pdf (provider support varies; Google is the most reliable here)
Images: image/jpeg, image/png, image/webp, image/gif
Audio/Video: audio/*, video/* (local MP3/WAV/M4A/OGG/FLAC/MP4/MOV/WEBM files are transcribed automatically when the model supports it)

Notes:

If a provider rejects a media type, the CLI fails fast with a friendly message.
xAI models do not support attaching generic files (like PDFs) via the AI SDK; use Google/OpenAI/Anthropic for those.

Model ids

Use gateway-style ids: <provider>/<model>.

Examples:

openai/gpt-5.4
openai/gpt-5.4-mini
openai/gpt-5.4-nano
openai/gpt-5-mini
openai/gpt-5-nano
github-copilot/gpt-5.4
anthropic/claude-sonnet-4-5
xai/grok-4-fast-non-reasoning
google/gemini-3-flash
zai/glm-4.7
minimax/MiniMax-M3
openrouter/openai/gpt-5-mini (force OpenRouter)

Note: some models/providers do not support streaming or certain file media types. When that happens, the CLI prints a friendly error (or auto-disables streaming for that model when supported by the provider). gpt-5.4-mini and gpt-5.4-nano are treated as real model ids; the same shorthand also works under github-copilot/....
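
As an illustration of the `<provider>/<model>` shape, the sketch below splits an id at its first slash. `parseModelId` and `ModelRef` are hypothetical names, and splitting on the first slash only is an assumption inferred from the `openrouter/openai/gpt-5-mini` example, not a description of the CLI's resolver.

```typescript
// Hypothetical types and function; not part of the summarize API.
interface ModelRef {
  provider: string;
  model: string;
}

// Splits on the first "/" only, so the model part may itself contain slashes.
function parseModelId(id: string): ModelRef {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected <provider>/<model>, got: ${id}`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}
```

With this reading, `openrouter/openai/gpt-5-mini` yields provider `openrouter` and model `openai/gpt-5-mini`. Bare names such as `auto` or a preset name are not gateway-style ids and are rejected by this sketch.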

OpenAI fast mode and thinking

Fast mode is a request option, not a model id:

summarize "https://example.com" --model openai/gpt-5.5 --fast --thinking medium
summarize "https://example.com" --model openai/gpt-5.4 --service-tier fast --thinking low

--fast is shorthand for --service-tier fast.
--service-tier default|fast|priority|flex controls OpenAI service tier. fast is the summarize/Codex-facing spelling and is sent to OpenAI as service_tier="priority".
--thinking none|low|medium|high|xhigh controls OpenAI reasoning effort. Aliases: off → none, min → low, mid / med → medium, x-high / extra-high → xhigh.
--service-tier default clears a configured tier for one run.
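
The alias and tier rules above amount to two small normalization steps, sketched here. The function and constant names are hypothetical. Treating `default` as "omit the service tier from the request" is an assumption, since the README only says it clears a configured tier for one run.

```typescript
// Illustrative sketch; names are hypothetical and not taken from the summarize source.
type Thinking = "none" | "low" | "medium" | "high" | "xhigh";
type ServiceTier = "default" | "fast" | "priority" | "flex";

const THINKING_VALUES: readonly Thinking[] = ["none", "low", "medium", "high", "xhigh"];

// Aliases as listed for --thinking.
const THINKING_ALIASES = new Map<string, Thinking>([
  ["off", "none"],
  ["min", "low"],
  ["mid", "medium"],
  ["med", "medium"],
  ["x-high", "xhigh"],
  ["extra-high", "xhigh"],
]);

// Maps a --thinking value or alias to its canonical effort level.
function normalizeThinking(raw: string): Thinking {
  const value = raw.trim().toLowerCase();
  const canonical = THINKING_VALUES.find((v) => v === value);
  if (canonical !== undefined) {
    return canonical;
  }
  const alias = THINKING_ALIASES.get(value);
  if (alias !== undefined) {
    return alias;
  }
  throw new Error(`Unsupported --thinking value: ${raw}`);
}

// "fast" is the summarize/Codex-facing spelling and is sent as service_tier="priority".
// Assumption: "default" means no service tier is sent at all.
function toWireServiceTier(tier: ServiceTier): "priority" | "flex" | undefined {
  if (tier === "fast" || tier === "priority") {
    return "priority";
  }
  if (tier === "flex") {
    return "flex";
  }
  return undefined;
}
```

Keeping `fast` as a CLI-side spelling and translating it only at the request boundary means config files and flags can share one vocabulary while the wire value stays whatever the provider expects.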

Config equivalent:

{
  "model": "openai/gpt-5.5",
  "openai": {
    "serviceTier": "fast",
    "thinking": "medium"
  }
}

Compatibility aliases still work, but prefer the explicit flags above:

--model gpt-fast / --model fast → openai/gpt-5.5 + fast tier + medium thinking
--model openai/gpt-5.5-fast → openai/gpt-5.5 + fast tier
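
The compatibility aliases can be read as a lookup table that expands one `--model` value into a model id plus request options. The sketch below encodes only the mappings listed above; `ResolvedModel`, `COMPAT_ALIASES`, and `resolveModelAlias` are hypothetical names, and passing unknown values through unchanged is an assumption.

```typescript
// Hypothetical shape for an expanded --model value.
interface ResolvedModel {
  model: string;
  serviceTier?: "fast";
  thinking?: "medium";
}

// Mappings copied from the alias list above.
const COMPAT_ALIASES = new Map<string, ResolvedModel>([
  ["gpt-fast", { model: "openai/gpt-5.5", serviceTier: "fast", thinking: "medium" }],
  ["fast", { model: "openai/gpt-5.5", serviceTier: "fast", thinking: "medium" }],
  ["openai/gpt-5.5-fast", { model: "openai/gpt-5.5", serviceTier: "fast" }],
]);

// Expands a compatibility alias; any other value is returned as a plain model id.
function resolveModelAlias(model: string): ResolvedModel {
  const hit = COMPAT_ALIASES.get(model);
  // Return a copy so callers cannot mutate the shared table.
  return hit ? { ...hit } : { model };
}
```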

Limits

Text inputs over 10 MB are rejected before tokenization.
Text prompts are preflighted against the model input limit (LiteLLM catalog), using a GPT tokenizer.

Common flags

summarize <input> [flags]

Use summarize --help or summarize help for the full help text.

--model <provider/model>: which model to use (defaults to auto)
--model auto: automatic model selection + fallback (default)
--model <name>: use a built-in or config-defined preset (see Configuration)

Extension points (exported contracts: how you extend this code)

TranscriptCacheGetResult (Interface)

(no doc)

packages/core/src/content/cache/types.ts

Window (Interface)

(no doc)

apps/chrome-extension/src/entrypoints/automation.content.ts

TranscriptCacheSetArgs (Interface)

(no doc)

packages/core/src/content/cache/types.ts

TranscriptCache (Interface)

(no doc)

packages/core/src/content/cache/types.ts

MediaCache (Interface)

(no doc)

packages/core/src/content/cache/types.ts

CacheReadArguments (Interface)

(no doc)

packages/core/src/content/transcript/cache.ts

Core symbols (most depended-on inside this repo)

get

called by 397

packages/core/src/content/cache/types.ts


resolve

called by 109

src/llm/providers/openai/sse.ts

apps/chrome-extension/tests/helpers/extension-harness.ts

closeExtension

called by 87

apps/chrome-extension/tests/helpers/extension-harness.ts

getBrowserFromProject

called by 86

apps/chrome-extension/tests/helpers/extension-harness.ts

Shape

Function 4,208

Method 94

Class 42

Interface 34

Languages

TypeScript 100%

Modules by API surface

apps/chrome-extension/tests/helpers/extension-harness.ts: 31 symbols

scripts/build-docs-site.mjs: 30 symbols

apps/chrome-extension/src/lib/settings.ts: 29 symbols

src/daemon/schtasks.ts: 28 symbols

src/cache.ts: 26 symbols

apps/chrome-extension/src/entrypoints/sidepanel/slides-summary-controller.ts: 26 symbols

src/refresh-free/presentation.ts: 25 symbols

src/content/asset.ts: 25 symbols

apps/chrome-extension/src/entrypoints/sidepanel/chat-controller.ts: 25 symbols

src/slides/scene-detection.ts: 24 symbols

scripts/docs-site-markdown.mjs: 24 symbols

apps/chrome-extension/src/entrypoints/hover.content.ts: 24 symbols

Dependencies (from manifests, versioned)

@earendil-works/pi-ai 0.80.2 · 1×

@huggingface/transformers 4.2.0 · 1×

@mozilla/readability 0.6.0 · 1×

@playwright/test 1.61.1 · 1×

@steipete/summarize-core workspace:* · 1×

@types/chrome 0.2.0 · 1×

@types/node 24.13.2 · 1×

@types/sanitize-html 2.16.1 · 1×

@typescript/native-preview 7.0.0-dev.20260626.1 · 1×

@vitest/coverage-v8 4.1.9 · 1×

bpe-lite 0.5.2 · 1×

commander 15.0.0 · 1×

For agents

$ claude mcp add summarize \
  -- python -m otcore.mcp_server <graph>
