<img src="https://github.com/murtaza-nasir/speakr/raw/v0.8.7.2/static/img/icon-32x32.png" alt="Speakr Logo" width="32"/>
Self-hosted AI transcription and intelligent note-taking platform
Speakr transforms your audio recordings into organized, searchable, and intelligent notes. Built for privacy-conscious groups and individuals, it runs on your own infrastructure; pair it with self-hosted transcription and language models and your sensitive conversations never leave your servers.
<img src="https://github.com/murtaza-nasir/speakr/raw/v0.8.7.2/docs/assets/images/screenshots/Main%20view.png" alt="Speakr Main Interface" width="750"/>
Different people use Speakr's collaboration and retention features in different ways:
| Use Case | Setup | What It Does |
|---|---|---|
| Family memories | Create "Family" group with protected tag | Everyone gets access to trips and events automatically, recordings preserved forever |
| Book club discussions | "Book Club" group, tag monthly meetings | All members auto-share discussions, can add personal notes about what resonated |
| Work project group | Share individually with 3 teammates | Temporary collaboration, easy to revoke when project ends |
| Daily group standups | Group tag with 14-day retention | Auto-share with group, auto-cleanup of routine meetings |
| Architecture decisions | Engineering group tag, protected from deletion | Technical discussions automatically shared, preserved permanently as reference |
| Client consultations | Individual share with view-only permission | Controlled external access, clients can't accidentally edit |
| Research interviews | Protected tag + Obsidian export | Preserve recordings indefinitely, transcripts auto-import to note-taking system |
| Legal consultations | Group tag with 7-year retention | Automatic sharing with legal group, compliance-based retention |
| Sales calls | Group tag with 1-year retention | Whole sales group learns from each call, cleanup after sales cycle |
Tags with custom prompts transform raw recordings into exactly what you need. Stack multiple tags to layer instructions:
- "Recipe" + "Gluten Free" = formatted recipe with gluten substitution suggestions
- "Lecture" + "Biology 301" = study notes focused on biological terminology
- "Client Meeting" + "Legal Review" = client requirements with legal implications highlighted

Order can matter: start with format tags, then add focus tags for best results.
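How Speakr merges stacked prompts isn't spelled out here. As a rough mental model only, assuming each tag's prompt is applied in the order the tags are added, this sketch shows why order matters: the format instruction comes first and the focus instruction refines it. The function names, tag prompts, and joining rule are hypothetical.

```bash
# Hypothetical mental model only: assumes each tag carries a custom prompt and
# that stacked prompts are applied in tag order. Tag names and prompt texts
# are illustrative; Speakr's actual prompt handling may differ.

# tag_prompt TAG: print the custom prompt for a known tag.
tag_prompt() {
  case "$1" in
    "Recipe")      echo "Format the transcript as a recipe with ingredients and steps." ;;
    "Gluten Free") echo "Suggest a gluten-free substitution for each ingredient." ;;
    *)             return 1 ;;   # tag without a custom prompt
  esac
}

# combine_tag_prompts TAG...: print each tag's prompt, one per line, in the
# order given, so a format tag placed first frames the later focus tags.
combine_tag_prompts() {
  local tag
  for tag in "$@"; do
    tag_prompt "$tag" || true
  done
}
```

For example, `combine_tag_prompts "Recipe" "Gluten Free"` prints the recipe-format instruction first and the substitution focus second; reversing the arguments reverses the order.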
```bash
# Create project directory
mkdir speakr && cd speakr

# Download docker-compose configuration:
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/docker-compose.example.yml -O docker-compose.yml

# Download the environment template:
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/env.transcription.example -O .env

# Configure your API keys and launch
nano .env
docker compose up -d

# Access at http://localhost:8899
```
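Containers can take a moment to start. A minimal helper, assuming `curl` is installed and the default port 8899 is unchanged, polls the web UI roughly once per second until it responds. The function name and the 60-attempt default are arbitrary choices, not part of Speakr.

```bash
# Hypothetical helper (not part of Speakr): poll the web UI after
# `docker compose up -d` until it answers or the attempts run out.
# Assumes curl is installed and the default port 8899 is unchanged.
wait_for_speakr() {
  local url="${1:-http://localhost:8899}" limit="${2:-60}" elapsed=0
  while [ "$elapsed" -lt "$limit" ]; do
    # -s: quiet, -f: treat HTTP errors (>= 400) as failure, -o: discard body
    if curl -sf -o /dev/null --max-time 2 "$url"; then
      echo "Speakr is responding at $url"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "no response from $url after ${limit} attempts" >&2
  return 1
}
```

Run `wait_for_speakr` right after `docker compose up -d`; if it gives up, `docker compose logs` is the first place to look.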
Required API keys:
- `TRANSCRIPTION_API_KEY`: for speech-to-text via OpenAI (for a self-hosted service, set `ASR_BASE_URL` instead)
- `TEXT_MODEL_API_KEY`: for summaries, titles, and chat (OpenRouter or OpenAI)
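A quick pre-flight check can catch a missing key before launch. This sketch encodes only the two rules listed above and assumes a plain `KEY=value` layout in `.env`; the helper names are hypothetical.

```bash
# Hypothetical pre-flight check for a Speakr .env file (not part of Speakr).
# Assumes plain KEY=value lines; leading `export`, quotes, and values that
# contain '#' are not handled.

# env_value FILE KEY: print the value of the last KEY=... line, minus any
# trailing "# comment".
env_value() {
  grep -E "^$2=" "$1" | tail -n 1 | cut -d= -f2- | sed 's/[[:space:]]*#.*$//'
}

# check_speakr_env [FILE]: return 0 only if the required keys are present.
check_speakr_env() {
  local file="${1:-.env}" status=0
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  # Summaries, titles, and chat need a text model key.
  if [ -z "$(env_value "$file" TEXT_MODEL_API_KEY)" ]; then
    echo "missing TEXT_MODEL_API_KEY" >&2
    status=1
  fi
  # Transcription needs either a cloud API key or a self-hosted ASR URL.
  if [ -z "$(env_value "$file" TRANSCRIPTION_API_KEY)" ] &&
     [ -z "$(env_value "$file" ASR_BASE_URL)" ]; then
    echo "set TRANSCRIPTION_API_KEY or ASR_BASE_URL" >&2
    status=1
  fi
  return "$status"
}
```

Usage: `check_speakr_env .env && docker compose up -d`.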
Speakr uses a connector-based architecture that auto-detects your transcription provider:
| Option | Setup | Speaker Diarization | Voice Profiles |
|---|---|---|---|
| OpenAI Transcribe | Just API key | ✅ `gpt-4o-transcribe-diarize` | ❌ |
| WhisperX ASR | GPU container | ✅ Best quality | ✅ |
| Legacy Whisper | Just API key | ❌ | ❌ |
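The connector selection itself lives inside Speakr. Purely as an illustration, here is one plausible way the documented variables could map to the three rows of the table; the precedence (a set `ASR_BASE_URL` wins), the `gpt-4o*` model match, and the returned labels are all assumptions, not Speakr's code.

```bash
# Illustrative only: maps the documented environment variables to one of the
# three connector options. Precedence and labels are assumptions.
detect_connector() {
  # A self-hosted ASR endpoint takes priority (diarization + voice profiles).
  if [ -n "${ASR_BASE_URL:-}" ]; then
    echo "whisperx-asr"
    return 0
  fi
  if [ -z "${TRANSCRIPTION_API_KEY:-}" ]; then
    echo "no transcription provider configured" >&2
    return 1
  fi
  case "${TRANSCRIPTION_MODEL:-}" in
    gpt-4o*) echo "openai-transcribe" ;;   # e.g. gpt-4o-transcribe-diarize
    *)       echo "legacy-whisper" ;;      # plain speech-to-text
  esac
}
```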
Simplest setup (OpenAI with diarization):
```bash
TRANSCRIPTION_API_KEY=sk-your-openai-key
TRANSCRIPTION_MODEL=gpt-4o-transcribe-diarize
```
Best quality (self-hosted WhisperX):
```bash
ASR_BASE_URL=http://whisperx-asr:9000
# Enable voice profiles
ASR_RETURN_SPEAKER_EMBEDDINGS=true
```
This requires the WhisperX ASR Service container running on a GPU.
⚠️ PyTorch 2.6 users: if you encounter a "Weights only load failed" error with WhisperX, add `TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=true` to your ASR container's environment. See the troubleshooting guide for details.
Complete documentation is available at murtaza-nasir.github.io/speakr
Export Templates & Localization
- Export templates with variables (`{{title}}`, `{{summary}}`, `{{notes}}`) and conditionals for optional sections
- Localized labels such as `{{label.metadata}}` and `{{label.summary}}`, automatically translated based on the user's UI language
Improvements - Opt-in ASR chunking, speaker ID remapping across chunks, simplified About page transcription display
Bug Fixes - ASR empty text validation, cascade delete for recording relationships, missing model imports
Folders & Automation
Improvements - Legacy ASR code removed (fully migrated to connector architecture), audio codec fallback to MP3, share page click-to-seek, new READABLE_PUBLIC_LINKS option for server-rendered transcripts (LLM/scraper accessible)
Bug Fixes - PostgreSQL boolean defaults in migrations, folders feature detection, audio player visibility for incognito recordings
Incognito Mode Enhancements & Compatibility Fixes
- `INCOGNITO_MODE_DEFAULT=true` option to start with incognito mode enabled by default
- `ENABLE_STREAM_OPTIONS=false` option for LLM servers that don't support OpenAI's `stream_options` parameter
Bulk Operations & Privacy Features
- Incognito mode (opt-in via `ENABLE_INCOGNITO_MODE=true`)
Bug Fixes - Fixed language selection not being passed to the ASR service, improved reprocess modal
Naming Templates
- Template variables `{{ai_title}}`, `{{filename}}`, and `{{date}}`, plus custom regex patterns
- Templates that don't use `{{ai_title}}` skip the AI call entirely
- `/api/v1/upload` endpoint for programmatic recording uploads
Improvements - Tag drag-and-drop reordering, registration domain restriction, event delete button, WebM seeking fix
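To make the naming-template idea concrete, here is a hypothetical sketch of placeholder substitution. It fills `{{filename}}` and `{{date}}` literally and refuses templates containing `{{ai_title}}`, since that value would require an LLM call; this is why templates without it can skip the AI step. The function names, the literal-replacement rule, and the example filename are assumptions, and regex patterns are not modelled.

```bash
# Hypothetical sketch of naming-template rendering (not Speakr's code).

# replace_all TEXT NEEDLE REPLACEMENT: literal (non-regex) replacement of
# every occurrence; NEEDLE must be non-empty.
replace_all() {
  local text="$1" needle="$2" repl="$3" out=""
  while :; do
    case "$text" in
      *"$needle"*)
        out="$out${text%%"$needle"*}$repl"   # keep text before the match
        text="${text#*"$needle"}"            # continue after the match
        ;;
      *) break ;;
    esac
  done
  printf '%s\n' "$out$text"
}

# render_name_template TEMPLATE FILENAME DATE
render_name_template() {
  local template="$1" filename="$2" rec_date="$3" name
  case "$template" in
    *"{{ai_title}}"*)
      # An AI title would need a model call; out of scope for this sketch.
      echo "template uses {{ai_title}}; an AI call would be required" >&2
      return 1 ;;
  esac
  name="$(replace_all "$template" "{{filename}}" "$filename")"
  replace_all "$name" "{{date}}" "$rec_date"
}
```

For example, `render_name_template "{{date}}_{{filename}}" "standup.m4a" "2024-05-01"` prints `2024-05-01_standup.m4a` without contacting any model.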
Transcription Usage Tracking
Bug Fixes
- … `gpt-4o-transcribe-diarize`
Cloud Diarization & REST API
- `gpt-4o-transcribe-diarize` for speaker identification with just an API key
- `psycopg2-binary` driver for PostgreSQL