github.com/murtaza-nasir/speakr @v0.8.7.2 sqlite

1,780 symbols · 5,600 edges · 166 files · 732 documented (41%)
README
<img src="https://github.com/murtaza-nasir/speakr/raw/v0.8.7.2/static/img/icon-32x32.png" alt="Speakr Logo" width="32"/>

Speakr

Self-hosted AI transcription and intelligent note-taking platform



Overview

Speakr transforms your audio recordings into organized, searchable, and intelligent notes. Built for privacy-conscious groups and individuals, it runs entirely on your own infrastructure, ensuring your sensitive conversations remain completely private.

<img src="https://github.com/murtaza-nasir/speakr/raw/v0.8.7.2/docs/assets/images/screenshots/Main view.png" alt="Speakr Main Interface" width="750"/>

Key Features

Core Functionality

  • Smart Recording & Upload - Record directly in browser or upload existing audio files
  • AI Transcription - High-accuracy transcription with speaker identification
  • Voice Profiles - AI-powered speaker recognition with voice embeddings (requires WhisperX ASR service)
  • REST API v1 - Complete API with Swagger UI for automation tools (n8n, Zapier, Make) and dashboard widgets
  • Single Sign-On - Authenticate with any OIDC provider (Keycloak, Azure AD, Google, Auth0, Pocket ID)
  • Audio-Transcript Sync - Click transcript to jump to audio, auto-highlight current text, follow mode for hands-free playback
  • Interactive Chat - Ask questions about your recordings and get AI-powered answers
  • Inquire Mode - Semantic search across all recordings using natural language
  • Internationalization - Full support for English, Spanish, French, German, Chinese, and Russian
  • Beautiful Themes - Light and dark modes with customizable color schemes

Collaboration & Sharing

  • Internal Sharing - Share recordings with specific users with granular permissions (view/edit/reshare)
  • Group Management - Create groups with automatic sharing via group-scoped tags
  • Public Sharing - Generate secure links to share recordings externally (admin-controlled)
  • Group Tags - Tags that automatically share recordings with all group members

Organization & Management

  • Smart Tagging - Organize with tags that include custom AI prompts and ASR settings
  • Tag Prompt Stacking - Combine multiple tags to layer AI instructions for powerful transformations
  • Tag Protection - Prevent specific recordings from being auto-deleted
  • Group Retention Policies - Set custom retention periods per group tag
  • Auto-Deletion - Automatic cleanup of old recordings with flexible retention policies
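
To make the interaction between tag protection, per-tag retention, and auto-deletion concrete, here is a minimal sketch of how such a decision could be made. It is not Speakr's implementation: the `Tag` and `Recording` classes, the `protected` and `retention_days` fields, and the rule that the longest retention wins are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Tag:
    name: str
    protected: bool = False               # hypothetical: tag protection flag
    retention_days: Optional[int] = None  # hypothetical: per-tag retention override


@dataclass
class Recording:
    created_at: datetime
    tags: list = field(default_factory=list)


def should_delete(rec: Recording, now: datetime, default_retention_days: int) -> bool:
    """Return True if the recording is past its retention period and not protected."""
    # A protected tag always wins: the recording is never auto-deleted.
    if any(tag.protected for tag in rec.tags):
        return False
    # Assumption: when several tags set a retention period, the longest one applies.
    overrides = [t.retention_days for t in rec.tags if t.retention_days is not None]
    days = max(overrides) if overrides else default_retention_days
    return now - rec.created_at > timedelta(days=days)
```

With a 14-day tag, a recording from 30 days ago would be eligible for cleanup, while the same recording carrying a protected tag would be kept.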

Real-World Use Cases

Different people use Speakr's collaboration and retention features in different ways:

| Use Case | Setup | What It Does |
|---|---|---|
| Family memories | Create "Family" group with protected tag | Everyone gets access to trips and events automatically, recordings preserved forever |
| Book club discussions | "Book Club" group, tag monthly meetings | All members auto-share discussions, can add personal notes about what resonated |
| Work project group | Share individually with 3 teammates | Temporary collaboration, easy to revoke when project ends |
| Daily group standups | Group tag with 14-day retention | Auto-share with group, auto-cleanup of routine meetings |
| Architecture decisions | Engineering group tag, protected from deletion | Technical discussions automatically shared, preserved permanently as reference |
| Client consultations | Individual share with view-only permission | Controlled external access, clients can't accidentally edit |
| Research interviews | Protected tag + Obsidian export | Preserve recordings indefinitely, transcripts auto-import to note-taking system |
| Legal consultations | Group tag with 7-year retention | Automatic sharing with legal group, compliance-based retention |
| Sales calls | Group tag with 1-year retention | Whole sales group learns from each call, cleanup after sales cycle |

Creative Tag Prompt Examples

Tags with custom prompts transform raw recordings into exactly what you need:

  • Recipe recordings: Record yourself cooking while narrating - tag with "Recipe" to convert messy speech into formatted recipes with ingredient lists and numbered steps
  • Lecture notes: Students tag lectures with "Study Notes" to get organized outlines with concepts, examples, and definitions instead of raw transcripts
  • Code reviews: "Code Review" tag extracts issues, suggested changes, and action items in technical language developers can use directly
  • Meeting summaries: "Action Items" tag ignores discussion and returns just decisions, tasks, and deadlines

Tag Stacking for Combined Effects

Stack multiple tags to layer instructions:

  • "Recipe" + "Gluten Free" = Formatted recipe with gluten substitution suggestions
  • "Lecture" + "Biology 301" = Study notes format focused on biological terminology
  • "Client Meeting" + "Legal Review" = Client requirements plus legal implications highlighted

The order can matter - start with format tags, then add focus tags for best results.
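
As a rough illustration of why order matters, the sketch below layers tag prompts onto a base prompt in the order the tags are given. How Speakr actually merges prompts is not described here; the `stack_prompts` function and the blank-line joining are assumptions.

```python
def stack_prompts(base_prompt: str, tag_prompts: list) -> str:
    """Join a base prompt with tag prompts, preserving the tag order."""
    parts = [base_prompt.strip()]
    # Skip tags that carry no custom prompt (empty or whitespace-only strings).
    parts += [p.strip() for p in tag_prompts if p and p.strip()]
    # Assumption: layers are separated by blank lines so each reads as its own instruction.
    return "\n\n".join(parts)
```

Putting the format tag first ("Recipe") and the focus tag second ("Gluten Free") means the model reads the output format before the refinement.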

Integration Examples

  • Obsidian/Logseq: Enable auto-export to write completed transcripts directly to your vault using your custom template - no manual export needed
  • Documentation wikis: Map auto-export to your wiki's import folder for seamless transcript publishing
  • Content creation: Create SRT subtitle templates from your audio recordings for podcasts or video content
  • Project management: Extract action items with custom tag prompts, then auto-export for automated task creation

Quick Start

Using Docker (Recommended)

# Create project directory
mkdir speakr && cd speakr

# Download docker-compose configuration:
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/docker-compose.example.yml -O docker-compose.yml

# Download the environment template:
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/env.transcription.example -O .env

# Configure your API keys and launch
nano .env
docker compose up -d

# Access at http://localhost:8899

Required API Keys:

  • TRANSCRIPTION_API_KEY - For speech-to-text (OpenAI) or ASR_BASE_URL for self-hosted
  • TEXT_MODEL_API_KEY - For summaries, titles, and chat (OpenRouter or OpenAI)

Transcription Options

Speakr uses a connector-based architecture that auto-detects your transcription provider:

| Option | Setup | Speaker Diarization | Voice Profiles |
|---|---|---|---|
| OpenAI Transcribe | Just API key | gpt-4o-transcribe-diarize | |
| WhisperX ASR | GPU container | ✅ Best quality | |
| Legacy Whisper | Just API key | | |

Simplest setup (OpenAI with diarization):

TRANSCRIPTION_API_KEY=sk-your-openai-key
TRANSCRIPTION_MODEL=gpt-4o-transcribe-diarize

Best quality (Self-hosted WhisperX):

ASR_BASE_URL=http://whisperx-asr:9000
ASR_RETURN_SPEAKER_EMBEDDINGS=true  # Enable voice profiles

Requires WhisperX ASR Service container with GPU.

⚠️ PyTorch 2.6 Users: If you encounter a "Weights only load failed" error with WhisperX, add TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=true to your ASR container. See troubleshooting for details.
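
The auto-detection described above could look roughly like the sketch below, which picks a provider from the environment variables shown in this section. This is an illustration only: the connector names, the precedence of ASR_BASE_URL over the API key, and the model-name check are assumptions, not Speakr's actual registry logic.

```python
def detect_connector(env: dict) -> str:
    """Pick a transcription connector name from environment-style settings."""
    # Assumption: a self-hosted ASR endpoint takes precedence when configured.
    if env.get("ASR_BASE_URL"):
        return "whisperx-asr"  # hypothetical connector name
    if env.get("TRANSCRIPTION_API_KEY"):
        model = env.get("TRANSCRIPTION_MODEL", "")
        # Assumption: gpt-4o-* models map to the newer OpenAI Transcribe connector.
        if model.startswith("gpt-4o"):
            return "openai-transcribe"  # hypothetical connector name
        return "legacy-whisper"  # hypothetical connector name
    raise ValueError("Set ASR_BASE_URL or TRANSCRIPTION_API_KEY")
```

In practice you would pass `dict(os.environ)` after loading `.env`.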


Documentation

Complete documentation is available at murtaza-nasir.github.io/speakr

Latest Release (v0.8.7)

Export Templates & Localization

  • Customizable Export Templates - Create markdown templates for exports with variables ({{title}}, {{summary}}, {{notes}}) and conditionals for optional sections
  • Localized Labels - Use {{label.metadata}}, {{label.summary}} etc. for automatically translated labels based on user's UI language
  • Localized Date Formatting - Export dates formatted per user's language preference (e.g., "15. Januar 2026" for German)
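
For a feel of how `{{variable}}` substitution in an export template works, here is a minimal sketch. It handles plain variables and dotted names such as `{{label.summary}}` only. Conditionals are not covered, since their syntax is not shown here. The `render` function and the flat `values` dictionary are assumptions, not Speakr's template engine.

```python
import re

# Matches {{name}} or {{dotted.name}}, allowing spaces inside the braces.
_VARIABLE = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, values: dict) -> str:
    """Replace known {{variables}}; leave unknown ones untouched."""
    def substitute(match):
        name = match.group(1)
        # Unknown variables are kept verbatim so typos stay visible in the output.
        return str(values.get(name, match.group(0)))
    return _VARIABLE.sub(substitute, template)
```

For example, a German-language user might get `label.summary` resolved to "Zusammenfassung" before the template is rendered.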

Improvements - Opt-in ASR chunking, speaker ID remapping across chunks, simplified About page transcription display

Bug Fixes - ASR empty text validation, cascade delete for recording relationships, missing model imports

Previous Release (v0.8.6)

Folders & Automation

  • Folders Organization - Organize recordings into folders with custom prompts and ASR settings per folder
  • Auto Speaker Labeling - Automatic speaker identification using voice embedding matching
  • Per-User Auto-Summarization - User-configurable automatic summary generation
  • Azure OpenAI Connector - New transcription connector for Azure OpenAI (experimental, community testing welcome)
  • HTTPS Validation - Clear error messages when attempting to record on non-HTTPS connections

Improvements - Legacy ASR code removed (fully migrated to connector architecture), audio codec fallback to MP3, share page click-to-seek, new READABLE_PUBLIC_LINKS option for server-rendered transcripts (LLM/scraper accessible)

Bug Fixes - PostgreSQL boolean defaults in migrations, folders feature detection, audio player visibility for incognito recordings

Previous Release (v0.8.5.1)

Incognito Mode Enhancements & Compatibility Fixes

  • Incognito Mode for In-App Recordings - The incognito toggle now works for microphone recordings, not just uploads
  • Default Incognito Mode - New INCOGNITO_MODE_DEFAULT=true option to start with incognito enabled by default
  • LLM Streaming Compatibility - New ENABLE_STREAM_OPTIONS=false option for LLM servers that don't support OpenAI's stream_options parameter

Previous Release (v0.8.5)

Bulk Operations & Privacy Features

  • Multi-Select Mode - Select multiple recordings in sidebar for batch operations (delete, tag, reprocess, toggle inbox/highlight)
  • Incognito Mode - Session-only transcription processing with no database storage (enable with ENABLE_INCOGNITO_MODE=true)
  • Playback Speed Control - Adjustable 0.5x to 3x speed on all audio players with persistent preference

Previous Release (v0.8.4)

Bug Fixes - Fixed language selection not being passed to ASR service, improved reprocess modal

Previous Release (v0.8.3)

Naming Templates

  • Custom Title Formatting - Create templates with variables like {{ai_title}}, {{filename}}, {{date}} and custom regex patterns
  • Tag-Based or User Default - Assign templates to tags or set a user-wide default
  • Token Savings - Templates without {{ai_title}} skip the AI call entirely
  • API v1 Upload - New /api/v1/upload endpoint for programmatic recording uploads
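
The token-saving behavior of naming templates can be sketched as follows: the AI title generator is only called when the template actually contains `{{ai_title}}`. The `build_title` function, the callback parameter, and the ISO date format are assumptions for illustration. Custom regex patterns are not covered.

```python
import re
from datetime import date
from typing import Callable


def build_title(template: str, filename: str, recorded_on: date,
                generate_ai_title: Callable[[], str]) -> str:
    """Fill a naming template, calling the AI only if {{ai_title}} is used."""
    # Assumption: {{date}} renders as an ISO date (YYYY-MM-DD).
    values = {"filename": filename, "date": recorded_on.isoformat()}
    if "{{ai_title}}" in template:
        # The potentially expensive LLM call happens only on this path.
        values["ai_title"] = generate_ai_title()
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: values.get(m.group(1), m.group(0)),
                  template)
```

A template such as `{{date}} - {{filename}}` never triggers the callback, so no tokens are spent on title generation.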

Improvements - Tag drag-and-drop reordering, registration domain restriction, event delete button, WebM seeking fix

Previous Release (v0.8.2)

Transcription Usage Tracking

  • Per-User Budgets - Set monthly transcription limits (in minutes) with 80% warning and 100% blocking
  • Usage Dashboard - Track minutes, costs, and per-user breakdowns in Admin panel
  • Cost Estimation - Automatic pricing for OpenAI Whisper/Transcribe and self-hosted ASR
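
The 80% warning and 100% blocking thresholds amount to a simple check like the one below. This is a sketch under assumptions: the `budget_status` function, its return values, and treating a missing limit as unlimited are illustrative, not taken from Speakr's code.

```python
from typing import Optional


def budget_status(used_minutes: float, monthly_limit_minutes: Optional[float]) -> str:
    """Classify usage against a monthly transcription budget."""
    # Assumption: no configured limit means the user is never warned or blocked.
    if monthly_limit_minutes is None:
        return "ok"
    if used_minutes >= monthly_limit_minutes:
        return "blocked"   # at or above 100% of the budget
    if used_minutes >= 0.8 * monthly_limit_minutes:
        return "warning"   # at or above 80% of the budget
    return "ok"
```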

Previous Release (v0.8.1)

Bug Fixes

  • Diarization for Long Files - Fixed speaker diarization for chunked files with OpenAI's gpt-4o-transcribe-diarize
  • Empty Segment Filtering - Removed empty transcript segments from diarized output

Previous Release (v0.8.0)

Cloud Diarization & REST API

  • Speaker Diarization Without GPU - Use OpenAI's gpt-4o-transcribe-diarize for speaker identification with just an API key
  • REST API v1 - Full-featured API for automation tools (n8n, Zapier, Make) and dashboard widgets
  • Connector Architecture - Modular transcription providers with simplified configuration
  • Virtual Scrolling - Performance optimization for handling 4500+ transcript segments smoothly
  • Audio Player Improvements - Drag-to-seek, independent modal players, improved theme support
  • File Date Handling - Uses original recording date from file metadata instead of upload time
  • Codec Configuration - Configure unsupported audio codecs with automatic conversion

Previous Release (v0.7.1)

  • PostgreSQL Support - Added psycopg2-binary driver for PostgreSQL

Core symbols (most depended-on inside this repo)

| Symbol | Called by | File |
|---|---|---|
| showToast | 170 | static/js/modules/utils/toast.js |
| has_recording_access | 59 | src/app.py |
| setGlobalError | 57 | static/js/modules/composables/ui.js |
| to_dict | 57 | src/models/user.py |
| add_column_if_not_exists | 56 | src/utils/database.py |
| get_setting | 30 | src/models/system.py |
| get_registry | 25 | src/services/transcription/registry.py |
| apiRequest | 19 | static/js/utils/apiClient.js |

Shape

Function 1,272
Method 226
Route 214
Class 68

Languages

Python 57%
TypeScript 43%

Modules by API surface

static/js/modules/composables/ui.js · 107 symbols
src/api/recordings.py · 94 symbols
src/api/api_v1.py · 71 symbols
src/api/admin.py · 59 symbols
static/js/modules/composables/speakers.js · 47 symbols
static/js/app.modular.js · 45 symbols
src/api/auth.py · 43 symbols
static/js/modules/composables/modals.js · 32 symbols
static/js/modules/composables/upload.js · 31 symbols
static/js/modules/composables/audioPlayer.js · 28 symbols
src/api/shares.py · 28 symbols
static/js/modules/composables/transcription.js · 26 symbols

Dependencies from manifests, versioned

Babel 2.12.1 · 1×
Flask-Limiter 3.5.0 · 1×
Pillow 10.1.0 · 1×
authlib 1.3.0 · 1×
bleach 6.1.0 · 1×
cairosvg 2.7.1 · 1×
email-validator 2.2.0 · 1×
flask 2.3.3 · 1×
flask-bcrypt 1.0.1 · 1×
flask-login 0.6.3 · 1×
flask-openapi3 3.0.0 · 1×
flask-sqlalchemy 3.1.1 · 1×

Datastores touched

database_name · Database · 1 repos

For agents

$ claude mcp add speakr \
  -- python -m otcore.mcp_server <graph>
