
github.com/speaches-ai/speaches @v0.8.3 sqlite


512 symbols · 1,984 edges · 71 files · 30 documented (6%)

README

Note: This project was previously named faster-whisper-server. I renamed it because the project has evolved to support more than just ASR (automatic speech recognition).

Speaches

speaches is an OpenAI API-compatible server that supports streaming transcription, translation, and speech generation. Speech-to-text is powered by faster-whisper, and text-to-speech uses piper and Kokoro. This project aims to be Ollama, but for TTS/STT models.
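Because the server follows OpenAI's API shape, a plain HTTP client is enough to request speech. The sketch below uses only the Python standard library and targets OpenAI's documented `POST /v1/audio/speech` route with its `model`, `input`, and `voice` fields. It assumes a speaches instance listening at `http://localhost:8000`, which is an assumption you should adjust to your deployment. The helper names `build_speech_request` and `synthesize` are hypothetical and are not part of speaches.

```python
import json
import urllib.request

# Assumption: a local speaches instance on port 8000; change this for your deployment.
BASE_URL = "http://localhost:8000/v1"


def build_speech_request(
    model: str, text: str, voice: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style POST /audio/speech request."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(model: str, text: str, voice: str, base_url: str = BASE_URL) -> bytes:
    """Send the request and return the raw audio bytes from the response body."""
    request = build_speech_request(model, text, voice, base_url)
    with urllib.request.urlopen(request) as response:
        return response.read()
```

For example, `synthesize("<tts-model-id>", "Hello!", "<voice-id>")` would return the audio bytes, where both ids are placeholders for a model and voice available on your server. Splitting request construction from sending keeps the request shape easy to inspect without a running server.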

See the documentation for installation instructions and usage: speaches.ai

Features:

- OpenAI API compatible: all tools and SDKs that work with OpenAI's API should work with speaches.
- Audio generation via the chat completions endpoint (see the OpenAI documentation), for example:
  - Generate a spoken audio summary of a body of text (text in, audio out)
  - Perform sentiment analysis on a recording (audio in, text out)
  - Async speech-to-speech interactions with a model (audio in, audio out)
- Streaming support: transcription is sent via SSE (server-sent events) as the audio is transcribed, so you don't need to wait for the whole file to be transcribed before receiving output.
- Dynamic model loading/offloading: specify which model you want in the request and it is loaded automatically, then unloaded after a period of inactivity.
- Text-to-speech via Kokoro (ranked #1 in the TTS Arena) and piper models.
- GPU and CPU support.
- Deployable via Docker Compose or Docker.
- Realtime API.
- Highly configurable.
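Streaming transcription arrives as server-sent events, so a client has to split the response into events as lines arrive. Below is a minimal parser for a subset of the WHATWG event-stream format: it handles `data:` and `event:` fields and `:` comment lines, and ignores `id:` and `retry:`. It assumes nothing about the shape of the payloads speaches puts in each `data` field. `SSEEvent` and `parse_sse` are hypothetical names, not speaches code.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class SSEEvent:
    event: str  # event type; "message" when no "event:" field was sent
    data: str   # all "data:" lines of the event, joined with "\n"


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield one SSEEvent per blank-line-terminated block of an event stream."""
    event_type = ""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event if any data line was seen.
            if data_lines:
                yield SSEEvent(event=event_type or "message", data="\n".join(data_lines))
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single space after the colon is not part of the value
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
    # An unterminated event at end of stream (no trailing blank line) is discarded,
    # as the event-stream format specifies.
```

With `urllib`, the lines can come from the response object itself, for example `parse_sse(line.decode("utf-8") for line in response)`.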

Please create an issue if you find a bug, have a question, or a feature suggestion.

Demos

Realtime API

https://github.com/user-attachments/assets/457a736d-4c29-4b43-984b-05cc4d9995bc

(Excuse the breathing lol. Didn't have enough time to record a better demo)

Streaming Transcription

TODO

Speech Generation

https://github.com/user-attachments/assets/0021acd9-f480-4bc3-904d-831f54c4d45b

Core symbols most depended-on inside this repo

publish_nowait · called by 44 · src/speaches/realtime/pubsub.py
append · called by 31 · src/speaches/realtime/input_audio_buffer.py
include_router · called by 16 · src/speaches/realtime/event_router.py
extend · called by 14 · src/speaches/audio.py
srt_format_timestamp · called by 12 · src/speaches/text_utils.py
vtt_format_timestamp · called by 12 · src/speaches/text_utils.py
list_local_models · called by 10 · src/speaches/executors/piper/utils.py
strip_markdown_emphasis · called by 9 · src/speaches/text_utils.py
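srt_format_timestamp and vtt_format_timestamp are among the most-called helpers. The standard subtitle formats they target differ mainly in the separator before the milliseconds: SRT uses `HH:MM:SS,mmm` and WebVTT uses `HH:MM:SS.mmm`. The sketch below is a hypothetical re-implementation that illustrates the two formats. It is not the code in src/speaches/text_utils.py, whose signatures may differ, and the helper names `srt_timestamp` and `vtt_timestamp` are made up.

```python
def _split_timestamp(seconds: float) -> tuple[int, int, int, int]:
    """Split a non-negative duration in seconds into (hours, minutes, seconds, ms)."""
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    # Round once to whole milliseconds so carries (e.g. 59.9996 s -> 1 min) are exact.
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return hours, minutes, secs, millis


def srt_timestamp(seconds: float) -> str:
    """SubRip (SRT) style: HH:MM:SS,mmm with a comma before the milliseconds."""
    h, m, s, ms = _split_timestamp(seconds)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def vtt_timestamp(seconds: float) -> str:
    """WebVTT style: HH:MM:SS.mmm with a period before the milliseconds."""
    h, m, s, ms = _split_timestamp(seconds)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"
```

For example, `srt_timestamp(3661.5)` gives `01:01:01,500` and `vtt_timestamp(3661.5)` gives `01:01:01.500`.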

Shape

Functions 197 · Classes 143 · Methods 136 · Routes 36

Languages

Python 99% · TypeScript 1%

Modules by API surface

src/speaches/types/realtime.py · 57 symbols
src/speaches/types/chat.py · 30 symbols
src/speaches/text_utils.py · 21 symbols
tests/api_chat_test.py · 16 symbols
src/speaches/routers/models.py · 16 symbols
src/speaches/realtime/response_event_router.py · 16 symbols
src/speaches/routers/realtime/rtc.py · 14 symbols
src/speaches/routers/chat.py · 14 symbols
src/speaches/realtime/message_manager.py · 14 symbols
src/speaches/hf_utils.py · 14 symbols
src/speaches/realtime/input_audio_buffer_event_router.py · 13 symbols
tests/conftest.py · 12 symbols

Dependencies from manifests, versioned

ctranslate2 4.5.0 · 1×
fastapi 0.115.6 · 1×
faster-whisper 1.1.1 · 1×
httpx 0.27.2 · 1×
typer 0.12.5 · 1×

For agents

$ claude mcp add speaches \
  -- python -m otcore.mcp_server <graph>
