
github.com/KoljaB/RealtimeSTT @v1.0.2 sqlite


1,729 symbols · 6,554 edges · 117 files · 531 documented · 31%

README

RealtimeSTT

RealtimeSTT is a Python speech-to-text library for applications that need voice activity detection, fast transcription, optional realtime text updates, wake words, and direct access to audio streams. It is designed for assistants, dictation tools, browser streaming servers, and prototypes that need to turn speech into text with only a few lines of code.

The recommended default path uses faster_whisper. Other engines are available through install extras when their optional dependencies and models are present.

Support RealtimeSTT

If RealtimeSTT saved you time, one GitHub star is a simple way to help make it more stable.

Stars improve visibility, and visibility brings more users, more real-world testing, more bug reports, more fixes, and better releases for everyone.

Demo

https://github.com/user-attachments/assets/797e6552-27cd-41b1-a7f3-e5cbc72094f5

CLI demo code (reproduces the video above)

Featured Integration: Kroko/Banafo ASR

RealtimeSTT includes native support for kroko_onnx, the local streaming ASR engine from the Kroko/Banafo team.

This integration has been on my wishlist for a long time. Kroko is a strong fit for RealtimeSTT's goals: fast, accurate local speech recognition.

Start with the public Community models for local testing, or see Kroko/Banafo's commercial model options if you need production licensing and higher-end models.

pip install "RealtimeSTT[kroko-builder,silero-onnx-cpu]"
stt-install-kroko --build

The silero-onnx-cpu extra gives AudioToTextRecorder a local VAD backend for recorder-based smoke tests and live microphone use.

See the Kroko-ONNX engine guide, Kroko ASR docs, and kroko-onnx on GitHub.

Install

Use Python 3.11 or newer for the current pinned dependency set.

pip install "RealtimeSTT[faster-whisper]"

On Linux, install PortAudio headers before installing the package:

sudo apt-get update
sudo apt-get install python3-dev portaudio19-dev

On macOS:

brew install portaudio

For CUDA, platform notes, and optional engine stacks, see docs/installation.md.

Microphone Example

This example waits for speech, stops recording once the detected utterance ends, and prints the final transcript:

from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    with AudioToTextRecorder() as recorder:
        print("Speak now")
        print(recorder.text())

Use the if __name__ == "__main__": guard when running scripts, especially on Windows, because RealtimeSTT uses multiprocessing for model work.

Automatic Recording Loop

For continuous dictation, pass a callback to text() so transcription work can complete asynchronously while your loop keeps listening:

from RealtimeSTT import AudioToTextRecorder


def process_text(text):
    print(text)


if __name__ == "__main__":
    recorder = AudioToTextRecorder()

    while True:
        recorder.text(process_text)
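The loop above runs until the process is killed and never releases the recorder. The following is a minimal sketch of a stoppable variant, assuming only the text(callback) and shutdown() methods that appear in this README. The names run_dictation and max_utterances are hypothetical and exist only for illustration.

```python
from typing import Callable, Optional


def run_dictation(recorder,
                  on_text: Callable[[str], None],
                  max_utterances: Optional[int] = None) -> int:
    """Call recorder.text(on_text) repeatedly and always shut the recorder down.

    `recorder` is any object exposing text(callback) and shutdown(), as
    AudioToTextRecorder does in the examples in this README.
    `max_utterances` is a hypothetical limit added for this sketch;
    None means "loop until Ctrl+C".
    Returns the number of text() calls that completed.
    """
    completed = 0
    try:
        while max_utterances is None or completed < max_utterances:
            recorder.text(on_text)
            completed += 1
    except KeyboardInterrupt:
        # Ctrl+C ends the loop; cleanup happens in the finally block.
        pass
    finally:
        recorder.shutdown()
    return completed
```

With the real library this would be called as run_dictation(AudioToTextRecorder(), process_text) inside the if __name__ == "__main__": guard. Keeping shutdown() in a finally block means the recorder is released even when the loop is interrupted.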

External Audio

Set use_microphone=False when audio comes from a file, stream, websocket, or another process. Feed 16-bit mono PCM chunks at 16 kHz, or pass the original sample rate so RealtimeSTT can resample:

from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    recorder = AudioToTextRecorder(use_microphone=False)

    with open("audio_chunk.pcm", "rb") as audio_file:
        recorder.feed_audio(audio_file.read(), original_sample_rate=16000)

    print(recorder.text())
    recorder.shutdown()

More examples are in docs/quick-start.md and docs/external-audio.md.
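The external-audio example reads a whole file in one call. For streams it is usually more natural to feed fixed-size chunks. The sketch below reads a 16-bit mono WAV file with the standard-library wave module and passes each chunk to feed_audio() together with the file's own sample rate, using the original_sample_rate argument shown above. The helper names iter_pcm_chunks and feed_wav and the chunk size of 1024 frames are assumptions made for this illustration, not part of RealtimeSTT.

```python
import wave
from typing import Iterator, Tuple


def iter_pcm_chunks(path: str,
                    frames_per_chunk: int = 1024) -> Iterator[Tuple[bytes, int]]:
    """Yield (pcm_bytes, sample_rate) pairs from a 16-bit mono WAV file."""
    with wave.open(path, "rb") as wav:
        # The README asks for 16-bit mono PCM, so reject anything else.
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        rate = wav.getframerate()
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk, rate


def feed_wav(recorder, path: str, frames_per_chunk: int = 1024) -> int:
    """Feed a WAV file to the recorder chunk by chunk; return the bytes fed."""
    total = 0
    for chunk, rate in iter_pcm_chunks(path, frames_per_chunk):
        # Passing the source rate lets the recorder resample when it is not 16 kHz.
        recorder.feed_audio(chunk, original_sample_rate=rate)
        total += len(chunk)
    return total
```

A recorder created with use_microphone=False can be passed straight in: feed_wav(recorder, "speech.wav"), followed by recorder.text() and recorder.shutdown() as in the example above. The file name here is a placeholder.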

Configuration Reference

Every AudioToTextRecorder constructor parameter is documented in docs/configuration.md, including model/engine selection, realtime transcription, VAD timing, wake words, callbacks, external audio, logging, and executor injection.
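Because the parameter set can differ between releases, one cautious pattern is to check a desired configuration against the constructor signature before building the recorder. The sketch below does this with inspect.signature. Only use_microphone is confirmed by this README; the other keys in WANTED are assumed names based on earlier RealtimeSTT releases and should be checked against docs/configuration.md. The helper split_supported_kwargs is hypothetical.

```python
import inspect
from typing import Any, Callable, Dict, Tuple


def split_supported_kwargs(
    factory: Callable[..., Any],
    wanted: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split `wanted` into (accepted, rejected) using the factory's signature."""
    params = inspect.signature(factory).parameters
    # A **kwargs parameter means the factory accepts any keyword.
    takes_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    accepted: Dict[str, Any] = {}
    rejected: Dict[str, Any] = {}
    for name, value in wanted.items():
        if takes_var_kw or name in params:
            accepted[name] = value
        else:
            rejected[name] = value
    return accepted, rejected


WANTED = {
    "use_microphone": True,                 # shown in this README
    "model": "tiny",                        # assumed parameter name
    "language": "en",                       # assumed parameter name
    "enable_realtime_transcription": True,  # assumed parameter name
    "post_speech_silence_duration": 0.4,    # assumed parameter name
}
```

With the real class this would read accepted, rejected = split_supported_kwargs(AudioToTextRecorder, WANTED), followed by AudioToTextRecorder(**accepted). Anything left in rejected points at a name that the installed version does not recognize.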

Features

Voice activity detection with WebRTC VAD and Silero VAD.
Final and realtime transcription with selectable engines.
Optional wake word activation through Porcupine or OpenWakeWord.
Direct microphone input or application-fed audio chunks.
Event callbacks for recording, VAD, realtime text, transcription, and wake word state.
A FastAPI browser streaming server example with multi-user session isolation, shared inference resources, metrics, and health endpoints.

Documentation

Quick start: shortest demos and common recording patterns.
Installation: platform setup, CUDA notes, and optional dependencies.
Configuration: complete AudioToTextRecorder parameter reference.
Transcription engines: engine selection and setup links.
Wake words: Porcupine and OpenWakeWord setup.
External audio: feeding audio without a microphone.
Testing: maintained unit and opt-in golden test workflow.
Test scripts: demos, manual tests, regressions, and legacy experiments under tests/.
FastAPI server: browser server configuration, protocol, metrics, and deployment notes.
Troubleshooting: common install, audio, CUDA, model, dependency, and runtime errors.
Engine licenses: license notes for optional engine runtimes and model families.


Server Example

The browser FastAPI reference server lives in example_fastapi_server and is intended for source checkouts. It is not installed by the PyPI wheel; keeping it source-only keeps the wheel lean and avoids adding web-server dependencies for users who only need the recorder/API library.

python -m pip install -r example_fastapi_server/requirements.txt
python example_fastapi_server/server.py --host 0.0.0.0 --port 8010

For pip-only installs, use the Python recorder/API examples instead. If you want the FastAPI reference server, clone the repository or install from Git.

Open http://localhost:8010. See docs/fastapi-server.md for engine recipes, websocket protocol details, health checks, and metrics.
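Before opening the browser it can help to confirm that something is listening on the chosen port. The sketch below is a plain TCP reachability check using only the standard library. It does not call the server's health endpoints, whose paths are documented in docs/fastapi-server.md and are not assumed here. The function name is_listening is hypothetical.

```python
import socket


def is_listening(host: str = "localhost",
                 port: int = 8010,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

A True result only shows that the port accepts connections. It says nothing about whether the transcription engine behind it has finished loading.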

Contributing

Focused tests and small changes are easiest to review. The project keeps fast unit tests separate from opt-in real-model tests; see docs/testing.md.

License

MIT

Author

Kolja Beigel

Core symbols most depended-on inside this repo

get · called by 414 · example_fastapi_server/server.py

obs · called by 125 · tools/evaluate_realtime_text_stabilizer.py

clear · called by 53 · example_fastapi_server/server.py

admit_session · called by 30 · example_fastapi_server/server.py

send · called by 25 · RealtimeSTT/core/safepipe.py

start · called by 24 · RealtimeSTT/audio_recorder.py

run_callback · called by 19 · RealtimeSTT/core/state.py

connect · called by 18 · example_fastapi_server/server.py

Shape

Method 956

Function 546

Class 222

Route 5

Languages

Python 100%

TypeScript 1%

Modules by API surface

example_fastapi_server/server.py · 203 symbols

tests/unit/test_additional_transcription_engines.py · 107 symbols

tests/unit/test_kroko_onnx_engine.py · 64 symbols

tests/unit/test_fastapi_server_multi_user.py · 63 symbols

RealtimeSTT/transcription_engines/kroko_onnx_engine.py · 43 symbols

tests/unit/test_realtime_text_stabilizer_eval.py · 42 symbols

RealtimeSTT/core/realtime_text_stabilizer.py · 42 symbols

example_app/ui_openai_voice_interface.py · 41 symbols

tests/unit/test_omnilingual_asr_engine.py · 37 symbols

tests/unit/test_whisper_cpp_engine.py · 36 symbols

tests/unit/test_sherpa_onnx_engine.py · 33 symbols

tests/unit/test_openai_whisper_engine.py · 33 symbols

Dependencies from manifests, versioned

PyAudio 0.2.14 · 1×

fastapi 0.115 · 1×

faster-whisper 1.2.1 · 1×

halo 0.0.31 · 1×

openwakeword 0.6.0 · 1×

pvporcupine 1.9.5 · 1×

scipy 1.17.1 · 1×

soundfile 0.13.1 · 1×

torch 2.7.1 · 1×

torchaudio 2.7.1 · 1×

webrtcvad-wheels 2.0.14 · 1×

websocket-client 1.9.0 · 1×

For agents

$ claude mcp add RealtimeSTT \
  -- python -m otcore.mcp_server <graph>
