Have a natural, spoken conversation with an AI!
This project lets you chat with a Large Language Model (LLM) using just your voice, receiving spoken responses in near real-time. Think of it as your own digital conversation partner.
https://github.com/user-attachments/assets/16cc29a7-bec2-4dd0-a056-d213db798d8f
(early preview - first reasonably stable version)
❗ Project Status: Community-Driven
This project is no longer being actively maintained by me due to time constraints. I've taken on too many projects and I have to step back. I will no longer be implementing new features or providing user support.
I will continue to review and merge high-quality, well-written Pull Requests from the community from time to time. Your contributions are welcome and appreciated!
A sophisticated client-server system built for low-latency interaction:
* RealtimeSTT rapidly converts your speech to text.
* RealtimeTTS.
* turndetect.py) adapts to the conversation pace.
* llm_module.py).
* audio_module.py).

* RealtimeSTT (Speech-to-Text)
* RealtimeTTS (Text-to-Speech)
* transformers (Turn detection, Tokenization)
* torch / torchaudio (ML Framework)
* ollama / openai (LLM Clients)
* numpy, scipy

This project leverages powerful AI models, which have some requirements:
* install.bat) is for Windows. Manual steps are possible on Linux/macOS but may require more troubleshooting (especially for DeepSpeed).
* OPENAI_API_KEY environment variable (e.g., in a .env file or passed to Docker).

Clone the repository first:
git clone https://github.com/KoljaB/RealtimeVoiceChat.git
cd RealtimeVoiceChat
Now, choose your adventure:
🚀 Option A: Docker Installation (Recommended for Linux/GPU)
This is the most straightforward method, bundling the application, dependencies, and even Ollama into manageable containers.
Build the Docker images:
(This takes time! It downloads base images, installs Python/ML dependencies, and pre-downloads the default STT model.)
```bash
docker compose build
```
(If you want to customize models/settings in code/*.py, do it before this step!)
Start the services (App & Ollama):
(Runs containers in the background. GPU access is configured in docker-compose.yml.)
```bash
docker compose up -d
```
Give them a minute to initialize.
(Crucial!) Pull your desired Ollama Model:
(This is done after startup to keep the main app image smaller and allow model changes without rebuilding. Execute this command to pull the default model into the running Ollama container.)
```bash
# Pull the default model (adjust if you configured a different one in server.py)
docker compose exec ollama ollama pull hf.co/bartowski/huihui-ai_Mistral-Small-24B-Instruct-2501-abliterated-GGUF:Q4_K_M

docker compose exec ollama ollama list
```
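If you would rather confirm the pull from a script than read the `ollama list` output, the check might be sketched as follows. This assumes the Ollama HTTP API is reachable from the host at `http://localhost:11434` (Ollama's default port; whether `docker-compose.yml` publishes it is an assumption), and the helper names `installed_models` and `fetch_tags` are hypothetical.

```python
import json
import urllib.request

# Default model named in this README; adjust if you configured another one.
MODEL = "hf.co/bartowski/huihui-ai_Mistral-Small-24B-Instruct-2501-abliterated-GGUF:Q4_K_M"


def installed_models(tags_payload: dict) -> set[str]:
    """Extract model names from the JSON returned by Ollama's /api/tags endpoint."""
    return {entry["name"] for entry in tags_payload.get("models", [])}


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Ask a running Ollama server which models it has available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        names = installed_models(fetch_tags())
    except (OSError, ValueError) as exc:  # connection refused, timeout, or non-JSON reply
        print(f"Could not query Ollama: {exc}")
    else:
        print("model is pulled" if MODEL in names else f"{MODEL} not found; pull it first")
```

If the port is not published to the host, the `docker compose exec ollama ollama list` command above remains the simpler check.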
Stopping the Services:
```bash
docker compose down
```
Restarting:
```bash
docker compose up -d
```
Viewing Logs / Debugging:
* `docker compose logs -f app`
* `docker compose logs -f ollama`
* `docker compose logs app > app_logs.txt`

🛠️ Option B: Manual Installation (Windows Script / venv)
This method requires managing the Python environment yourself. It offers more direct control but can be trickier, especially regarding ML dependencies.
B1) Using the Windows Install Script:
```batch
install.bat
```
(This opens a new command prompt within the activated virtual environment.)
Proceed to the "Running the Application" section.

B2) Manual Steps (Linux/macOS/Windows):
Create & Activate Virtual Environment:
```bash
python -m venv venv
# Linux/macOS:
source venv/bin/activate
# Windows:
.\venv\Scripts\activate
```
Upgrade Pip:
```bash
python -m pip install --upgrade pip
```
Navigate to Code Directory:
```bash
cd code
```
Install PyTorch (Crucial Step - Match Your Hardware!):
```bash
# Verify your CUDA version! Adjust 'cu121' and the URL if needed.
pip install torch==2.5.1+cu121 torchaudio==2.5.1+cu121 torchvision --index-url https://download.pytorch.org/whl/cu121
```

```bash
# pip install torch torchaudio torchvision
```

Install Other Requirements:
```bash
pip install -r requirements.txt
```
requirements.txt may include DeepSpeed. Installation can be complex, especially on Windows. The install.bat tries a precompiled wheel. If manual installation fails, you might need to build it from source or consult resources like deepspeedpatcher (use at your own risk). Coqui TTS performance benefits most from DeepSpeed.

Running the Application

If using Docker:
Your application is already running via docker compose up -d! Check logs using docker compose logs -f app.
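For a manual install, it can help to confirm that the PyTorch build from the installation steps actually sees your GPU before launching the server. The following is a minimal sketch rather than part of the project; `describe_torch` is a hypothetical helper name, and it relies only on standard PyTorch attributes.

```python
import importlib.util


def describe_torch() -> dict:
    """Report whether PyTorch is importable and whether it can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None, "cuda_build": None, "cuda_available": False}

    import torch  # imported lazily so the script still runs without PyTorch

    return {
        "installed": True,
        "version": torch.__version__,        # a "+cu121" suffix marks a CUDA 12.1 wheel
        "cuda_build": torch.version.cuda,    # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back `False` on a machine with an NVIDIA GPU, the installed wheel most likely does not match your CUDA version.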
If using Manual/Script Installation:
```bash
# Linux/macOS: source ../venv/bin/activate
# Windows: ..\venv\Scripts\activate
```

code directory (if not already there):

```bash
cd code
```

```bash
python server.py
```

Accessing the Client (Both Methods):
http://localhost:8000 (or your server's IP if running remotely/in Docker on another machine).

Want to tweak the AI's voice, brain, or how it listens? Modify the Python files in the code/ directory.
⚠️ Important Docker Note: If using Docker, make any configuration changes before running docker compose build to ensure they are included in the image.
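To make the kind of edit this section describes more concrete, here is a minimal sketch. It assumes the settings are plain module-level assignments in `server.py`; only names quoted in this README appear, the values are examples rather than the project's defaults, and `check_settings` is a hypothetical helper that is not part of the codebase.

```python
import os

# Illustrative values only; the real defaults live in code/server.py.
START_ENGINE = "kokoro"              # README lists "coqui", "kokoro", or "orpheus"
LLM_START_PROVIDER = "ollama"        # README lists "ollama" or "openai"
LLM_START_MODEL = "hf.co/..."        # placeholder; use the model you actually pulled
USE_SSL = False
SSL_CERT_PATH = "localhost+3.pem"    # example file names from the mkcert steps
SSL_KEY_PATH = "localhost+3-key.pem"


def check_settings(engine, provider, use_ssl, cert_path, key_path, env=None):
    """Hypothetical helper: return a list of human-readable configuration problems."""
    env = os.environ if env is None else env
    problems = []
    if engine not in {"coqui", "kokoro", "orpheus"}:
        problems.append(f"unknown TTS engine: {engine!r}")
    if provider not in {"ollama", "openai"}:
        problems.append(f"unknown LLM provider: {provider!r}")
    if provider == "openai" and not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if use_ssl:
        for label, path in (("certificate", cert_path), ("key", key_path)):
            if not os.path.isfile(path):
                problems.append(f"SSL {label} file not found: {path}")
    return problems


if __name__ == "__main__":
    issues = check_settings(START_ENGINE, LLM_START_PROVIDER, USE_SSL, SSL_CERT_PATH, SSL_KEY_PATH)
    print("\n".join(issues) if issues else "settings look consistent")
```

Running a check like this before `docker compose build` catches a mistyped engine name or a missing API key without waiting for a full image rebuild.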
* server.py, audio_module.py):
  * START_ENGINE in server.py to "coqui", "kokoro", or "orpheus".
  * AudioProcessor.__init__ in audio_module.py.
* server.py, llm_module.py):
  * LLM_START_PROVIDER ("ollama" or "openai") and LLM_START_MODEL (e.g., "hf.co/..." for Ollama, model name for OpenAI) in server.py. Remember to pull the Ollama model if using Docker (see Installation Step A3).
  * system_prompt.txt.
* transcribe.py):
  * DEFAULT_RECORDER_CONFIG to change the Whisper model (model), language (language), silence thresholds (silence_limit_seconds), etc. The default base.en model is pre-downloaded during the Docker build.
* turndetect.py):
  * TurnDetector.update_settings method.
* server.py):
  * USE_SSL = True and provide paths to your certificate (SSL_CERT_PATH) and key (SSL_KEY_PATH) files.
  * docker-compose.yml to map the SSL port (e.g., 443) and potentially mount your certificate files as volumes.

Generating Local SSL Certificates (Windows Example w/ mkcert)
1. Install Chocolatey package manager if you haven't already.
2. Install mkcert: `choco install mkcert`
3. Run Command Prompt *as Administrator*.
4. Install a local Certificate Authority: `mkcert -install`
5. Generate certs (replace `your.local.ip`): `mkcert localhost 127.0.0.1 ::1 your.local.ip`
* This creates `.pem` files (e.g., `localhost+3.pem` and `localhost+3-key.pem`) in the current directory. Update `SSL_CERT_PATH` and `SSL_KEY_PATH` in `server.py` accordingly. Remember to potentially mount these into your Docker container.
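Before pointing `SSL_CERT_PATH` and `SSL_KEY_PATH` at the generated files, you may want to confirm that the pair loads as a TLS server identity. The sketch below uses only Python's standard `ssl` module; `cert_pair_loads` is a hypothetical helper name, and the file names are the examples from the step above.

```python
import ssl


def cert_pair_loads(cert_path: str, key_path: str) -> bool:
    """Return True if the certificate and private key load together as a TLS server identity."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except (OSError, ssl.SSLError) as exc:  # missing file, bad PEM, or mismatched key
        print(f"Could not load certificate pair: {exc}")
        return False
    return True


if __name__ == "__main__":
    # Adjust the names to whatever mkcert actually printed on your machine.
    ok = cert_pair_loads("localhost+3.pem", "localhost+3-key.pem")
    print("certificate pair OK" if ok else "check SSL_CERT_PATH and SSL_KEY_PATH")
```

This only verifies that the files are readable and belong together; whether a browser trusts them still depends on the `mkcert -install` step.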
Got ideas or found a bug? Contributions are welcome! Feel free to open issues or submit pull requests.
The core codebase of this project is released under the MIT License (see the LICENSE file for details).
This project relies on external specific TTS engines (like Coqui XTTSv2) and LLM providers which have their own licensing terms. Please ensure you comply with the licenses of all components you use.