github.com/snakers4/silero-vad @v6.2.1 sqlite

110 symbols 325 edges 20 files 34 documented · 31%
README

Silero VAD

Silero VAD is a pre-trained, enterprise-grade Voice Activity Detector (also see our STT models).

Real Time Example

https://user-images.githubusercontent.com/36505480/144874384-95f80f6d-a4f1-42cc-9be7-004c891dd481.mp4

Please note that the video loads only if you are logged in to your GitHub account.

Fast start

Dependencies

System requirements to run the Python examples on x86-64 systems:

  • Python 3.8+;
  • 1 GB+ RAM;
  • A modern CPU with AVX, AVX2, AVX-512 or AMX instruction sets.

Dependencies:

  • torch>=1.12.0;
  • torchaudio>=0.12.0 (for I/O only);
  • onnxruntime>=1.16.1 (for ONNX model usage).

Silero VAD uses the torchaudio library for audio I/O (torchaudio.info, torchaudio.load, and torchaudio.save), so a proper audio backend is required:

  • Option №1 - FFmpeg backend. conda install -c conda-forge 'ffmpeg<7';
  • Option №2 - sox_io backend. apt-get install sox (TorchAudio is tested on libsox 14.4.2);
  • Option №3 - soundfile backend. pip install soundfile.

If you plan to run the VAD solely with ONNX Runtime, it will run on any other system architecture where ONNX Runtime is supported. In this case, please note that:

  • You will have to implement the I/O;
  • You will have to adapt the existing wrappers / examples / post-processing for your use-case.
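
For ONNX-only setups without torchaudio, the audio I/O mentioned above could be sketched with the Python standard library alone. The helper name read_wav_mono16 is hypothetical and not part of silero-vad. It assumes 16-bit PCM mono WAV input and only produces normalized floats. Feeding them to an ONNX session is left out, because the model's input names and shapes are not described here.

```python
import array
import sys
import wave


def read_wav_mono16(path, expected_rates=(8000, 16000)):
    """Read a 16-bit PCM mono WAV file into floats in [-1.0, 1.0).

    Hypothetical helper for illustration; not part of silero-vad.
    Returns (samples, sampling_rate).
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        rate = wf.getframerate()
        if rate not in expected_rates:
            raise ValueError(f"unsupported sampling rate: {rate}")
        raw = wf.readframes(wf.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Scale integers to floats by dividing by 2**15.
    return [s / 32768.0 for s in samples], rate
```

Files with other sample widths, multiple channels, or other sampling rates would need conversion or resampling first, which this sketch deliberately rejects rather than handles.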

Using pip: pip install silero-vad

from silero_vad import load_silero_vad, read_audio, get_speech_timestamps
model = load_silero_vad()
wav = read_audio('path_to_audio_file')
speech_timestamps = get_speech_timestamps(
  wav,
  model,
  return_seconds=True,  # Return speech timestamps in seconds (default is samples)
)

Using torch.hub:

import torch
torch.set_num_threads(1)

model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad')
(get_speech_timestamps, _, read_audio, _, _) = utils

wav = read_audio('path_to_audio_file')
speech_timestamps = get_speech_timestamps(
  wav,
  model,
  return_seconds=True,  # Return speech timestamps in seconds (default is samples)
)
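
Once the timestamps are available, simple post-processing needs no extra dependencies. The sketch below assumes that get_speech_timestamps returns a list of dicts with "start" and "end" keys; that format is an assumption and is not spelled out in this README. The helper names and the example values are made up for illustration.

```python
def total_speech_seconds(speech_timestamps):
    """Sum the durations of segments whose bounds are already in seconds."""
    return sum(seg["end"] - seg["start"] for seg in speech_timestamps)


def samples_to_seconds(speech_timestamps, sampling_rate):
    """Convert segment bounds from sample indices to seconds."""
    return [
        {"start": seg["start"] / sampling_rate, "end": seg["end"] / sampling_rate}
        for seg in speech_timestamps
    ]


# Made-up segments in seconds, in the assumed output format.
example = [{"start": 0.5, "end": 2.0}, {"start": 3.0, "end": 4.5}]
print(total_speech_seconds(example))  # 3.0
```

With return_seconds left at its default, the bounds are in samples, so samples_to_seconds would be applied first with the sampling rate of the audio.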

Key Features

  • Stellar accuracy

Silero VAD has excellent results on speech detection tasks.

  • Fast

One audio chunk (30+ ms) takes less than 1 ms to process on a single CPU thread. Batching or a GPU can also improve performance considerably. Under certain conditions, ONNX may even run up to 4-5x faster.

  • Lightweight

The JIT model is around two megabytes in size.

  • General

Silero VAD was trained on huge corpora that include over 6000 languages, and it performs well on audio from different domains with various background noise and quality levels.

  • Flexible sampling rate

Silero VAD supports 8000 Hz and 16000 Hz sampling rates.

  • Highly Portable

Silero VAD benefits from the rich ecosystems built around PyTorch and ONNX, and it runs wherever these runtimes are available.

  • No Strings Attached

Published under a permissive license (MIT), Silero VAD has zero strings attached: no telemetry, no keys, no registration, no built-in expiration, and no vendor lock-in.
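
To put the speed and sampling-rate figures above into numbers, here is a small back-of-the-envelope sketch. It uses only the values quoted in the feature list (30 ms chunks, under 1 ms of processing, 8000 Hz and 16000 Hz). The function names are hypothetical, and the exact chunk sizes the model accepts are not stated here, so treat the sample counts as illustrative.

```python
def chunk_samples(chunk_ms, sampling_rate):
    """Number of samples in a chunk of the given duration."""
    return sampling_rate * chunk_ms // 1000


def real_time_factor(processing_ms, chunk_ms):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_ms / chunk_ms


for rate in (8000, 16000):
    # A 30 ms chunk is 240 samples at 8000 Hz and 480 samples at 16000 Hz.
    print(rate, chunk_samples(30, rate))

# Upper bound implied by "less than 1 ms per 30 ms chunk": roughly 0.033.
print(real_time_factor(1.0, 30.0))
```

Read this way, one second of audio would cost at most about 33 ms of CPU time on a single thread, before any gains from batching, a GPU, or ONNX.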

Typical Use Cases

  • Voice activity detection for IoT / edge / mobile use cases
  • Data cleaning and preparation, voice detection in general
  • Telephony and call-center automation, voice bots
  • Voice interfaces

Links

Get In Touch

Try our models, create an issue, start a discussion, join our Telegram chat, email us, read our news.

Please see our wiki for relevant information and email us directly.

Citations

@misc{Silero VAD,
  author = {Silero Team},
  title = {Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-vad}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}

Examples and VAD-based Community Apps

Core symbols most depended-on inside this repo

getStartOffset · called by 11 · examples/java-wav-file-example/src/main/java/org/example/SileroSpeechSegment.java
getEndOffset · called by 8 · examples/java-wav-file-example/src/main/java/org/example/SileroSpeechSegment.java
reset_states · called by 7 · src/silero_vad/utils_vad.py
setEndOffset · called by 7 · examples/java-wav-file-example/src/main/java/org/example/SileroSpeechSegment.java
calculateSecondByOffset · called by 6 · examples/java-wav-file-example/src/main/java/org/example/SileroVadDetector.java
resetStates · called by 5 · examples/java-example/src/main/java/org/example/SlieroVadOnnxModel.java
close · called by 5 · examples/java-example/src/main/java/org/example/SlieroVadOnnxModel.java
resetStates · called by 5 · examples/java-wav-file-example/src/main/java/org/example/SileroVadOnnxModel.java

Shape

Method 68
Function 24
Class 18

Languages

Python 56%
Java 43%
Go 1%

Modules by API surface

tuning/utils.py · 22 symbols
src/silero_vad/utils_vad.py · 21 symbols
examples/microphone_and_webRTC_integration/microphone_and_webRTC_integration.py · 11 symbols
examples/java-wav-file-example/src/main/java/org/example/SileroVadOnnxModel.java · 10 symbols
examples/java-wav-file-example/src/main/java/org/example/SileroSpeechSegment.java · 10 symbols
examples/java-example/src/main/java/org/example/SlieroVadOnnxModel.java · 8 symbols
examples/java-wav-file-example/src/main/java/org/example/SileroVadDetector.java · 7 symbols
examples/java-example/src/main/java/org/example/SlieroVadDetector.java · 5 symbols
examples/java-example/src/main/java/org/example/App.java · 4 symbols
src/silero_vad/tinygrad_model.py · 3 symbols
examples/java-wav-file-example/src/main/java/org/example/App.java · 3 symbols
tests/test_basic.py · 2 symbols

Dependencies from manifests, versioned

github.com/go-audio/audio v1.0.0 · 1×
github.com/go-audio/riff v1.0.0 · 1×
github.com/go-audio/wav v1.1.0 · 1×
github.com/streamer45/silero-vad-go v0.2.1 · 1×
com.microsoft.onnxruntime:onnxruntime 1.23.1 · 1×
junit:junit 3.8.1 · 1×
packaging
torch 1.12.0 · 1×
torchaudio 0.12.0 · 1×

For agents

$ claude mcp add silero-vad \
  -- python -m otcore.mcp_server <graph>