streamsad is a streaming-oriented Speech Activity Detection (SAD) module that operates frame by frame, without requiring access to the full audio signal (unlike batch processing). Unlike simple energy-based Voice Activity Detection (VAD), it accurately detects human speech while ignoring music, background noise, and silence. Powered by an efficient ONNX model and a post-processing algorithm inspired by WebRTC (using ring buffer smoothing), it runs entirely on the CPU with minimal overhead, making it ideal for real-time voice interfaces, ASR frontends, and low-resource deployments.
This module has been tested and works correctly with Python 3.10 through 3.13.
The following third-party dependencies are required to use streamsad:
numpyonnxruntime
Here is an example of how to use the streamsad module:
import numpy as np
from streamsad import SAD
# Initialize the SAD model
sad = SAD()
# Create an example audio stream (e.g., 2 seconds of random audio at 16kHz)
audio_np_array = np.random.randn(32000).astype(np.float32)
# Detect speech segments
segments = sad(audio_np_array)
# Print the detected segments
print(segments)You can install streamsad using pip:
pip install streamsadAfter installing the module, you can run unit tests using pytest:
pytest -s tests/test_sad.py