Skip to content

mohamad-hasan-sohan-ajini/streamsad

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

streamsad

streamsad is a streaming-oriented Speech Activity Detection (SAD) module that operates frame by frame, without requiring access to the full audio signal (unlike batch processing). Unlike simple energy-based Voice Activity Detection (VAD), it accurately detects human speech while ignoring music, background noise, and silence. Powered by an efficient ONNX model and a post-processing algorithm inspired by WebRTC (using ring buffer smoothing), it runs entirely on the CPU with minimal overhead, making it ideal for real-time voice interfaces, ASR frontends, and low-resource deployments.

Dependencies

This module has been tested and works correctly with Python 3.10 through 3.13.

The following third-party dependencies are required to use streamsad:

  • numpy
  • onnxruntime

How to Use

Here is an example of how to use the streamsad module:

import numpy as np
from streamsad import SAD

# Initialize the SAD model
sad = SAD()

# Create an example audio stream (e.g., 2 seconds of random audio at 16kHz)
audio_np_array = np.random.randn(32000).astype(np.float32)

# Detect speech segments
segments = sad(audio_np_array)

# Print the detected segments
print(segments)

Installation

You can install streamsad using pip:

pip install streamsad

Testing

After installing the module, you can run unit tests using pytest:

pytest -s tests/test_sad.py

About

A light-weight, streaming speech activity detection (SAD) module, designed for real-time speech-only classification

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages