SPECTRA - Speech Classification & Transcription Analysis

SPECTRA is a speech analysis tool that determines whether the speech in an audio file is read from a prepared script or spoken spontaneously. It bases this decision on speech features such as word length, pauses, and speech rate.

Features

Analyze audio files to classify speech as read or spontaneous
Calculate reading probability as a percentage
Generate human-readable explanations of analysis results
Support for multiple audio formats (WAV, MP3, M4A, WebM)
RESTful API for integration with other applications

Architecture

The project follows a clean architecture pattern:

src/core/ - Core application logic and DTOs
src/use_cases/ - Business logic implementation
tests/ - Unit tests

How It Works

SPECTRA classifies speech as read or spontaneous based on three key features (see References for the scheme this builds on):

Active average word length: The average length of words in the speech
Inactive alphabets per second: The frequency of pauses
Words per second: The overall speech rate

These features are combined to compute a readability score that indicates how likely the speech is to be read from a script.
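As a rough illustration of the three features above, the sketch below derives simplified stand-ins for them from a word-level transcript with timestamps. The `Word` type, the `speech_features` function, and the `min_pause` threshold are hypothetical names and values chosen for this example; they are not taken from the SPECTRA source, and the exact feature definitions and the way they are combined into the final score may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    """One transcribed word with its timing (hypothetical structure)."""
    text: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def speech_features(words: List[Word], min_pause: float = 0.3) -> Dict[str, float]:
    """Compute simplified versions of the three features.

    min_pause is an assumed threshold: a gap between two consecutive
    words counts as a pause only if it lasts at least this long.
    """
    if not words:
        raise ValueError("at least one word is required")

    duration = words[-1].end - words[0].start
    if duration <= 0:
        raise ValueError("transcript must span a positive duration")

    # Average number of characters per word.
    avg_word_length = sum(len(w.text) for w in words) / len(words)

    # Count silent gaps between consecutive words that exceed the threshold.
    pauses = sum(
        1
        for prev, cur in zip(words, words[1:])
        if cur.start - prev.end >= min_pause
    )

    return {
        "avg_word_length": avg_word_length,
        "pauses_per_second": pauses / duration,
        "words_per_second": len(words) / duration,
    }
```

Word-level timestamps like these can typically be obtained from a transcription engine; the sketch deliberately stops at feature extraction because the combination into a readability score is not specified in this document.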

Dependencies

Python 3.8+
FastAPI
Parselmouth
Faster-Whisper
NumPy
LangChain
Ollama (for explanation generation)
FFmpeg (for audio conversion)
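
To show what the FFmpeg dependency is used for, here is a minimal sketch of converting an input file to WAV from Python. The function names are hypothetical, and the choice of mono audio at 16 kHz is an assumption made for this example rather than a documented SPECTRA setting.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(src: Path, dst: Path, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command line that converts src to a mono WAV file."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file (MP3, M4A, WebM, ...)
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the requested rate
        str(dst),
    ]


def convert_to_wav(src: Path, dst: Path) -> None:
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
```

Calling `convert_to_wav(Path("talk.mp3"), Path("talk.wav"))` requires the `ffmpeg` executable to be on the `PATH`.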

Installation

Clone the repository:

git clone https://github.com/mechamogeo/spectra.git
cd spectra

Install dependencies:

pip install fastapi uvicorn parselmouth faster-whisper numpy langchain-ollama

Install FFmpeg for audio conversion:
- On Ubuntu: sudo apt-get install ffmpeg
- On macOS: brew install ffmpeg
- On Windows: Download from ffmpeg.org

Install Ollama for LLM explanations: Follow the instructions at ollama.ai

Pull the required model:
```
ollama pull granite3.2:2b
```

Running the API

Start the API server:
```
python -m src.core.app
```
The API will be available at http://localhost:8000
Access the API documentation at http://localhost:8000/docs

API Usage

Analyze an audio file

GET /analyze?file_name=your_audio_file.wav

Make sure your audio file is placed in the resources/audios/ directory.
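
The request above can also be issued from Python. The sketch below uses only the standard library and assumes the server is running locally on port 8000 and that the endpoint returns a JSON body; the response fields are not documented here, so the result is returned as-is. The helper names `build_analyze_url` and `analyze` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"


def build_analyze_url(file_name: str, base_url: str = BASE_URL) -> str:
    """Return the /analyze URL with file_name safely URL-encoded."""
    return f"{base_url}/analyze?{urlencode({'file_name': file_name})}"


def analyze(file_name: str, base_url: str = BASE_URL, timeout: float = 120.0):
    """Call the endpoint and decode the body, assuming it is JSON."""
    with urlopen(build_analyze_url(file_name, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires the API server to be running):
#   result = analyze("your_audio_file.wav")
#   print(result)
```

Encoding the query string matters when file names contain spaces or other special characters.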

Running Tests

To run the test suite:

pytest tests/

References

@misc{kopparapu2023novelschemeclassifyread,
      title={A Novel Scheme to classify Read and Spontaneous Speech},
      author={Sunil Kumar Kopparapu},
      year={2023},
      eprint={2306.08012},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2306.08012},
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
