AI-powered music composition suite — generate audio clips with text prompts and arrange them into full songs using a built-in multi-track DAW. Everything runs locally on your own computer.
🤖 Built with Claude — This project is an experiment in AI-accelerated development. The entire multi-track composer, Canvas DAW engine, backend APIs, and frontend were designed and implemented in collaboration with Claude Code (Anthropic's coding agent). The idea: explore what it looks like to build a local-first, AI-powered music creation tool at full speed — from concept to working software — using AI models that run on your own hardware, no cloud APIs needed.
Based on RC Stable Audio Tools (fork of Stability AI's Stable Audio Tools) with a custom multi-track composer for arranging AI-generated clips into compositions.
```
git clone https://github.com/miikkij/Musica1.git
cd Musica1
```

| Platform | Command |
|---|---|
| Windows | Double-click setup.bat |
| macOS | chmod +x setup.sh && ./setup.sh |
| Linux | chmod +x setup.sh && ./setup.sh |
The setup script checks your system and installs everything automatically:
- Python 3.10+ — the programming language everything runs on
- uv — a fast package manager that sets up the Python environment
- Node.js — needed to build the Composer's browser interface
- GPU/CUDA — detects your graphics card for fast audio generation (works without one too, just slower)
- Python packages — all the AI/audio libraries (PyTorch, librosa, etc.)
- Composer frontend — builds the DAW browser app
If something is missing, the script tells you exactly what to install and where to get it.
Windows: Double-click start.bat
macOS / Linux:
```
# Terminal 1 — AI Audio Generator
uv run python run_gradio.py

# Terminal 2 — Multi-Track Composer
uv run python -m composer.server.app
```

Then open in your browser:
- http://localhost:7860 — Audio Generator (create clips from text prompts)
- http://localhost:8000 — Multi-Track Composer (arrange clips into songs)
First time? The generator will show a model downloader. Download RoyalCities/Foundation-1 (recommended) and restart the app. After that, you can generate audio from text descriptions like "Rhodes Piano, Warm, Rich, Chord Progression, Medium Reverb".
- Multi-Track Composer — browser-based DAW with Canvas timeline, drag-and-drop, loop-extend, zoom, minimap
- Advanced Generation Options — full sampler control (CFG, sigma, steps, seed, negative prompt) from within the composer
- Prompt Guide — built-in help dialog with all instrument/timbre/FX tags and examples
- Random Prompt Generator — one-click tag-based prompt generation
- Keyboard Shortcuts — Space=play/stop, 1/2/3=mode switch, +/-=zoom, H=help
- Auto-Save — project state persists in localStorage automatically
- BPM Snap — clips snap to beat grid when moving
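The BPM snap behavior can be sketched as a simple quantization of clip start times to the beat grid. This is a hypothetical illustration, not the composer's actual Canvas-engine code:

```python
def snap_to_beat(start_seconds: float, bpm: float) -> float:
    """Snap a clip's start time to the nearest beat boundary."""
    beat_len = 60.0 / bpm  # duration of one beat in seconds
    beats = round(start_seconds / beat_len)
    return beats * beat_len

# At 120 BPM a beat is 0.5 s, so a clip dropped at 1.37 s snaps to 1.5 s
print(snap_to_beat(1.37, 120))  # → 1.5
```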
This repo inherits all features from RC Stable Audio Tools:
- Dynamic Model Loading: Enables dynamic model swaps of the base model and any future community finetune releases.
- Random Prompt Button: A one-click Random Prompt button tied directly to the loaded model's metadata.
- BPM & Bar Selector: BPM & Bar settings tied to the model's timing conditioning, which will auto-fill any prompt with the needed BPM/Bar info. You can also lock or unlock the BPM if you wish to randomize this as well with the Random Prompt button.
- Key Signature Locking: Key signature is now tied to UI and can be locked or unlocked with the random prompt button.
- Automatic Sample to MIDI Converter: The fork will automatically convert all generated samples to .MID format, enabling users to have an infinite source of MIDI.
- Automatic Sample Trimming: The fork will automatically trim all generated samples to the exact length desired for easier importing into DAWs.
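To make the BPM/Bar timing conditioning concrete: in 4/4 time, the sample length implied by a BPM and bar count is simply bars × 4 beats × 60 / BPM. A hypothetical sketch (the fork's actual conditioning code may differ):

```python
def sample_length_seconds(bpm: float, bars: int, beats_per_bar: int = 4) -> float:
    """Length in seconds of `bars` bars at `bpm`, assuming a fixed meter."""
    return bars * beats_per_bar * 60.0 / bpm

# 4 bars at 120 BPM = 16 beats at 0.5 s each = 8 s of audio
print(sample_length_seconds(120, 4))  # → 8.0
```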
```
git clone https://github.com/miikkij/Musica1.git
cd Musica1
```

Use Python 3.10. Newer versions (3.11+) can fail dependency resolution due to pinned packages (notably older SciPy wheels).
It's recommended to use a virtual environment to manage dependencies:

Windows:

```
python -m venv venv
venv\Scripts\activate
```

macOS and Linux:

```
python3 -m venv venv
source venv/bin/activate
```
Install Stable Audio Tools and the necessary packages from setup.py:
```
pip install stable-audio-tools
pip install .
```

To ensure Gradio uses the GPU (CUDA) instead of defaulting to the CPU, uninstall and reinstall torch, torchvision, and torchaudio with the correct CUDA version:

```
pip uninstall -y torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

This fork supports optional INT4 weight-only inference via TorchAO.
It can reduce VRAM usage further, but it can be very slow on Windows because Triton fast-kernels are usually unavailable (falls back to slower paths).
To enable the INT4 toggle in the UI:
Windows (recommended, pinned):

```
pip install torchao==0.12.0
```

Linux:

```
pip install torchao
```

If TorchAO isn't installed or compatible with your environment, the INT4 toggle will remain hidden/disabled.
A sample config.json is included in the root directory. Customize it to specify directories for custom models and outputs (.wav and .mid files will be stored here):
```json
{
  "model_directory": "models",
  "output_directory": "generations"
}
```

Start the Gradio interface using a batch file or directly from the command line:
```
@echo off
cd /d path-to-your-venv/Scripts
call activate
cd /d path-to-your-stable-audio-tools
python run_gradio.py --model-config models/path-to-config/example_config.json --ckpt-path models/path-to-config/example.ckpt
pause
```

You can launch the web UI by simply calling:
```
python run_gradio.py
```

This will start the Gradio UI. If you're running it for the first time, it will launch a model downloader interface, where you can initialize the app by downloading your first model. After downloading, you will need to restart the app to get the full UI.
When you run the app AFTER downloading a model, the full UI will launch.
You can also launch the app with custom flags:
```
python run_gradio.py --model-config models/path-to-config/example_config.json --ckpt-path models/path-to-config/example.ckpt
```

Input prompts in the Gradio interface to generate audio and MIDI files, which will be saved as specified in config.json.
The interface has been expanded with Bar/BPM settings (which modify both the user prompt and the sample-length conditioning), MIDI display and conversion, and Dynamic Model Loading.
Models must be stored inside their own subfolder along with their accompanying config files. A single finetune can have multiple checkpoints: all related checkpoints can go inside the same "model1" subfolder, but it's important that each checkpoint's associated config file is included in the same folder as the checkpoint itself.
To switch models, simply pick the model you want to load from the dropdown and click "Load Model".
When you launch with `python run_gradio.py`, it will:
- First check whether the `models` folder has any model downloaded.
- If there is a model, it will launch the full UI with that model loaded.
- If the `models` folder is empty, it will launch a HuggingFace downloader (HFFS) UI, where you can either select from the preset models or enter any HuggingFace repo id to download. (After downloading a model, you will need to restart the app to launch the full UI.)
- To customize the preset models that appear in the downloader dropdown, edit the `config.json` file to add more entries to the `hffs[0].options` array.
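For example, the preset list might look something like this. The field names follow the `hffs[0].options` path mentioned above, but the exact schema is an assumption here, so check the shipped `config.json` for the authoritative shape:

```json
{
  "model_directory": "models",
  "output_directory": "generations",
  "hffs": [
    {
      "options": [
        "RoyalCities/Foundation-1",
        "your-username/your-finetune"
      ]
    }
  ]
}
```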
A built-in browser-based DAW for arranging AI-generated audio clips into full compositions.
Double-click start.bat to launch both the Gradio generator and the Composer. Or run them separately:
```
# Terminal 1: Audio generator
uv run python run_gradio.py

# Terminal 2: Composer
uv run python -m composer.server.app
```

Then open http://localhost:8000 in your browser.
The composer frontend needs to be built once:
```
cd composer
npm install
npm run build
```

Dependencies (FastAPI, uvicorn) should already be installed. If not:

```
uv pip install fastapi uvicorn
```

- Generate clips in the Gradio UI (port 7860) or directly in the Composer's sidebar
- Drag clips from the clip library onto the timeline tracks
- Arrange — move clips with mouse, loop-extend by dragging right edge
- Multiple clips per track — drop onto existing tracks to build arrangements
- Play/Stop with transport controls or Space bar — all tracks play in sync
- Right-click clips for context menu (duplicate, loop x2/x4, delete)
- Zoom with +/- keys or Ctrl+scroll, navigate with the minimap
- Export the mix as a single WAV file
- Save/Load projects — also auto-saves to localStorage
- Canvas Timeline Engine — custom-built DAW timeline with multi-clip tracks, waveform rendering, bar/beat grid
- Clip Looping — drag right edge to loop-extend, or right-click for loop x2/x4/fill
- Minimap — overview strip showing all clips, draggable viewport for navigation
- Three Modes — Cursor (seek), Move (drag clips), Select (regions) — switch with 1/2/3 keys
- BPM Snap — clips snap to beat boundaries when moving (toggle with toolbar)
- Song Length — auto-extends or set a target duration in mm:ss
- Advanced Generation — full sampler options modal (seed, steps, CFG, sampler, sigma, negative prompt)
- Prompt Guide — built-in help with all instrument/timbre/FX/behavior tags
- Random Prompt — generates tag-based prompts from the Foundation-1 vocabulary
- Clip Library — all generated WAVs with duration display, drag-and-drop
- BPM Detection — librosa-based detection for clips
- Time Stretching — stretch clips to match project BPM
- Project Persistence — save/load as JSON + auto-save to localStorage
- Mix Export — mix all tracks with volume, mute/solo, peak normalization
- Send to Composer — button in Gradio UI sends clips to the composer
- Keyboard Shortcuts — Space, 1/2/3, +/-, H (help), Delete, and more
- Dark Theme — consistent with the Gradio UI
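The mix-export step can be illustrated with a minimal sketch: sum each track's samples with its gain, then peak-normalize so the loudest sample hits full scale. This is a pure-Python illustration under assumed semantics; the real backend operates on WAV data and also handles mute/solo:

```python
def mix_and_normalize(tracks, gains, peak=1.0):
    """Sum tracks sample-by-sample with per-track gain, then peak-normalize."""
    length = max(len(t) for t in tracks)
    mix = [0.0] * length
    for track, gain in zip(tracks, gains):
        for i, sample in enumerate(track):
            mix[i] += sample * gain
    loudest = max(abs(s) for s in mix) or 1.0  # avoid dividing by zero on silence
    return [s * peak / loudest for s in mix]

# Two short "tracks": the summed peak (0.75) is scaled up to 1.0
print(mix_and_normalize([[0.5, -0.25], [0.25, 0.25]], [1.0, 1.0]))  # → [1.0, 0.0]
```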
| Key | Action |
|---|---|
| Space | Play / Stop |
| 1 | Cursor mode |
| 2 | Move mode |
| 3 | Select mode |
| + / = | Zoom in |
| - | Zoom out |
| Delete | Remove selected clip |
| H | Help dialog |
```
Browser
├── Gradio UI (port 7860) ── "Send to Composer" ──┐
└── Composer App (port 8000)                      │
    ├── Canvas Timeline Engine (custom JS)        │
    └── FastAPI Backend ◄─────────────────────────┘
        ├── /api/generate (proxies to Gradio)
        ├── /api/clips (list/serve WAVs)
        ├── /api/bpm (BPM detection)
        ├── /api/project (save/load)
        ├── /api/export (mix to WAV)
        ├── /api/stretch (time-stretch)
        └── /api/loop (repeat clips)
```
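As an example of what one of these endpoints does, /api/loop essentially repeats a clip's samples until a target length is reached. A simplified pure-Python sketch (the real endpoint works on WAV files, so the details here are assumptions):

```python
def loop_clip(samples, target_len):
    """Repeat a clip's samples, trimming the final repeat to target_len."""
    if not samples:
        return []
    repeats = -(-target_len // len(samples))  # ceiling division
    return (samples * repeats)[:target_len]

# A 3-sample clip looped out to 7 samples
print(loop_clip([1, 2, 3], 7))  # → [1, 2, 3, 1, 2, 3, 1]
```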
This repo tracks RoyalCities/RC-stable-audio-tools as upstream:
```
git pull upstream main   # fetch latest changes from RC Stable Audio Tools
```

For detailed instructions on training and inference commands, flags, and additional options, refer to the main GitHub documentation: Stable Audio Tools Detailed Usage
I did my best to make sure the code is OS-agnostic, but I've only been able to test it myself on Windows with NVIDIA GPUs. The project now also fully supports macOS and Apple Silicon (M1 and above). Special thanks to @cocktailpeanut for their help!
If there's any other feature or tooling you'd like, let me know here or by contacting me on Twitter. I'm just a hobbyist, but if it can be done, I'll see what I can do.
Have fun!