A comprehensive music agent framework built on smolagents, integrating state-of-the-art music AI models for understanding, generation, and interaction.
- ChatMusician Integration: Natural language music analysis and understanding
- Music Theory Analysis: Automatic analysis of musical structures, harmony, and form
- Audio Understanding: Content analysis of audio files
- Symbolic Music Generation: ABC notation generation using NotaGen
- Audio Generation: High-quality audio synthesis using Stable Audio Open
- Conditional Generation: Generate music based on text prompts, styles, and constraints
- smolagents Integration: Intelligent agent system that decides which tools to use
- Multi-modal Support: Text, audio, and symbolic music inputs
- Tool Orchestration: Seamless integration between different music AI models
- Gradio Web Interface: User-friendly web interface for interaction
- Score Visualization: Beautiful sheet music rendering in PNG, PDF, and MusicXML
- Audio Playback: Integrated audio player for generated content
- Flexible Local Deployment: Scalable deployment options matched to your system's capacity
- Remote Deployment: Partially remote deployment through HF Inference Clients.
This project is GPU-optimized and requires:
- Python 3.10 or later
- NVIDIA GPU with CUDA 12.1+ (recommended)
- At least 8GB VRAM for local models (recommended > 40GB)
uv is a fast, modern Python package manager that provides reliable dependency resolution and faster installs.
**Install with uv (recommended):**

```bash
# Install uv (once)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or on Windows: powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Clone the repository
git clone https://github.com/manoskary/weavemuse.git
cd weavemuse

# Install with GPU support (CUDA 12.1)
uv sync --extra-index-url https://download.pytorch.org/whl/cu121

# For development with all extras:
uv sync --extra dev --extra gpu --extra remote --extra audio --extra music --extra-index-url https://download.pytorch.org/whl/cu121

# Lock dependencies for reproducible installs
uv lock

# Activate the uv environment
source .venv/bin/activate

# Run WeaveMuse
weavemuse serve
```

**Install with pip:**

```bash
# Clone the repository
git clone https://github.com/manoskary/weavemuse.git
cd weavemuse

# Install with GPU support
pip install -e ".[gpu]" --extra-index-url https://download.pytorch.org/whl/cu121

# For development
pip install -e ".[dev,gpu,remote,audio,music]" --extra-index-url https://download.pytorch.org/whl/cu121
```

**Install with conda:**

```bash
# Create conda environment
conda create -n weavemuse python=3.10
conda activate weavemuse

# Install PyTorch with CUDA
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

# Install WeaveMuse
pip install -e .
```

WeaveMuse uses models from HuggingFace Hub and supports remote inference. Set up your environment:
```bash
# Already included, but if needed separately:
pip install huggingface_hub

# Login with your HuggingFace token
huggingface-cli login

# Or set environment variable
export HF_TOKEN="your_huggingface_token_here"
```

For lightweight usage without local GPU requirements, you can use HuggingFace's Inference API:
```python
from weavemuse.agents.models import InferenceClientModel

# Use remote inference instead of local models
model = InferenceClientModel(
    model_id="m-a-p/ChatMusician",
    token="your_hf_token",  # Optional if already logged in
)
```

```bash
# Copy example environment file
cp .env.example .env
```

Edit `.env` with your configuration:
```bash
# HuggingFace Configuration
HF_TOKEN=your_huggingface_token
HF_CACHE_DIR=./models/cache

# Model configurations
CHATMUSICIAN_MODEL_ID=m-a-p/ChatMusician
NOTAGEN_MODEL_PATH=./models/notagen
STABLE_AUDIO_MODEL_ID=stabilityai/stable-audio-open-1.0

# GPU configuration
DEVICE=cuda
TORCH_DTYPE=float16
CUDA_VISIBLE_DEVICES=0

# Server configuration
HOST=0.0.0.0
PORT=7860
DEBUG=false

# Remote API Keys (optional)
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
```

WeaveMuse provides several optional dependency groups:

- `gpu`: CUDA-optimized packages for GPU acceleration
- `remote`: Remote API dependencies (OpenAI, Anthropic, etc.)
- `audio`: Extended audio processing capabilities
- `music`: Advanced music analysis tools
- `dev`: Development dependencies
- `all`: All optional dependencies combined
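Referring back to the `.env` example above, those variables can be read from Python in the usual way. A minimal standard-library sketch (the variable names match the example `.env`, but the `config` dict itself is illustrative, not part of the WeaveMuse API):

```python
import os

# Illustrative only: read WeaveMuse-style settings from the environment,
# falling back to the defaults shown in the example .env above.
config = {
    "hf_token": os.getenv("HF_TOKEN", ""),
    "device": os.getenv("DEVICE", "cuda"),
    "torch_dtype": os.getenv("TORCH_DTYPE", "float16"),
    "host": os.getenv("HOST", "0.0.0.0"),
    "port": int(os.getenv("PORT", "7860")),
    "debug": os.getenv("DEBUG", "false").lower() == "true",
}

print(config["device"], config["port"])
```

Note that Python does not load `.env` files automatically; export the variables from your shell first, or use a helper such as `python-dotenv`.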
```bash
# Install specific extras with uv
uv sync --extra gpu --extra remote --extra-index-url https://download.pytorch.org/whl/cu121

# Or with pip
pip install -e ".[gpu,remote]" --extra-index-url https://download.pytorch.org/whl/cu121
```

After installation, verify everything is working:
```bash
# Test basic functionality
python -c "from weavemuse.tools import NotaGenTool; print('✅ WeaveMuse imported successfully')"

# Run tests
pytest tests/ -v

# Check GPU availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

# Start the web interface
weavemuse gui
```

WeaveMuse provides flexible command-line options to launch different interfaces:
```bash
# Launch web interface (default)
weavemuse gui

# Launch terminal interface
weavemuse terminal

# For backwards compatibility, this also works:
weavemuse

# Show version
weavemuse --version
```

**Web interface (`weavemuse gui`):**

- User-friendly Gradio web interface
- File upload capabilities for audio analysis
- Interactive chat with music agents
- Visual display of generated scores and audio playback
- Accessible at http://localhost:7860

**Terminal interface (`weavemuse terminal`):**

- Command-line interaction for advanced users
- Fast startup with on-demand loading
- Direct text-based communication with agents
- Ideal for scripting and automation
When launching WeaveMuse, you'll be prompted to choose your model configuration:
```
🤖 Choose your AI model:
1. Only Local Models (Requires more resources and loading time)
2. HuggingFace cloud-based agent (some local tools - faster startup)
3. All Remote (All models and Tools are remote - no resources needed)
```
**Important:** The backbone language model drives the intelligence of all WeaveMuse agents. When using smaller models due to computational constraints, expect the overall intelligence and reasoning capabilities of the system to be affected accordingly.
WeaveMuse operates as a multi-agent system with specialized agents for different music tasks:
**Manager Agent**
- Purpose: Orchestrates all music-related tasks
- Capabilities: Task routing, file handling, workflow management
- Tools: Base smolagents tools + specialized music agents
- Intelligence: Driven by the backbone model (local or remote)
1. Symbolic Music Agent
- Tools: NotaGenTool
- Function: Generates symbolic music in ABC notation
- Output: PDF scores, MusicXML, MIDI files, MP3 audio
- Use Cases: Composition based on musical periods, composers, instrumentation
2. Audio Analysis Agent
- Tools: AudioFlamingoTool, AudioAnalysisTool (optional)
- Function: Advanced audio content analysis using NVIDIA Audio Flamingo
- Capabilities: Musical element identification, acoustic analysis, content description
- Input: Audio files (any format supported)
3. Audio Generation Agent
- Tools: StableAudioTool
- Function: High-quality audio synthesis from text descriptions
- Technology: Stable Audio Open model
- Output: 44.1kHz stereo audio files
4. Web Search Agent
- Tools: WebSearchTool
- Function: Music-related information retrieval
- Capabilities: Research, fact-checking, music knowledge expansion
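To make the division of labor concrete, here is a toy keyword-based sketch of how requests might map to these agents. In WeaveMuse the routing is actually decided by the backbone LLM, so both the keyword table and the `route` function below are purely illustrative:

```python
# Toy sketch of manager-agent routing (illustrative only; the real
# routing in WeaveMuse is performed by the backbone language model).
AGENT_KEYWORDS = {
    "symbolic_music": ["compose", "notation", "score"],
    "audio_analysis": ["instruments", "recording", "analyze this audio"],
    "audio_generation": ["synthesize", "generate audio", "soundscape"],
    "web_search": ["research", "look up", "find information"],
}

def route(request: str) -> str:
    """Return the name of the first agent whose keywords match the request."""
    text = request.lower()
    for agent, keywords in AGENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return agent
    return "manager"  # no specialist matched; the manager handles it directly

print(route("Compose a waltz in 3/4 time"))
```

A real LLM-driven router handles paraphrases and multi-step requests that a keyword table cannot, which is why the backbone model's quality matters so much.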
**ChatMusicianTool**
- Natural language music analysis and understanding
- Music theory explanations and composition guidance
- Chord progression analysis and recommendations
**NotaGenTool**
- Symbolic music generation in ABC notation format
- Supports various musical styles and instrumentation
- Automatic conversion to multiple formats (PDF, MIDI, MusicXML, MP3)
**StableAudioTool**
- Text-to-audio generation using Stable Audio Open
- High-quality stereo audio synthesis
- Conditional generation based on prompts
**AudioFlamingoTool**
- Remote audio analysis via NVIDIA's Audio Flamingo model
- Advanced acoustic analysis and content understanding
- Zero-setup remote processing
**AudioAnalysisTool** (optional)
- Local audio analysis using Qwen2-Audio model
- Requires local GPU resources
- Detailed musical content analysis
The backbone language model is the core intelligence driving all WeaveMuse agents. This model determines:
- Task Understanding: How well the system interprets your requests
- Tool Selection: Which specialized tools to use for specific tasks
- Workflow Orchestration: How effectively multiple tools are combined
- Response Quality: The coherence and helpfulness of outputs
- Local Models: Better reasoning but require more resources (8GB+ VRAM recommended)
- Remote Models: Good balance of intelligence and resource usage
- Smaller Models: Limited reasoning but faster and lower resource requirements
**Configuration 1: Only Local Models**
- VRAM: 16GB+ recommended, 8GB minimum
- Intelligence: Highest (full local model reasoning)
- Startup: Slower (model loading time)
- Privacy: Complete (no external API calls)
**Configuration 2: HuggingFace Cloud Agent + Local Tools**
- VRAM: 4-8GB for specialized tools
- Intelligence: High (cloud model reasoning)
- Startup: Medium (partial local loading)
- Privacy: Hybrid (reasoning remote, some tools local)
**Configuration 3: All Remote**
- VRAM: <1GB (minimal local processing)
- Intelligence: High (cloud model reasoning)
- Startup: Fastest (no model loading)
- Privacy: Limited (all processing remote)
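As a rough rule of thumb, the trade-offs above can be summarized in a few lines of Python. The thresholds mirror the VRAM figures listed above; `choose_configuration` is a hypothetical helper for illustration, not part of the WeaveMuse API:

```python
def choose_configuration(vram_gb: float) -> int:
    """Map available GPU memory to the startup menu option (1-3).

    Hypothetical helper: thresholds follow the VRAM guidance above.
    """
    if vram_gb >= 16:
        return 1  # Only local models: highest intelligence, full privacy
    if vram_gb >= 4:
        return 2  # Cloud agent + local tools: hybrid privacy, medium startup
    return 3      # All remote: fastest startup, minimal local resources

print(choose_configuration(24), choose_configuration(6), choose_configuration(0.5))
```

Privacy requirements may of course override this heuristic: if no data may leave your machine, only Configuration 1 applies regardless of VRAM.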
```bash
weavemuse gui
# User: "Create a baroque-style piece for string quartet and analyze its harmonic structure"
# System: Uses NotaGenTool → ChatMusicianTool → Returns score + analysis
```

```bash
weavemuse terminal
# User uploads audio file: "What instruments are in this recording? Generate something similar."
# System: Uses AudioFlamingoTool → StableAudioTool → Returns analysis + new audio
```

```bash
weavemuse gui
# User: "Research Beethoven's late string quartets and compose something inspired by Op. 131"
# System: Uses WebSearchTool → ChatMusicianTool → NotaGenTool → Returns research + composition
```
#### Troubleshooting
#### Common Issues
**CUDA/GPU Issues:**
```bash
# Check GPU availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
# Install CUDA packages if needed
uv sync --extra gpu --extra-index-url https://download.pytorch.org/whl/cu121
```

**HuggingFace Authentication:**

```bash
# Login with HuggingFace token
huggingface-cli login

# Or set environment variable
export HF_TOKEN="your_token_here"
```

**Dependency Conflicts:**

```bash
# Reset and reinstall
rm -rf .venv uv.lock
uv sync --extra-index-url https://download.pytorch.org/whl/cu121
```

**Audio Generation Issues:**

```bash
# Install audio extras
uv sync --extra audio --extra-index-url https://download.pytorch.org/whl/cu121
```

```bash
# Install development dependencies
uv sync --extra dev --extra-index-url https://download.pytorch.org/whl/cu121

# Run tests
pytest tests/ -v

# Format code
black weavemuse/ tests/
isort weavemuse/ tests/

# Type checking
mypy weavemuse/
```

```
weavemuse/
├── weavemuse/           # Main package
│   ├── agents/          # Agent implementations and models
│   ├── tools/           # Music tool implementations
│   ├── interfaces/      # UI and CLI interfaces
│   ├── utils/           # Utility functions and GPU detection
│   └── __init__.py      # Package initialization
├── tests/               # Test suite
├── static/              # Static assets (logos, icons)
├── models/              # Downloaded model files (auto-created)
├── requirements.txt     # Pip requirements
├── pyproject.toml       # Project configuration and dependencies
└── README.md            # This file
```
WeaveMuse is designed to be extensible. To add custom music tools:
```python
from smolagents.tools import Tool
from weavemuse.tools.base_tools import ManagedTransformersTool

class CustomMusicTool(ManagedTransformersTool):
    name = "custom_music"
    description = "Your custom music tool description"
    inputs = {"prompt": {"type": "string", "description": "Input prompt"}}
    output_type = "string"

    def _load_model(self):
        # Implement your model loading logic
        pass

    def _call_model(self, model, **kwargs):
        # Implement your tool logic
        pass
```

For programmatic access, use the agents directly:
```python
from weavemuse.agents.agents_as_tools import get_weavemuse_agents_and_tools
from smolagents import InferenceClientModel

model = InferenceClientModel()
agents, tools = get_weavemuse_agents_and_tools(
    model=model,
    device_map="auto",
    tool_mode="hybrid",
)

# Use specific agents
symbolic_agent = agents[0]  # Symbolic music agent
result = symbolic_agent.run("Compose a waltz in 3/4 time")
```
- **ChatMusician**: Music understanding and analysis
  - Model: `m-a-p/ChatMusician`
  - Capabilities: Music theory, harmony analysis, composition guidance
- **NotaGen**: Symbolic music generation
  - Model: Custom implementation with pre-trained weights
  - Output: ABC notation format
- **Stable Audio Open**: Audio generation
  - Model: `stabilityai/stable-audio-open-1.0`
  - Output: High-quality 44.1kHz stereo audio
- **Audio Analysis**: Content understanding
  - Multiple models for different analysis tasks
  - Capabilities: Genre classification, mood detection, structure analysis
- Music Composition: Generate complete musical pieces
- Harmony Analysis: Analyze chord progressions and harmonic structure
- Style Transfer: Convert music between different styles
- Audio Synthesis: Convert symbolic music to audio
- Format Conversion: Between ABC, MIDI, MusicXML, and audio formats
- Music Education: Explain music theory concepts and analysis
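Several of the capabilities above revolve around ABC notation, the plain-text format NotaGen emits. For readers unfamiliar with it, here is a minimal generic example (a hand-written illustration of the format, not actual NotaGen output): the `X`, `T`, `M`, `L`, and `K` header fields give the tune index, title, meter, default note length, and key, followed by the music body.

```abc
X:1
T:Minimal Example
M:3/4
L:1/8
K:G
D2 | G2 B2 d2 | B4 A2 | G6 |]
```

Because ABC is plain text, it converts cleanly to MIDI, MusicXML, and engraved scores with standard tooling, which is what makes it a convenient interchange format for the pipeline above.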
We welcome contributions! Please see our Contributing Guide for details.
This project is licensed under the MIT License - see the LICENSE file for details.
- smolagents - Agent framework
- ChatMusician - Music understanding
- NotaGen - Symbolic music generation
- Stable Audio - Audio generation
- Verovio - Music notation rendering
If you use this framework in your research, please cite:
```bibtex
@software{music_agent_framework,
  title={Music Agent Framework: Comprehensive Music AI with smolagents},
  author={Music Agent Team},
  year={2025},
  url={https://github.com/music-agent/music-agent}
}
```