Merged
2 changes: 1 addition & 1 deletion README.md
@@ -44,7 +44,7 @@ graph TB
 6. **Sample upload** lets users upload WAV files (songs, snippets) as reference tracks. The system analyzes metadata, generates CLAP embeddings, and the agent finds similar library samples via audio-to-audio cosine similarity
 7. **Pair feedback** lets users evaluate sample pairs — the agent presents plausible pairs with side-by-side playback and a "Play Together" mixed preview (aligned to song context key/BPM), collects thumbs up/down verdicts, and computes relational audio features in the background. Rapid pairing mode with random anchors and a "Next Pair" button enables fast verdict collection.
 8. **Preference learning** trains a logistic regression on 10-dimensional feature vectors (4 pair scores + 6 relational audio features) from accumulated verdicts. Auto-retrains every 5th verdict after 15 verdicts. Learned preferences are injected into the agent's system prompt and surfaced via the `show_preferences` tool as natural-language explanations.
-9. **Kit builder** assembles a complete multi-sample kit (e.g., kick + snare + hihat + bass + pad) using a greedy algorithm that maximizes pairwise compatibility. Per-type CLAP search retrieves candidates, fast inline scoring selects samples, and CNN diversity penalties prevent spectral redundancy — all rendered as an interactive kit card with per-slot playback
+9. **Kit builder** assembles a complete multi-sample kit (e.g., kick + snare + hihat + bass + pad) using a greedy algorithm that maximizes pairwise compatibility. Per-type CLAP search retrieves candidates, fast inline scoring selects samples, and CNN diversity penalties prevent spectral redundancy. Key compatibility scoring is skipped for unpitched/percussive types (drums, percussion, fx, etc.) — all rendered as an interactive kit card with per-slot playback
 10. **Agent streams response** back as SSE in Vercel AI SDK format, with transparent tool-call display and a song context badge in the chat header

 ### Why CLAP + CNN + Agent?
2 changes: 1 addition & 1 deletion backend/src/samplespace/agents/sample_agent.py
@@ -41,7 +41,7 @@
 - **Pair feedback**: present_pair → user verdict → record_verdict. The system learns from verdicts over time — after enough feedback, use show_preferences to explain what it has learned.
 - **Rapid pairing**: When the user asks to "start a pairing session" or "evaluate pairs," use present_pair with anchor_type and candidate_type (omit sample_id for random anchors). When you receive a `[NEXT_PAIR]` message, call record_verdict for the previous pair, then immediately call present_pair again with the same types — keep it fast, minimal commentary.
 - **Upload flow**: User uploads a WAV → analyze_sample → find_similar_to_upload to find library matches.
-- If the user references a sample by name rather than ID, search for it first.
+- **Resolving sample references**: Users will refer to samples by ordinal position ("the 3rd one", "the first result"), by filename ("warm-pad.wav"), or by description ("that bass loop"). When they use an ordinal, resolve it from the most recent search or tool results in the conversation — each result includes a 1-based `index` field. When they use a filename or partial name, search for it first.

 ## Output Rules

12 changes: 9 additions & 3 deletions backend/src/samplespace/agents/tools/formatting.py
@@ -11,8 +11,8 @@ def format_sample_results(
 ) -> str:
     """Format a list of sample results as a playable sample-results code fence."""
     samples: list[dict[str, object]] = []
-    for s in results:
-        payload = sample_to_payload(s)
+    for i, s in enumerate(results, start=1):
+        payload = sample_to_payload(s, index=i)
         if annotations and s.id in annotations:
             payload["annotation"] = annotations[s.id]
         samples.append(payload)
@@ -21,13 +21,19 @@ def format_sample_results(
     return f"{header}\n\n```sample-results\n{json_str}\n```"


-def sample_to_payload(sample: SampleSchema, audio_url: str | None = None) -> dict[str, object]:
+def sample_to_payload(
+    sample: SampleSchema,
+    audio_url: str | None = None,
+    index: int | None = None,
+) -> dict[str, object]:
     """Build a JSON-serializable payload dict for a sample."""
     payload: dict[str, object] = {
         "id": sample.id,
         "filename": sample.filename,
         "audio_url": audio_url or f"/api/samples/{sample.id}/audio",
     }
+    if index is not None:
+        payload["index"] = index
     if sample.sample_type:
         payload["type"] = sample.sample_type
     if sample.is_loop:
7 changes: 7 additions & 0 deletions backend/src/samplespace/agents/tools/transform_tools.py
@@ -8,6 +8,7 @@

 from samplespace.agents.deps import AgentDeps
 from samplespace.schemas.sample import SampleSchema
+from samplespace.schemas.sample_type import UNPITCHED_TYPES
 from samplespace.services import audio_transform as audio_transform_service
 from samplespace.services import music_theory as music_theory_service
 from samplespace.services import sample as sample_service
@@ -60,6 +61,12 @@ async def transform_single_sample(
     will_stretch = target_bpm is not None and sample.bpm is not None
     skipped: list[str] = []

+    # Percussive/noise types: skip pitch-shifting (degrades quality), keep BPM stretching
+    if sample.sample_type and sample.sample_type.lower() in UNPITCHED_TYPES:
+        if will_pitch:
+            skipped.append("Pitch-shift skipped — percussive sample type.")
+        will_pitch = False
+
     if target_key and not sample.key:
         skipped.append("Key transformation skipped — sample has no detected key.")
     if target_bpm and not sample.bpm:
14 changes: 14 additions & 0 deletions backend/src/samplespace/schemas/sample_type.py
@@ -28,6 +28,20 @@ class SampleType(StrEnum):

 SAMPLE_TYPES: list[str] = sorted(t.value for t in SampleType)

+# Sample types where pitch-shifting is harmful or meaningless.
+# Percussive/noise-based — even as loops, shifting their pitch
+# degrades quality without musical benefit. BPM time-stretching still applies.
+UNPITCHED_TYPES: set[str] = {
+    SampleType.KICK,
+    SampleType.SNARE,
+    SampleType.HIHAT,
+    SampleType.CLAP,
+    SampleType.CYMBAL,
+    SampleType.PERCUSSION,
+    SampleType.DRUM,
+    SampleType.FX,
+}
+
 # Keyword-to-type mapping for inferring sample type from file paths.
 # Keys are SampleType enum members; values are directory/segment keywords
 # that map to that type (checked against lowercased path segments).
16 changes: 3 additions & 13 deletions backend/src/samplespace/services/candidate_search.py
@@ -6,20 +6,10 @@
 """

 from samplespace.schemas.sample import SampleSchema
-from samplespace.schemas.sample_type import SampleType
+from samplespace.schemas.sample_type import UNPITCHED_TYPES
 from samplespace.schemas.thread import SongContext
 from samplespace.services.music_theory import normalize_bpm, semitone_key_score

-# Types that are typically one-shots (no meaningful key)
-ONE_SHOT_TYPES: set[str] = {
-    SampleType.KICK,
-    SampleType.SNARE,
-    SampleType.HIHAT,
-    SampleType.CLAP,
-    SampleType.PERCUSSION,
-    SampleType.FX,
-}
-
 # Re-ranking weight profiles: (clap, bpm, key)
 _TONAL_WEIGHTS = (0.4, 0.25, 0.35)
 _PERCUSSIVE_WEIGHTS = (0.5, 0.5, 0.0)
@@ -43,7 +33,7 @@ def build_clap_query(
     parts.append(f"{sample_type} sample")

     if song_context:
-        if sample_type.lower() not in ONE_SHOT_TYPES and song_context.key:
+        if sample_type.lower() not in UNPITCHED_TYPES and song_context.key:
             parts.append(song_context.key)
         if song_context.bpm:
             parts.append(f"{song_context.bpm} BPM")
@@ -67,7 +57,7 @@ def rerank_candidates(

     has_bpm = song_context.bpm is not None
     has_key = song_context.key is not None
-    is_tonal = sample_type.lower() not in ONE_SHOT_TYPES
+    is_tonal = sample_type.lower() not in UNPITCHED_TYPES

     w_clap, w_bpm, w_key = _TONAL_WEIGHTS if is_tonal else _PERCUSSIVE_WEIGHTS

8 changes: 5 additions & 3 deletions backend/src/samplespace/services/kit_builder.py
@@ -14,7 +14,7 @@
 from samplespace.models.sample import Sample
 from samplespace.schemas.kit import KitResult, KitSlot, PairwiseEntry
 from samplespace.schemas.sample import SampleSchema
-from samplespace.schemas.sample_type import SAMPLE_TYPES, SampleType
+from samplespace.schemas.sample_type import SAMPLE_TYPES, UNPITCHED_TYPES, SampleType
 from samplespace.schemas.thread import SongContext
 from samplespace.services import embedding as embedding_service
 from samplespace.services import music_theory as music_theory_service
@@ -267,8 +267,10 @@ def _fast_compatibility(sample_a: SampleSchema, sample_b: SampleSchema) -> float
         pair_key = frozenset({sample_a.sample_type.lower(), sample_b.sample_type.lower()})
         scores.append(TYPE_COMPLEMENTARITY.get(pair_key, DEFAULT_TYPE_SCORE))

-    # Key compatibility (only for pitched loops with keys)
-    if sample_a.is_loop and sample_b.is_loop and sample_a.key and sample_b.key:
+    # Key compatibility (only for pitched loops with keys)
+    a_pitched = sample_a.sample_type and sample_a.sample_type.lower() not in UNPITCHED_TYPES
+    b_pitched = sample_b.sample_type and sample_b.sample_type.lower() not in UNPITCHED_TYPES
+    if sample_a.is_loop and sample_b.is_loop and sample_a.key and sample_b.key and a_pitched and b_pitched:
         value, _ = music_theory_service.key_compatibility_score(sample_a.key, sample_b.key)
         scores.append(value)

6 changes: 4 additions & 2 deletions backend/src/samplespace/services/pair_scoring.py
@@ -5,7 +5,7 @@

 from samplespace.models.sample import Sample
 from samplespace.schemas.pair import DimensionScore, PairScore
-from samplespace.schemas.sample_type import SampleType
+from samplespace.schemas.sample_type import UNPITCHED_TYPES, SampleType
 from samplespace.services import music_theory as music_theory_service
 from samplespace.services.music_theory import normalize_bpm

@@ -67,7 +67,9 @@ async def score_pair(db: AsyncSession, sample_a_id: str, sample_b_id: str) -> Pa

     dimensions: dict[str, DimensionScore] = {}

-    if sample_a.is_loop and sample_b.is_loop and sample_a.key and sample_b.key:
+    a_pitched = sample_a.sample_type and sample_a.sample_type.lower() not in UNPITCHED_TYPES
+    b_pitched = sample_b.sample_type and sample_b.sample_type.lower() not in UNPITCHED_TYPES
+    if sample_a.is_loop and sample_b.is_loop and sample_a.key and sample_b.key and a_pitched and b_pitched:
         dimensions["key"] = _compute_key_score(sample_a.key, sample_b.key)

     if sample_a.is_loop and sample_b.is_loop and sample_a.bpm and sample_b.bpm: