Last updated: 2026-04-15
This document captures the deferred follow-up phases for participant naming beyond the current implementation (label preservation + per-record manual renaming).
Goals:
- Improve speaker identity accuracy with guided user enrollment.
- Keep local-first privacy defaults for voice snippets and embeddings.
- Add an explicit, user-controlled path to train and refresh participant identification.
Non-goals:
- Full cloud identity sync.
- Cross-device profile replication.
- Automatic background retraining without user action.
Add a guided flow for creating participant profiles:
- User selects `Add Participant`.
- User enters participant name and optional metadata (role/team).
- App prompts for multiple short voice snippets (for example, 3 to 5 clips, 5 to 10 seconds each).
- App validates snippet quality (minimum SNR, non-silent, minimum duration).
- App extracts embeddings per snippet and builds an aggregated profile embedding.
- App stores enrollment artifact locally with timestamp and quality stats.
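The validation and aggregation steps above can be sketched roughly as follows. This is a minimal Python stand-in (the app layer itself would likely be Swift); `snippet_passes_gate`, `aggregate_profile_embedding`, and all threshold values are hypothetical, not an existing API:

```python
import math

def snippet_passes_gate(samples, sample_rate, min_seconds=5.0, min_rms=0.01):
    """Hypothetical quality gate: minimum duration plus a crude non-silence
    check via RMS energy. A real gate would also estimate SNR by separating
    speech frames from noise frames."""
    if len(samples) / sample_rate < min_seconds:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms >= min_rms

def aggregate_profile_embedding(snippet_embeddings):
    """Average the per-snippet embeddings, then L2-normalize the mean so the
    profile embedding is directly comparable under cosine similarity."""
    dim = len(snippet_embeddings[0])
    mean = [sum(e[i] for e in snippet_embeddings) / len(snippet_embeddings)
            for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]
```

Averaging before normalizing (rather than normalizing each snippet first) is one of the choices the research spike should confirm against the reference implementation.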
Recommended UX safeguards:
- Clear prompt copy per snippet: "Read this sentence in your normal voice."
- Retry affordance when snippet quality is low.
- Consent text for local voice profile storage.
Use enrolled profiles to improve diarized speaker labeling:
- Input: diarization segment embeddings + participant enrollment embeddings.
- Candidate algorithm:
  - Start with cosine similarity + thresholding.
  - Add top-1 / top-2 margin checks to reduce false positives.
  - Optional confidence calibration using held-out enrollment snippets.
- Output:
  - `matched participant` when confidence >= threshold,
  - `unknown speaker` when below threshold.
- Preserve stable fallback IDs for unknown speakers (`speaker-1`, `speaker-2`, etc.).
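The candidate algorithm (cosine similarity, absolute threshold, top-1/top-2 margin, unknown fallback) can be sketched as below. `match_speaker` and the threshold constants are illustrative assumptions, not tuned values:

```python
def match_speaker(segment_emb, profiles, threshold=0.6, min_margin=0.1):
    """Match one diarization segment embedding against enrolled profiles.

    profiles: dict of participant name -> profile embedding.
    Returns (name, score) on a confident match, or (None, best_score)
    so the caller can fall back to a stable unknown-speaker ID."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    scored = sorted(((cosine(segment_emb, emb), name)
                     for name, emb in profiles.items()), reverse=True)
    best_score, best_name = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else -1.0
    # Reject when absolute confidence is low OR the top-2 margin is too
    # small (two enrolled voices are nearly equally plausible).
    if best_score < threshold or best_score - runner_up < min_margin:
        return None, best_score
    return best_name, best_score
```

The margin check is what keeps precision high for top-confidence matches: a strong score that barely beats the runner-up is still treated as unknown.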
Training and refresh approach:
- "Train" can initially mean deriving profile centroids from enrollment embeddings.
- Later, allow a small local classifier (for example, nearest centroid with adaptive thresholds).
- Retraining triggers:
  - new snippets added,
  - participant renamed/merged,
  - user feedback corrections from transcript view.
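Under the initial "train = derive centroids" definition, combined with the optional held-out calibration mentioned earlier, training could look like the sketch below. All names and the `0.05` / `0.5` constants are hypothetical placeholders for the spike to tune:

```python
def train_profiles(enrollments):
    """Derive a centroid and an adaptive per-participant threshold.

    enrollments: dict of name -> list of snippet embeddings (>= 2 each).
    The last snippet per participant is held out; the threshold is set just
    below that snippet's similarity to the centroid, floored at a
    conservative default so a noisy held-out clip cannot open the gate."""
    def normalize(v):
        n = sum(x * x for x in v) ** 0.5 or 1.0
        return [x / n for x in v]

    def cosine(a, b):
        return sum(x * y for x, y in zip(a, b))  # both inputs normalized

    model = {}
    for name, embs in enrollments.items():
        train, held_out = embs[:-1], embs[-1]
        dim = len(train[0])
        centroid = normalize([sum(e[i] for e in train) / len(train)
                              for i in range(dim)])
        calibrated = cosine(normalize(held_out), centroid) - 0.05
        model[name] = {"centroid": centroid,
                       "threshold": max(calibrated, 0.5)}
    return model
```

Re-running this function on the updated snippet set is cheap, which is what makes the listed retraining triggers practical without background jobs.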
Add a participant profile model (future schema version):
ParticipantProfile:
- `id: UUID`
- `displayName: String`
- `embeddingBlob: Data` (or array-backed JSON)
- `sampleCount: Int`
- `qualityScore: Double`
- `createdAt: Date`
- `updatedAt: Date`
Add optional linkage for record-level assignments:
TranscriptionRecordParticipantAssignment:
- `recordID`
- `speakerID`
- `participantProfileID`
- `confidence`
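As a language-neutral illustration of the two proposed models (the shipping schema would presumably use Swift value types), a Python dataclass sketch with field names mirroring the schema above:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4

def _now():
    return datetime.now(timezone.utc)

@dataclass
class ParticipantProfile:
    """Mirror of the proposed profile schema; the embedding is kept as a
    float list here rather than an opaque blob, purely for illustration."""
    display_name: str
    embedding: list
    sample_count: int
    quality_score: float
    id: UUID = field(default_factory=uuid4)
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)

@dataclass
class ParticipantAssignment:
    """Record-level linkage: which diarized speaker in which record maps to
    which enrolled profile, and with what confidence."""
    record_id: UUID
    speaker_id: str  # stable fallback ID, e.g. "speaker-1"
    participant_profile_id: UUID
    confidence: float
```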
Potential additions:
- `SpeakerEnrollmentService`:
  - capture prompts and snippets
  - validate snippet quality
  - build enrollment embeddings
- `SpeakerIdentityService`:
  - match segment embeddings to participant profiles
  - expose confidence and fallback reasons
- Extend `SpeakerDiarizer` integration:
  - hydrate known speakers from profile store before diarization
  - write back assignment confidence metadata
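The "isolate the profile store behind a service protocol" mitigation from the risk list can be sketched as an abstract interface plus one centroid-backed implementation. This is a Python stand-in for the eventual Swift protocol; all names are hypothetical:

```python
from abc import ABC, abstractmethod

class SpeakerIdentityServiceSketch(ABC):
    """Hypothetical service boundary so the matching pipeline and the
    profile persistence layer can evolve independently."""

    @abstractmethod
    def match(self, segment_embedding):
        """Return (participant_id, confidence); participant_id is None
        when no profile clears the threshold (unknown-speaker fallback)."""

class CentroidIdentityService(SpeakerIdentityServiceSketch):
    def __init__(self, profiles, threshold=0.6):
        self.profiles = profiles  # participant_id -> profile embedding
        self.threshold = threshold

    def match(self, segment_embedding):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = sum(x * x for x in a) ** 0.5
            nb = sum(y * y for y in b) ** 0.5
            return dot / (na * nb) if na and nb else 0.0

        best_id = max(self.profiles,
                      key=lambda pid: cosine(segment_embedding,
                                             self.profiles[pid]))
        score = cosine(segment_embedding, self.profiles[best_id])
        return (best_id, score) if score >= self.threshold else (None, score)
```

Swapping the centroid matcher for a calibrated classifier later would then be a drop-in change behind the same interface.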
- Settings > Models/Features:
  - add `Participants` section with profile management.
- Detail view:
  - keep manual rename (already available),
  - add `Link to Participant` action when profile match is uncertain.
- Onboarding nudge:
- suggest enrollment after repeated multi-speaker transcripts.
Track locally (and optionally telemetry if enabled):
- match rate,
- unknown-speaker rate,
- user correction rate,
- confidence distribution,
- average enrollment sample quality.
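The first three metrics can be derived locally from stored assignments. A minimal sketch, assuming a simple per-assignment dict shape (illustrative, not an existing schema):

```python
def compute_match_metrics(assignments):
    """Compute match, unknown-speaker, and user-correction rates from a
    list of dicts like {"matched": bool, "corrected_by_user": bool,
    "confidence": float}. Field names are hypothetical."""
    total = len(assignments)
    if total == 0:
        return {"match_rate": 0.0, "unknown_rate": 0.0, "correction_rate": 0.0}
    matched = sum(1 for a in assignments if a["matched"])
    corrected = sum(1 for a in assignments if a["corrected_by_user"])
    return {
        "match_rate": matched / total,
        "unknown_rate": (total - matched) / total,
        "correction_rate": corrected / total,
    }
```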
Acceptance targets:
- reduced manual rename frequency over time,
- high precision for top-confidence matches,
- no regression in diarization fallback behavior.
Reference inspiration: ~/Projects/clones/omi.
Research checklist:
- Review OMI enrollment UX (prompt cadence, retries, quality checks).
- Review profile persistence and embedding aggregation strategy.
- Identify reusable ideas for confidence thresholds and mismatch handling.
- Capture notes in a short implementation brief before code changes.
Deliverable for spike:
- `release-notes/participant-enrollment-research.md` (or equivalent internal doc) covering:
  - findings,
  - recommended thresholds,
  - migration risk notes,
  - proposed test matrix.
Risks and mitigations:
- Risk: false identity matches.
- Mitigation: conservative thresholds + unknown fallback.
- Risk: poor enrollment audio quality.
- Mitigation: mandatory quality gate and retry loop.
- Risk: schema churn.
- Mitigation: isolate profile store behind service protocol first.
Suggested phase sequencing:
- Research spike and thresholds proposal.
- Profile persistence model and migration.
- Guided enrollment UI and snippet quality checks.
- Classifier matching pipeline integration.
- Confidence UX, correction loops, and metrics.