Skip to content

Latest commit

 

History

History

README.md

swift — On-Device CoreML Bioacoustic Classifier

Point the ARCP runtime at a directory of audio recordings (hydrophone clips, ultrasonic bat recordings, dawn-chorus forest mic). The agent walks the directory, runs each clip through a CoreML-bundled bioacoustic classifier, and — for ambiguous frames — asks the LLM to disambiguate using the classifier's top-k candidates. Output: a Raven Pro-compatible selection table, a per-recording markdown summary, and spectrogram PNGs for the unusual clips.

This is the canonical Swift demo because it puts ARCP on Apple's home turf: the entire inference pipeline runs locally via CoreML/Accelerate/AVFoundation while the model still drives the high-level reasoning. The Mac becomes a self-contained classification node.

Showcases: provisioned credentials (§9.8), per-clip progress, sensor telemetry metrics, custom-currency budgets (compute-ms), artifact uploads (Raven .txt + PNG + markdown), and a tools-first agent loop where 95% of frames never touch the LLM.

Quickstart (native, macOS)

make test                       # fast smoke tests (no docker, ~0.1s)
make seed                       # synthesise the bundled WAV corpus
make run-host-runtime           # run the runtime natively (CoreML mode)
# in another terminal:
make annotate IN=./Samples/whales

Outputs land under ./out/<job_id>/:

out/<job_id>/
├── selections.txt        # Raven Pro 1.4 tab-separated table
├── summary.md            # one-paragraph "what's notable"
└── spectrograms/         # PNG per "unusual" frame

Quickstart (containerised, Linux/CI)

cp .env.example .env
make up

On macOS, make up brings ollama + arcp-client into containers and prints

Detected macOS — start the Swift runtime natively for CoreML: make run-host-runtime in another terminal

On Linux, the runtime container starts too and uses the ONNX fallback (actually the deterministic stub classifier — see "Honest split" below).

What's actually wired today

The PROMPT calls for an idiomatic @ArcpAgent(name: "bioacoustic.annotate") macro surface and a fully-distributed runtime ↔ client over WebSocket. Neither is in the published swift-sdk yet, so this example ships:

Layer Status Notes
@ArcpAgent macro stubbed Replaced by manual registration. See Sources/Runtime/Stubs/ArcpStub.swift.
JobContext / emit stubbed Records events in-process; tests assert on them.
Provisioned credentials stubbed (§9.8) JobContext.openProvisionedCredential resolves to a path under MODELS_DIR.
Audio spectrogram (FFT) real Pure-Swift Hann-windowed DFT. Tests confirm peak-bin accuracy on a 440Hz sine.
Audio framing real Sliding-window indexer over the spectrogram tensor.
CoreML classify (Mac) partial (Mac) Loads an .mlpackage via CoreML.MLModel when present; falls through to deterministic stub for prediction shape stability. Guarded by #if canImport(CoreML).
ONNX classify (Linux) stub Returns deterministic top-3 keyed by (modelName, frameId) hash.
PNG renderer real Hand-rolled PNG (zlib stored). No external image deps.
Selection table writer real Raven Pro 1.4 tab-separated format.
Markdown summary real Species inventory + "what's notable" paragraph.
Disambiguation schema real JSON validator + smoke-test verified.
Ollama / LLM inference stubbed shim InferenceInjector swaps in a real HTTP client when wired.

Every stub is grep-able for TODO: replace with real SDK API when published. Swapping Sources/Runtime/Stubs/ for the real SDK is the migration path.

Honest split — macOS vs Linux

  • macOS host-runtime mode (default): the runtime binary runs natively. When you drop a real *.mlpackage under MODELS_DIR (matching DEFAULT_MODEL), CoreMLBridge.tryClassify will compile and load it; the prediction path itself currently returns the deterministic stub output (real models have model-specific input layouts that the demo doesn't try to second-guess). Swap in model.prediction(from:) in CoreMLBridge.swift to wire up a specific model.
  • Linux container mode: CoreML is absent. The same agent code runs and the classifier returns the deterministic stub. ONNX Runtime via the Swift bindings is the production path; the stub keeps the demo end-to-end testable without the dependency.

Smoke tests

Six fast XCTest cases under Tests/SmokeTests/. Together they take ~0.1s, need no docker, no network, no real CoreML model:

swift test

The cases:

  1. audio.spectrogram peak bin on a 440Hz sine wave (±35Hz at fftN=256).
  2. audio.framing on a 3-second clip with 3s window / 1s stride → 1 frame.
  3. coreml.classify deterministic top-3 for a fixed input frame.
  4. Provisioned-credential handle resolves to MODELS_DIR/<file>.mlpackage.
  5. Selection-table writer emits valid Raven Pro 1.4 header + row count.
  6. DisambiguationSchema validates a known-good JSON response.

Configure

Variable Default Effect
RECORDINGS_DIR /data/recordings Directory the agent walks.
MODELS_DIR /data/models Provisioned-credential resolution root.
ANNOTATIONS_DIR /data/annotations Where selection tables / summaries / PNGs land.
DEFAULT_MODEL BirdNET-v2.4 Logical credential name → <name>.mlpackage.
COREML_BACKEND coreml coreml (Mac) or onnx (Linux).
CONFIDENCE_FLOOR 0.85 Top-1 confidence threshold for direct selection-table emission.
ENTROPY_TRIGGER 1.2 Above this entropy the agent disambiguates via the LLM.
PER_RECORDING_BUDGET_USD 0.20 Per-recording cost ceiling for LLM disambiguation.
RUNTIME_HOST_MODE auto auto / host / container.
OLLAMA_MODEL qwen2.5:1.5b-instruct See top-level README for alternatives.
ARCP_SDK_VERSION latest Reserved — the example currently uses a vendored stub SDK surface.

ARCP-usage budget — the take

ARCP's budget capability isn't just about money — it's about any quota you can name. The interesting framing for the bioacoustic example is compute, not USD: register an additional compute.budget currency (custom — e.g. "compute_ms:60000") for the CoreML calls. Each coreml.classify decrements wall-clock-ms. That gives operators a real-time cap on a self-hosted, possibly battery-powered node.

Where to add code

  • Sources/Runtime/Agents/BioacousticAnnotate.swift — the agent loop.
  • Sources/Runtime/Tools/audio.spectrogram, audio.framing, coreml.classify, spectrogram.render_png.
  • Sources/Runtime/CoreMLBridge/CoreMLBridge.swift — real CoreML model loading (Mac only).
  • Sources/Runtime/Models/ — selection table, markdown summary, disambiguation schema.
  • Sources/Client/Bioacoustic.swift — the bioacoustic CLI / TUI.
  • Sources/Runtime/Stubs/ArcpStub.swift — the stub SDK surface to delete once the real macro / server APIs land upstream.

SDK source — local vs git

Package.swift currently has the published swift-sdk git dependency commented out. The vendored ArcpStub target supplies the surface the PROMPT calls for. To re-enable the registry path once the macro lands:

.package(
    url: "https://github.com/agentruntimecontrolprotocol/swift-sdk.git",
    /* ARCP_VERSION */ from: "1.0.0"
)

and drop Sources/Runtime/Stubs/ from the target sources.