[C++] Core SDK implementation and build (split 3/6 from #415)#420

Open
zlata-stefanovic-db wants to merge 11 commits into main from split/415/core

@zlata-stefanovic-db zlata-stefanovic-db commented Jun 24, 2026

Contributor

Summary

The core of the C++ SDK: a C++17, header + static-library wrapper over the
Zerobus C FFI (rust/ffi), which in turn wraps the Rust core. It gives C++
callers the same gRPC streaming / OAuth / recovery / ingestion engine as the
other SDKs, behind an idiomatic, exception-based, RAII API. This PR contains the
public API, its implementation, the build, and CI - but not the tests or
examples (those are stacked on top; see Merge order). Arrow Flight ingestion is
not part of this PR; it is peeled into its own PR in the split.

What's in this PR

Public API - cpp/include/zerobus/ (the surface; opaque, forward-declared
FFI handles only, so zerobus.h never leaks into consumers):

  • Sdk / SdkBuilder + TableProperties - connection factory and stream creation.
  • Stream - proto/JSON ingestion: single, batched, and fire-and-forget (*_nowait),
    plus flush, wait_for_offset, get_unacked_records, close.
  • ProtoSchema - build a descriptor + encode records straight from Unity Catalog
    table metadata (no .proto/protoc).
  • HeadersProvider - custom auth headers.
  • ZerobusException (is_retryable()), StreamOptions, UnackedRecord,
    version(), and the zerobus.hpp umbrella header.

Implementation - cpp/src/ (the only place that includes zerobus.h):

  • sdk.cpp, stream.cpp, proto_schema.cpp forward to the C FFI;
    headers_callback.cpp is the extern "C" trampoline.
  • src/detail/ internals: ResultGuard (CResult -> ZerobusException, always
    freeing the C error string), the StreamOptions -> C config conversion, and the
    trampoline declaration.
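
As a rough illustration of the ResultGuard idea described above (turn a failed CResult into a ZerobusException and free the C error string on every path), here is a minimal sketch. The `CResult` layout, `fake_free_error_message`, and `fake_ffi_call` are hypothetical stand-ins invented for this example; the real definitions live in zerobus.h and are not shown in this PR description.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the FFI result struct (real layout: zerobus.h).
struct CResult {
    bool success = true;
    bool is_retryable = false;
    char* error_message = nullptr;  // allocated by the callee, freed by us
};

// Hypothetical stand-in for the FFI call that frees an error string.
inline void fake_free_error_message(char* msg) { std::free(msg); }

// Minimal stand-in for the SDK's exception type.
class ZerobusException : public std::runtime_error {
public:
    ZerobusException(const std::string& msg, bool retryable)
        : std::runtime_error(msg), retryable_(retryable) {}
    bool is_retryable() const noexcept { return retryable_; }
private:
    bool retryable_;
};

// Owns one CResult for the duration of a single FFI call.
class ResultGuard {
public:
    ResultGuard() = default;
    ResultGuard(const ResultGuard&) = delete;
    ResultGuard& operator=(const ResultGuard&) = delete;
    ~ResultGuard() { release(); }  // frees the string even if nobody threw

    CResult* ptr() noexcept { return &result_; }

    void throw_if_error() {
        if (result_.success) return;
        // Copy the message before freeing the C string, then throw.
        std::string msg =
            result_.error_message ? result_.error_message : "unknown error";
        const bool retryable = result_.is_retryable;
        release();
        throw ZerobusException(msg, retryable);
    }

private:
    void release() noexcept {
        if (result_.error_message != nullptr) {
            fake_free_error_message(result_.error_message);
            result_.error_message = nullptr;
        }
    }
    CResult result_;
};

// Hypothetical FFI call used only to exercise the guard.
inline void fake_ffi_call(bool fail, CResult* out) {
    if (!fail) return;
    const char* text = "stream closed";
    out->success = false;
    out->is_retryable = true;
    out->error_message = static_cast<char*>(std::malloc(std::strlen(text) + 1));
    if (out->error_message != nullptr) std::strcpy(out->error_message, text);
}
```

Each wrapper method would then follow the same shape: create a guard, pass `guard.ptr()` to the C call, and call `guard.throw_if_error()` before returning.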

Build & CI:

  • CMakeLists.txt - library target (zerobus::zerobus), the install/
    find_package(zerobus) export with the FFI archive bundled, and the
    ZEROBUS_SANITIZE option (off by default).
  • cmake/BuildRustFfi.cmake builds libzerobus_ffi from local Rust source by
    default, or links a prebuilt lib via -DZEROBUS_FFI_LIBRARY=.
  • Makefile (build/test/lint/fmt, SANITIZE= pass-through), .clang-format.
  • ci-cpp.yml (fmt + test + AddressSanitizer) and push.yml's path filter.

Design notes

  • Memory ownership: Rust owns every handle; wrapper classes are move-only and
    free their handle exactly once (the source is nulled on move). Errors are thrown,
    never returned.
  • No gRPC/Protobuf C++ dependency: all marshalling crosses the C FFI as byte
    buffers / pointer arrays; the batch helpers build only the small pointer/length
    arrays the C entry points need.
  • Distribution (separate PRs): CMake + GitHub Releases, no package manager.
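
The memory-ownership note above (move-only wrappers that free their handle exactly once, with the source nulled on move) can be sketched as follows. `FakeHandle`, `fake_create`, and `fake_free` are hypothetical stand-ins for an opaque FFI handle and its create/free functions; in the real SDK the handle is allocated and released by the Rust side.

```cpp
#include <cassert>
#include <utility>

// Hypothetical opaque handle and its create/free pair.
struct FakeHandle { int id; };
static int g_free_count = 0;  // counts frees so the sketch can be checked
inline FakeHandle* fake_create() { return new FakeHandle{1}; }
inline void fake_free(FakeHandle* h) { ++g_free_count; delete h; }

class HandleOwner {
public:
    HandleOwner() : handle_(fake_create()) {}
    ~HandleOwner() { reset(); }

    // Copying would lead to a double free, so it is disabled.
    HandleOwner(const HandleOwner&) = delete;
    HandleOwner& operator=(const HandleOwner&) = delete;

    // Moving transfers the handle and nulls the source.
    HandleOwner(HandleOwner&& other) noexcept
        : handle_(std::exchange(other.handle_, nullptr)) {}
    HandleOwner& operator=(HandleOwner&& other) noexcept {
        if (this != &other) {
            reset();  // release whatever we currently own
            handle_ = std::exchange(other.handle_, nullptr);
        }
        return *this;
    }

    bool valid() const noexcept { return handle_ != nullptr; }

private:
    void reset() noexcept {
        if (handle_ != nullptr) {
            fake_free(handle_);
            handle_ = nullptr;
        }
    }
    FakeHandle* handle_;
};
```

A moved-from `HandleOwner` holds a null handle, so its destructor does nothing and the free function runs once per handle.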

Merge order

Off main. Tests and examples are stacked on this PR and merge after it. The
add_subdirectory(tests)/(examples) wiring is intentionally omitted here and
arrives with those PRs, so this branch configures and builds the library cleanly
on its own.

Part of the #415 split.

Test plan

  • make build - configures and builds the library (FFI from source).
  • make lint - clang-format check + -Wall -Wextra.
  • Verified a separate project can find_package(zerobus) and link
    zerobus::zerobus from a cmake --install tree.
  • Unit tests and the AddressSanitizer run land with the stacked tests PR.

@zlata-stefanovic-db zlata-stefanovic-db self-assigned this Jun 24, 2026
@zlata-stefanovic-db zlata-stefanovic-db marked this pull request as ready for review June 24, 2026 13:47
@zlata-stefanovic-db zlata-stefanovic-db added the feature-request Net-new capability requested by customers label Jun 24, 2026
@zlata-stefanovic-db zlata-stefanovic-db changed the title from "[C++] Core SDK implementation and build (split 4/6 from #415)" to "[C++] Core SDK implementation and build (split 3/6 from #415)" Jun 24, 2026
@zlata-stefanovic-db zlata-stefanovic-db requested a review from a team June 24, 2026 15:55
## Summary

Core C++ SDK: the public headers (`include/`), implementation (`src/`), the
Rust C FFI build glue (`cmake/`), the CMake build (library target, install /
`find_package` export, sanitizer option), the `Makefile`, `.clang-format`, and
the C++ CI (`ci-cpp.yml` + `push.yml` path filter). Builds the library.

Part of the #415 split (4/6).

### Merge order
Off `main`. **Tests (5/6) and examples (6/6) are stacked on this PR** and merge
after it. The `add_subdirectory(tests)`/`(examples)` wiring is intentionally
not here yet — it arrives with those PRs.

Draft until the stack is reviewed.

Split from #415.

Signed-off-by: Zlata Stefanovic <zlata.stefanovic@databricks.com>
Warn against combining the fire-and-forget _nowait ingest APIs with a
custom HeadersProvider: detached background tasks are not drained by
close() or the destructor, so they can call back into the provider after
the Stream releases it.

Re-enable LeakSanitizer in the ASan CI job (was detect_leaks=0, which
hid all leaks) with a narrow suppression file covering only the
intentional once_cell/tokio runtime globals, so real wrapper leaks stay
visible.

Signed-off-by: Zlata Stefanovic <zlata.stefanovic@databricks.com>
@zlata-stefanovic-db zlata-stefanovic-db linked an issue Jun 25, 2026 that may be closed by this pull request
5 tasks
- close(): keep the handle alive on a failed close so get_unacked_records() and retry still work; free only on success
- ingest_*_records: reject empty batches instead of returning the FFI -2 sentinel as an offset; nowait batch variants no-op on empty
- headers callback: signal a non-null error on OOM instead of failing open; reject keys/values containing embedded NUL bytes

Signed-off-by: Zlata Stefanovic <zlata.stefanovic@databricks.com>
- Sdk::create(): route through the builder so the user-agent reports zerobus-sdk-cpp instead of the Rust default
- callback_max_wait_time_ms: leave the FFI default in place on nullopt instead of forcing None
- SdkBuilder: type the handle as CZerobusSdkBuilder* instead of void*
- add missing <cstddef>/<utility> includes to public headers

Signed-off-by: Zlata Stefanovic <zlata.stefanovic@databricks.com>
- installed CMake config recreates the zerobus_ffi target the export references, fixing external link failures
- FFI custom command depends on Rust sources so edits trigger a rebuild
- gate tests/examples options on existing subdirs; fail configure on version.hpp drift
- narrow LSan suppressions to lazy-init/runtime construction
- drop the redundant use_local_sdk patch from C++ CI

Signed-off-by: Zlata Stefanovic <zlata.stefanovic@databricks.com>
void ingest_json_records_nowait(const std::vector<std::string>& records);

/// Block until the record at `offset` has been acknowledged by the server.
void wait_for_offset(std::int64_t offset);

Contributor

Not a blocker for this PR, just something to think about and see if it makes sense and is worth doing as a follow-up.

The C++ SDK has the pull-based ack model (ingest → offset → wait_for_offset / flush) but no push-based AckCallback like Python and Java have.

For Go this gap is fine - goroutines are userspace-scheduled by the Go runtime, multiplexed onto a small number of OS threads. Blocking a goroutine on wait_for_offset just suspends it and yields its OS thread to another goroutine; no kernel involvement. You can have thousands of goroutines blocked on ack tracking with negligible cost.

A C++ std::thread maps 1:1 to a kernel-scheduled OS thread. Blocking one on wait_for_offset parks that OS thread, doing nothing, until the server ack arrives. Scaling this across many concurrent streams might get expensive fast.

The idiomatic C++ solution for "notify me when something happens" is a callback, not a blocked thread. With AckCallback, the Rust tokio runtime fires the callback from its own thread pool when the ack arrives - the application thread never blocks waiting for acks and no extra OS threads are needed.

For now the pull model is functional. I suggest looking into this as a follow-up to see whether implementing it (adding AckCallback to the C FFI) makes sense. Some implementation issues to think about:

  • Language boundaries
  • Callback object lifetime
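
To make the callback-lifetime concern above concrete, here is a small sketch of one way a push-based ack handler could be shaped on the C++ side. Everything here is hypothetical: the SDK in this PR exposes no `AckCallback`, and the names `AckTracker` and `make_tracking_callback` are invented for illustration. The callback captures a `std::weak_ptr`, so an ack that arrives after the tracker is gone becomes a no-op instead of touching freed memory.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>

// Hypothetical callback type; not part of the SDK in this PR.
using AckCallback = std::function<void(std::int64_t)>;

// Records the highest acknowledged offset. The mutex is there because a
// real runtime would invoke the callback from its own thread.
class AckTracker {
public:
    void on_ack(std::int64_t offset) {
        std::lock_guard<std::mutex> lock(mu_);
        if (offset > last_acked_) last_acked_ = offset;
    }
    std::int64_t last_acked() const {
        std::lock_guard<std::mutex> lock(mu_);
        return last_acked_;
    }
private:
    mutable std::mutex mu_;
    std::int64_t last_acked_ = -1;  // -1 means nothing acknowledged yet
};

// Builds a callback that does not extend the tracker's lifetime and is
// safe to invoke after the tracker has been destroyed.
inline AckCallback make_tracking_callback(
    const std::shared_ptr<AckTracker>& tracker) {
    std::weak_ptr<AckTracker> weak = tracker;
    return [weak](std::int64_t offset) {
        if (auto alive = weak.lock()) alive->on_ack(offset);
    };
}
```

This only covers the C++ half of the lifetime problem; a real design would also need the FFI to guarantee when it stops invoking the callback.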

Comment thread cpp/include/zerobus/stream.hpp Outdated
Comment on lines +59 to +61
/// Ingest a batch of JSON records. Returns the offset of the last record.
/// Throws `ZerobusException` if `records` is empty.
std::int64_t ingest_json_records(const std::vector<std::string>& records);

Contributor

What do other SDKs do in case that records is empty, do they also throw an exception?

@zlata-stefanovic-db zlata-stefanovic-db Jun 26, 2026

Contributor Author

Most of them don't; changing it for consistency is a good idea.

Contributor Author

Just made this change, thank you! @irinatomic-db

`ingest_proto_records` / `ingest_json_records` threw `ZerobusException`
on an empty batch. No other SDK treats an empty batch as an error: the
Rust core returns `Ok(None)`, the FFI returns its `-2` sentinel with a
success result, and the Go wrapper maps that to `-1` with no error.

Return `-1` (a no-op) for an empty batch instead of throwing, bringing
C++ in line with the other SDKs. `-1` is unambiguous since real offsets
are non-negative. Update the header docs accordingly. The `_nowait`
batch variants already no-op on empty, so they are unchanged.

Signed-off-by: Zlata Stefanovic <zlata.stefanovic@databricks.com>
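
The empty-batch behaviour described in the commit above, together with the pointer/length arrays the batch helpers build, could look roughly like this. `fake_ingest_batch` is a hypothetical stand-in for the C batch entry point (its real name and signature are not shown in this PR page), and `ingest_json_records_sketch` is an illustrative free function, not the actual `Stream` method.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

static std::size_t g_last_total_bytes = 0;  // lets the sketch be checked

// Hypothetical C-style batch entry point: parallel pointer/length arrays
// in, offset of the last record out (assuming offsets start at 0).
inline std::int64_t fake_ingest_batch(const std::uint8_t* const* ptrs,
                                      const std::size_t* lens,
                                      std::size_t count) {
    g_last_total_bytes = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (ptrs[i] != nullptr) g_last_total_bytes += lens[i];
    }
    return static_cast<std::int64_t>(count) - 1;
}

inline std::int64_t ingest_json_records_sketch(
    const std::vector<std::string>& records) {
    // Empty batch: no-op, reported as -1 (real offsets are non-negative).
    if (records.empty()) return -1;

    // Build only the small pointer/length arrays; record bytes are not copied.
    std::vector<const std::uint8_t*> ptrs;
    std::vector<std::size_t> lens;
    ptrs.reserve(records.size());
    lens.reserve(records.size());
    for (const std::string& r : records) {
        ptrs.push_back(reinterpret_cast<const std::uint8_t*>(r.data()));
        lens.push_back(r.size());
    }
    return fake_ingest_batch(ptrs.data(), lens.data(), ptrs.size());
}
```

Because the empty case returns before the C call, the FFI's `-2` sentinel never reaches the caller on this path.
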
The empty-batch no-op comments in stream.cpp/stream.hpp exceeded the
80-column limit, failing the clang-format CI check. Reflow them; no code
change.

Signed-off-by: Zlata Stefanovic <zlata.stefanovic@databricks.com>
Comment on lines +63 to +74
/// argument-validation errors are reported (as exceptions). Ingestion errors
/// are silently dropped. The stream must outlive the background work.
///
/// WARNING — do not combine the `_nowait` APIs with a custom
/// `HeadersProvider`. A fire-and-forget task is detached: neither `close()`
/// nor the destructor drains it, and a task that still needs fresh headers
/// may call back into the provider after the `Stream` (and the `shared_ptr`
/// keeping the provider alive) is destroyed — a use-after-free. The FFI
/// exposes no way to drain these tasks, so there is no safe ordering. With a
/// `HeadersProvider`, use only the blocking ingest variants, which complete
/// before they return.
void ingest_proto_record_nowait(const std::uint8_t* data, std::size_t len);

Contributor

The doc warning is correct but insufficient - it relies on the caller reading and following the warning. The Rust background task is detached; neither close() nor the destructor drains it, so a task that needs fresh headers can call back through a raw pointer after provider_ is destroyed. This should be enforced in code, not just documented.

Maybe something like:

  void Stream::ingest_proto_record_nowait(...) {
      if (provider_ != nullptr)
          throw ZerobusException("_nowait APIs cannot be used with a custom HeadersProvider", false);
      ...
  }

@zlata-stefanovic-db zlata-stefanovic-db Jun 26, 2026

Contributor Author

Agree with this. I discussed it with Danilo already — it’s a real issue, will do this C++ mitigation now.
The reason I didn’t do the full fix in this PR is that the root issue is in the C FFI contract, not just in the C++ wrapper logic. For this beta PR we kept the warning-only mitigation to avoid cross-SDK FFI changes right now, but we should enforce it in code.
I’ll file a follow-up for _nowait + custom HeadersProvider enforcement (starting with C++ guard, then proper FFI-safe lifecycle fix).

Contributor Author

Made the mitigation now

Comment thread cpp/src/stream.cpp
Comment on lines +101 to +105
std::int64_t Stream::ingest_proto_record(const std::uint8_t* data,
                                         std::size_t len) {
  detail::ResultGuard guard;
  std::int64_t offset =
      zerobus_stream_ingest_proto_record(handle_, data, len, guard.ptr());

Contributor

The FFI returns -2 for an empty batch and -1 for errors, alongside CResult.success. C++ relies entirely on throw_if_error() and hands the raw offset straight to the caller. If the FFI ever returned a negative sentinel with success == true, the caller would get -2 as a real offset and pass it to wait_for_offset(-2). Go defends against this explicitly (if offset == -2 { return -1, nil } then if offset < 0 { ... error }). Worth adding a post-call guard:

  guard.throw_if_error();
  if (offset < 0)
      throw ZerobusException("unexpected negative offset from FFI", false);
  return offset;

Contributor Author

Addressing this as well, thank you!

Throw when _nowait APIs are used with a custom HeadersProvider to prevent callback lifetime UAF risk.

Signed-off-by: Zlata Stefanovic <zlata.stefanovic@databricks.com>
The blocking ingest_* methods returned the FFI's raw int64 straight to
the caller after throw_if_error(). The FFI overloads that return value
with negative sentinels (-1 error, -2 empty batch) separately from
CResult.success, so a negative value arriving with success set would be
handed back as a real offset and could reach wait_for_offset(-2).

Add a checked_offset() helper that throws ZerobusException on a negative
offset, applied to ingest_proto_record, ingest_json_record,
ingest_proto_records, and ingest_json_records. The batch methods still
short-circuit the empty case to -1 before the FFI call, so that path
never hits the guard. Mirrors the explicit offset < 0 check Go already
performs.

Signed-off-by: Zlata Stefanovic <zlata.stefanovic@databricks.com>
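
A minimal sketch of the `checked_offset()` helper described in the commit above, assuming it is a small function that either returns a non-negative offset or throws. The `ZerobusException` class here is a reduced stand-in for the SDK's type, and the message text is taken from the review suggestion rather than from the merged code.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Reduced stand-in for the SDK's exception type.
class ZerobusException : public std::runtime_error {
public:
    ZerobusException(const std::string& msg, bool retryable)
        : std::runtime_error(msg), retryable_(retryable) {}
    bool is_retryable() const noexcept { return retryable_; }
private:
    bool retryable_;
};

// Rejects negative sentinels (-1 error, -2 empty batch) that arrive with a
// successful result, so they can never be handed back as real offsets.
inline std::int64_t checked_offset(std::int64_t offset) {
    if (offset < 0) {
        throw ZerobusException("unexpected negative offset from FFI", false);
    }
    return offset;
}
```

In a blocking ingest method the call would sit after the error check, for example `guard.throw_if_error(); return checked_offset(offset);`.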
Labels

feature-request Net-new capability requested by customers

Development

Successfully merging this pull request may close these issues.

[C++] Add release workflow and cut the 0.1.0 release

2 participants