From 99356997236d024e73fe596571774e748fa57d35 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Manciot?= Date: Thu, 25 Jun 2026 07:29:11 +0200 Subject: [PATCH 1/6] =?UTF-8?q?docs(sql):=20add=20Cross-Index=20JOIN=20doc?= =?UTF-8?q?=20=E2=80=94=20Markdown=20twin=20(Story=2017.1)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit NEW documentation/sql/joins.md — the in-repo Markdown twin of the canonical website page (SOFTNETWORK-APP/softclient4es-web#11): two-superpowers lead, the JOIN ladder, the three rows (same-cluster passthrough / cross-cluster conveyor / multi-source coordinator) with worked SQL transcribed verbatim from the JDBC + federation integration tests, all three diagrams as fenced ASCII, the qualitative performance section, and the quota-as-meter licensing section. Add the page to the SQL doc index (README.md) and reconcile the stale JOIN-limitation copy in dql_statements.md (the top "What is supported" list and the Limitations "joins only via Materialized Views" bullet) to point at the new page. Cross-links to not-yet-merged targets are marked pending (Epic-17-close launch gate). Dual-docs parity verified against the canonical MDX. Closed Issue #135 Co-Authored-By: Claude Opus 4.8 (1M context) --- documentation/sql/README.md | 1 + documentation/sql/dql_statements.md | 5 +- documentation/sql/joins.md | 381 ++++++++++++++++++++++++++++ 3 files changed, 385 insertions(+), 2 deletions(-) create mode 100644 documentation/sql/joins.md diff --git a/documentation/sql/README.md b/documentation/sql/README.md index 996339c6..4bfc29ed 100644 --- a/documentation/sql/README.md +++ b/documentation/sql/README.md @@ -17,6 +17,7 @@ Welcome to the SQL Engine Documentation. Navigate through the sections below: - [DDL Support](ddl_statements.md) - [DML Support](dml_statements.md) - [DQL Support](dql_statements.md) +- [Cross-Index JOIN](joins.md) - [Materialized Views](materialized_views.md) - [Telemetry & Privacy](../client/telemetry.md) - [Telemetry & Privacy](telemetry.md) diff --git a/documentation/sql/dql_statements.md b/documentation/sql/dql_statements.md index 9fd32d32..ef64df9f 100644 --- a/documentation/sql/dql_statements.md +++ b/documentation/sql/dql_statements.md @@ -12,7 +12,8 @@ DQL supports: - `SELECT` with expressions, aliases, nested fields, STRUCT and ARRAY - `WHERE`, `GROUP BY`, `HAVING`, `ORDER BY`, `LIMIT`, `OFFSET` - `UNION ALL` -- `JOIN UNNEST` on `ARRAY` +- cross-index JOINs (`INNER` / `LEFT` / `RIGHT` / `FULL OUTER`) across indices and clusters — see [Cross-Index JOIN](joins.md) +- `JOIN UNNEST` on `ARRAY` (the single-index nested form, handled natively inside one index) - aggregations, parent-level aggregations on nested arrays - window functions with `OVER` - rich function support (numeric, string, date/time, geo, conditional, type conversion) @@ -854,7 +855,7 @@ Notes: Even though the DQL engine is powerful, some SQL features are not (yet) supported: -- Traditional SQL joins are supported only through the use of Materialized Views (only `JOIN UNNEST` on `ARRAY` is available natively) +- Cross-index JOINs (`INNER` / `LEFT` / `RIGHT` / `FULL OUTER`) are supported across indices and clusters — see [Cross-Index JOIN](joins.md). `JOIN UNNEST` on `ARRAY` is the single-index nested form, handled natively inside one index. - No correlated subqueries - No arbitrary subqueries in `SELECT` or `WHERE` (except `INSERT ... AS SELECT` in DML) - No `GROUPING SETS`, `CUBE`, `ROLLUP` diff --git a/documentation/sql/joins.md b/documentation/sql/joins.md new file mode 100644 index 00000000..a05b9eab --- /dev/null +++ b/documentation/sql/joins.md @@ -0,0 +1,381 @@ +# Cross-Index JOIN + +**Stop ETL'ing Elasticsearch into your warehouse just to JOIN it.** + +Elasticsearch has no native cross-index JOIN — and a stock JDBC driver on top of ES can't add one either. SoftClient4ES gives you two things Elasticsearch can't do on its own: + +1. **Query-time cross-index JOIN** — `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` joins across indices (and across clusters), at query time, on **every** surface: the REPL, the JDBC driver, the ADBC driver, the Arrow Flight SQL sidecar, and Federation. The drivers are the **free delivery channel**; the JOIN *depth* is metered. +2. **Persisted [Materialized Views](materialized_views.md)** — the other superpower: denormalize cross-index data once into a queryable view. (That has its own page; this page is about query-time JOINs.) + +## The JOIN ladder + +Three meters gate how far a JOIN can reach. Think of them as rungs: + +- **Query-time depth — `maxJoins`** — how many cross-index JOINs a single query may contain. +- **Cross-cluster reach — `maxClusters`** — how many Elasticsearch clusters a Federation deployment may span. +- **Persisted — `maxMaterializedViews`** — how many denormalized views you may keep materialized. + +Every tier *has* every feature. The meters gate **scale**, not on/off switches — see [Licensing & the meters](#licensing--the-meters). + +## Which row do I need? + +Cross-index JOIN ships in three shapes ("rows"). Pick by where your data lives: + +| Row | Name | What it does | Engine | Surfaces | +|---|---|---|---|---| +| **Row 1** | Passthrough / same-cluster | Cross-index JOIN of 2+ indices in **one** ES cluster | Each leg is an ES sub-query; the JOIN runs in-process in embedded **DuckDB** | REPL, JDBC, ADBC, Flight SQL sidecar | +| **Row 2** | Cross-cluster conveyor | JOIN / INSERT / CTAS where **source and target live in different clusters** | Coordinator runs the source SELECT and conveyors the Arrow stream to the target sidecar for bulk-load | Federation | +| **Row 3** | Multi-source coordinator | JOIN across **2+ source clusters** | Coordinator stages each leg to Parquet scratch + a per-query DuckDB view, joins coordinator-local | Federation | + +**One sentence to decide:** same cluster → **Row 1**; copy results between two clusters → **Row 2**; join across two or more clusters → **Row 3**. + +The rule the engine actually applies: `QueryRouter.routeWriteWithJoin` counts the **distinct source catalogs** in the rewritten `FROM` / `JOIN` clauses versus the target catalog. Same (or no) catalog → Row 1; exactly one source catalog different from the target → Row 2; two or more source catalogs → Row 3. + +> **What does NOT work yet:** Cross-index JOINs are first-class in R1, but two things are intentionally **not** here yet: arbitrary **subqueries / CTEs** in a JOIN query land in **R2a**, and **heterogeneous Row-3 sources** (joining ES with Postgres, MySQL, Snowflake, …) land in **R2b** — R1 Row 3 is multi-**Elasticsearch** only. See Known limitations (_pending 17.6_) for the full list. + +--- + +## Row 1 — same-cluster passthrough + +A Row-1 JOIN reaches two or more indices in **one** Elasticsearch cluster. Each table in the query becomes its own ES sub-query (with any single-table `WHERE` pushed down into it); the cross-index JOIN itself is executed in-process by an embedded **DuckDB** engine inside the driver, sidecar, or REPL. The first JOIN on a fresh process warms up the DuckDB native library once. + +``` + SQL: SELECT ... FROM emp e JOIN dept d ON e.dept_id = d.dept_id + │ + ┌─────────────┴─────────────┐ + │ Driver / sidecar / REPL │ + │ (embedded DuckDB engine) │ + └─────────────┬─────────────┘ + sub-query e │ │ sub-query d + ▼ ▼ + ┌────────────┐ ┌────────────┐ + │ ES index │ │ ES index │ ← one cluster + │ emp │ │ dept │ + └────────────┘ └────────────┘ + └────► joined in DuckDB ◄────┘ → rows to client +``` + +All examples below are transcribed verbatim from the JDBC integration suite +`softclient4es-jdbc/testkit/src/main/scala/app/softnetwork/elastic/jdbc/JdbcIntegrationSpec.scala`. +They run against two fixtures: **`jdbc_join_emp`** (`emp_id`, `dept_id`, `name`, `salary` — 6 rows: Alice/1/1/6000, Bob/2/1/4000, Carol/3/1/8000, Dave/4/2/5500, Eve/5/2/3000, Orphan/6/**99**/4500) and **`jdbc_join_dept`** (`dept_id`, `dept_name` — Engineering=1, Marketing=2, Empty=9). The orphan employee (`dept_id`=99) and the empty department (`dept_id`=9) surface the outer-join NULLs. + +A few examples use a separate small fixture **`jdbc_test`** (`id`, `name`, `value` — a 3-column table, **not** the employees table; it has no `salary`/`dept_id`). + +### SELECT — INNER / LEFT / RIGHT / FULL OUTER + +**INNER JOIN** (the canonical first example): + +```sql +SELECT e.name, e.salary, d.dept_name +FROM jdbc_join_emp e +JOIN jdbc_join_dept d ON e.dept_id = d.dept_id; +-- 5 rows (orphan employee with dept_id=99 dropped by the INNER JOIN) +``` + +**LEFT JOIN** with NULLs — unmatched left rows surface NULL on the right (this one uses the `jdbc_test` fixture): + +```sql +SELECT t.id, t.name, d.dept_name +FROM jdbc_test t +LEFT JOIN jdbc_join_dept d ON t.id = d.dept_id +ORDER BY t.id ASC; +-- only ids 1→Engineering and 2→Marketing match; the rest get NULL dept_name +``` + +**RIGHT OUTER** and **FULL OUTER**: + +```sql +-- RIGHT OUTER: dept side fully preserved; empty dept 9 → NULL e.name +SELECT e.name, d.dept_name +FROM jdbc_join_emp e +RIGHT OUTER JOIN jdbc_join_dept d ON e.dept_id = d.dept_id +ORDER BY d.dept_name ASC; + +-- FULL OUTER: NULLs on both sides (empty dept 9 → NULL name; orphan emp dept 99 → NULL dept_name) +SELECT e.name, d.dept_name +FROM jdbc_join_emp e +FULL OUTER JOIN jdbc_join_dept d ON e.dept_id = d.dept_id; +``` + +### Predicate pushdown + +Each single-table `WHERE` predicate is pushed into its own ES sub-query before the JOIN, so a filtered JOIN returns strictly fewer rows: + +```sql +SELECT e.name, e.salary, d.dept_name +FROM jdbc_join_emp e +JOIN jdbc_join_dept d ON e.dept_id = d.dept_id +WHERE e.salary > 5000 AND d.dept_name = 'Engineering'; +``` + +### GROUP BY / HAVING / ORDER BY + +Aggregation runs in DuckDB after the JOIN: + +```sql +SELECT d.dept_name, COUNT(*) AS cnt +FROM jdbc_join_emp e +JOIN jdbc_join_dept d ON e.dept_id = d.dept_id +GROUP BY d.dept_name +HAVING COUNT(*) > 1 +ORDER BY COUNT(*) DESC; +-- Engineering (3), Marketing (2) survive HAVING +``` + +> **Two ORDER BY gotchas:** the JOIN planner has two ordering restrictions — you cannot `ORDER BY` a **SELECT alias** (use `ORDER BY COUNT(*)`, not `ORDER BY cnt`), and you cannot `ORDER BY` a column that exists on **both** sides of the JOIN (order by a column unique to one side, e.g. `d.dept_name`, not the shared join key `d.dept_id`). + +### ORDER BY … LIMIT (top-N) + +```sql +SELECT e.name, e.salary, d.dept_name +FROM jdbc_join_emp e +JOIN jdbc_join_dept d ON e.dept_id = d.dept_id +ORDER BY e.salary DESC LIMIT 3; +-- 8000 / 6000 / 5500 +``` + +### UNNEST + cross-index JOIN + +Elasticsearch handles `JOIN UNNEST` on an `ARRAY` natively inside the sub-query; the cross-index JOIN to the dimension runs in DuckDB. `UNNEST` does **not** count toward `maxJoins`: + +```sql +SELECT o.id, oi.product, oi.quantity, d.dept_name +FROM jdbc_join_order_items o +JOIN UNNEST(o.items) AS oi +JOIN jdbc_join_dept d ON o.customer_id = d.dept_id; +-- one row per nested item; the UNNESTed nested fields (product, quantity) populate +``` + +### INSERT … SELECT … JOIN + +Write the joined output into a target index (Row-1 write; orphan dropped → 5 rows): + +```sql +INSERT INTO jdbc_row1_insert_join_target +SELECT e.name, e.salary, d.dept_name +FROM jdbc_join_emp e +JOIN jdbc_join_dept d ON e.dept_id = d.dept_id; +``` + +### CREATE TABLE … AS SELECT … JOIN (CTAS) + +```sql +CREATE TABLE jdbc_row1_ctas_join_target AS +SELECT e.name, e.salary, d.dept_name +FROM jdbc_join_emp e +JOIN jdbc_join_dept d ON e.dept_id = d.dept_id; +``` + +### CREATE OR REPLACE TABLE … AS + +Real REPLACE — drops then recreates, so re-running a narrower query shrinks the table: + +```sql +CREATE OR REPLACE TABLE jdbc_row1_replace_ctas_target AS +SELECT e.name, e.salary, d.dept_name +FROM jdbc_join_emp e +JOIN jdbc_join_dept d ON e.dept_id = d.dept_id; +``` + +### INSERT … ON CONFLICT (col) DO UPDATE (upsert) + +Idempotent upsert — a stable SHA-1 `_id` is derived from the conflict column, so re-running merges rather than duplicating (count stays 5, not 10): + +```sql +INSERT INTO jdbc_row1_insert_join_upsert_target +SELECT e.emp_id, e.name, e.salary, d.dept_name +FROM jdbc_join_emp e +JOIN jdbc_join_dept d ON e.dept_id = d.dept_id +ON CONFLICT (emp_id) DO UPDATE; +``` + +> **Two ON CONFLICT forms are rejected:** **CTAS + ON CONFLICT** and **INSERT + ON CONFLICT DO NOTHING** are rejected at the boundary — Elasticsearch has no native skip-on-conflict with stable ids. Use `INSERT … ON CONFLICT (col) DO UPDATE` for upserts. + +### Prepared statement through a JOIN + +A bound parameter is substituted into the sub-query **before** planning, so the same `PreparedStatement` returns different rows for different bindings: + +```sql +SELECT e.name, d.dept_name +FROM jdbc_join_emp e +JOIN jdbc_join_dept d ON e.dept_id = d.dept_id +WHERE e.salary > ?; -- bind 3500.0 → more rows; bind 6500.0 → fewer (Carol = 8000 survives) +``` + +--- + +## Row 2 — cross-cluster conveyor + +A Row-2 operation has its **target** in a different cluster from its **source**. The Federation coordinator runs the source SELECT on the source cluster's sidecar, receives the result as an Arrow stream, and **conveyors** it to the target sidecar for a bulk-load. (The JOIN in 2a/2b is itself single-source — both source tables are in `prod_us` — but the *target* `prod_eu` is a different cluster, and that is what makes it Row 2. A plain catalog-to-catalog `INSERT … SELECT *` with no JOIN is also a Row-2 conveyor.) + +``` + ┌──────────────────────────┐ + │ Federation coordinator │ + └───┬──────────────────▲───┘ + 1. │ run source SELECT │ 2. Arrow stream + ▼ │ + ┌────────────────────┐ │ ┌────────────────────┐ + │ prod_us sidecar │──┘ │ prod_eu sidecar │ ← target + └────────────────────┘ └─────────▲──────────┘ + └──────── 3. bulk-load conveyor ───┘ +``` + +Cross-cluster references use **backtick-quoted catalog prefixes** — the catalog name is the Federation `servers.` alias, which Federation strips before forwarding each leg's SELECT to its source cluster. + +Examples are transcribed from +`softclient4es-arrow/federation/src/test/scala/app/softnetwork/elastic/arrow/federation/FederationJoinE2ESpec.scala`. + +**Cross-cluster INSERT-with-JOIN** — the source SELECT runs on `prod_us`, conveyed to `prod_eu`: + +```sql +INSERT INTO `prod_eu`.dest +SELECT o.id, c.name +FROM `prod_us`.orders o +JOIN `prod_us`.customers c ON o.id = c.id; +``` + +**Cross-cluster CTAS-with-JOIN** — the target DDL is inferred from the source FlightInfo schema, then conveyed: + +```sql +CREATE TABLE `prod_eu`.dest AS +SELECT o.id, c.name +FROM `prod_us`.orders o +JOIN `prod_us`.customers c ON o.id = c.id; +``` + +--- + +## Row 3 — multi-source coordinator + +A Row-3 JOIN reaches **two or more source clusters**. The coordinator stages each leg to Parquet scratch on disk and exposes it as a per-query DuckDB view (named `q_` for isolation), then runs the JOIN **coordinator-local**. + +> **R1 = multi-Elasticsearch only:** in R1, Row 3 joins across **multiple Elasticsearch clusters**. Joining Elasticsearch against heterogeneous sources (Postgres, MySQL, Snowflake, …) is **coming in R2b** — it is not promised for R1. + +``` + ┌──────────────┐ leg 1 ┌──────────────────────────┐ + │ prod_us │──────────▶│ Federation coordinator │ + └──────────────┘ │ │ + ┌──────────────┐ leg 2 │ Parquet scratch + a │ + │ prod_fr │──────────▶│ per-query DuckDB view │ + └──────────────┘ └────────────┬─────────────┘ + ▼ + coordinator-local JOIN + │ + ▼ + ┌──────────────────────────┐ + │ prod_eu (target) │ + └──────────────────────────┘ +``` + +**Multi-source SELECT JOIN** (read; the headline three-cluster query — two source catalogs, joined coordinator-local): + +```sql +SELECT o.id, c.name +FROM `prod_us`.orders o +JOIN `prod_eu`.customers c ON o.id = c.id; +-- two source catalogs (prod_us, prod_eu) → coordinator-local join +``` + +This is the SRE wedge: **correlate logs, metrics, and traces across regional ES clusters in one SQL query.** (The full SRE story lives in the Federation operator guide (_pending 16.6/17.2 merge_) and the `three-region` example topology.) + +**Multi-source INSERT-with-JOIN** (sources `prod_us` + `prod_fr`, target `prod_eu`): + +```sql +INSERT INTO `prod_eu`.dest +SELECT o.id, c.name +FROM `prod_us`.orders o +JOIN `prod_fr`.customers c ON o.customer_id = c.id; +``` + +**Multi-source CREATE OR REPLACE TABLE … AS:** + +```sql +CREATE OR REPLACE TABLE `prod_eu`.dest AS +SELECT o.id, c.name +FROM `prod_us`.orders o +JOIN `prod_fr`.customers c ON o.customer_id = c.id; +``` + +--- + +## Performance characteristics + +What to expect per row, qualitatively. (Hard latency and throughput numbers belong to the R1.1 Arrow benchmark — link forward, don't pre-empt.) + +- **Row 1** — lowest latency of the three rows; the JOIN is in-process DuckDB over ES sub-query results, so the dominant cost is the ES sub-queries plus the cross product. The first JOIN on a fresh process pays a one-time DuckDB native-library warm-up. +- **Row 2** — adds a network hop (coordinator → source SELECT) plus a bulk-load conveyor to the target; throughput is bound by the Arrow stream and target ingest, not by JSON — data crosses the wire as Arrow RecordBatches. +- **Row 3** — adds per-leg Parquet staging on the coordinator before the DuckDB join; cost scales with the **largest** leg plus the join cardinality. +- **Arrow wire format** — for **Flight SQL** and **ADBC** clients specifically, results stream as Arrow RecordBatches (zero JSON on the wire). The JDBC driver surfaces JOIN rows via an in-process Arrow result set and the REPL renders to a console, so the clean zero-JSON-wire property is a Flight-SQL/ADBC trait, not a blanket one. The value here is interoperability and SQL completeness; the full zero-copy speed story is the R1.1 benchmark. + +### Row truncation at the result cap + +The **joined output** is capped at the tier's `maxQueryResults` (Community 10,000). Over-cap results are **truncated with a warning** — never silently dropped: JDBC and ADBC raise `SQLWarning 01004` (`Result truncated to N rows`), and Flight SQL emits an `x-result-truncated` header. + +> **Join inputs are never capped:** the result cap applies to the **joined output only**, never to a JOIN leg. Capping a leg would truncate a JOIN input and produce **silently wrong** results. So a wide Community join can hit the 10,000-row output cap even though none of its inputs were truncated. + +## Licensing & the meters + +Every tier **has every feature.** The meter gates *scale*, not an on/off switch. + +- **`maxJoins` — JOIN depth per query.** A 2-table JOIN counts as **1** JOIN, a 3-table JOIN as **2**, a 4-table JOIN as **3**. Community **2** (up to a 3-table JOIN), Pro **5** (up to a 6-table JOIN), Enterprise **∞**. `UNNEST` does **not** count toward `maxJoins`. +- **`maxClusters` — cross-cluster reach.** Community **1**, Pro **5**, Enterprise **∞**. Single-cluster Federation is free — the **meter**, not a feature flag, is the paywall. **ES-only at R1** (phrased "across N ES clusters"). +- **`maxMaterializedViews` — persisted views.** Community **1**, Pro **50**, Enterprise **∞**. + +**Community has Federation.** It is capped at 1 cluster — the quota *is* the paywall, not a feature gate. One sidecar / one cluster boots free; a second cluster makes the Federation sidecar fail to start by design. + +The same meters, in source-of-truth order (MV / results / clusters / joins): + +| Tier | `maxMaterializedViews` | `maxQueryResults` | `maxClusters` | `maxJoins` | +|---|---|---|---|---| +| Community | 1 | 10,000 | 1 | 2 | +| Pro | 50 | 1,000,000 | 5 | 5 | +| Enterprise | ∞ | ∞ | ∞ | ∞ | + +### What happens at each cap (verified) + +- **`maxJoins` exceeded** → the planner rejects the query. A 4-table query (3 cross-index JOINs) under Community is rejected with a message that names the count, the limit, the next tier, and the upgrade URL: + +```sql +-- 4 tables = 3 cross-index JOINs → exceeds Community maxJoins=2 → rejected +SELECT t.id, d.dept_name, r.region_name, m.team_name +FROM jdbc_test t +JOIN jdbc_join_dept d ON t.id = d.dept_id +JOIN jdbc_join_region r ON t.id = r.region_id +JOIN jdbc_join_team m ON t.id = m.team_id; +-- Error: "Query contains 3 cross-index JOINs ... maximum of 2 ... Upgrade to Pro ... +-- See: https://portal.softclient4es.com/pricing" +``` + + The same query runs unchanged on Pro or Enterprise (higher / no cap). + +- **`maxClusters` exceeded** → the Federation sidecar **fails to start** (by design — a CrashLoop) rather than silently dropping a cluster. +- **`maxQueryResults` exceeded** → **truncate-with-warning** on the no-`LIMIT` path (`SQLWarning 01004` / Flight `x-result-truncated`), or an HTTP 402 on an explicit `LIMIT` over quota. + +The JOIN engine ships in the Elastic-License extensions — **free to use** (not "source-available"). + +For the full price matrix and editions, see the licensing & pricing page on the website. + +--- + +## What does NOT work yet + +- **Arbitrary subqueries and CTEs** inside a JOIN query — coming in **R2a**. +- **Heterogeneous Row-3 sources** (joining Elasticsearch with Postgres, MySQL, Snowflake, …) — coming in **R2b**; R1 Row 3 is multi-Elasticsearch only. + +Full list: Known limitations (_pending 17.6_). + +## Try it in 5 minutes + +- JDBC quickstart — single-cluster JOIN from any JDBC tool. +- ADBC quickstart (_pending 17.3_) — in-process Arrow. +- Arrow Flight SQL quickstart — columnar streaming. +- Federation operator guide (_pending 16.6/17.2 merge_) — cross-cluster (Rows 2 & 3) with the Helm chart. +- Example topologies live in the [`softclient4es-helm`](https://github.com/SOFTNETWORK-APP/softclient4es-helm) repo: `softclient4es-federation/examples/single-cluster/` and `.../three-region/`. (_pending Epic-16 topologies_) + +## See also + +- [Materialized Views](materialized_views.md) — superpower #2: persisted, pre-joined data. +- [DQL Support](dql_statements.md) — non-JOIN SELECT and `JOIN UNNEST` detail. +- Licensing & pricing — the full edition / quota matrix (on the website). +- Known limitations (_pending 17.6_). +- Federation operator guide (_pending 16.6/17.2 merge_). From 07dbf71cead8b92c34fc502d54f87de25016c8ce Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Manciot?= Date: Thu, 25 Jun 2026 12:20:23 +0200 Subject: [PATCH 2/6] docs: remove private-repo source-path/module references from public docs Co-Authored-By: Claude Opus 4.8 (1M context) --- documentation/sql/joins.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/documentation/sql/joins.md b/documentation/sql/joins.md index a05b9eab..0c6f73e7 100644 --- a/documentation/sql/joins.md +++ b/documentation/sql/joins.md @@ -55,8 +55,7 @@ A Row-1 JOIN reaches two or more indices in **one** Elasticsearch cluster. Each └────► joined in DuckDB ◄────┘ → rows to client ``` -All examples below are transcribed verbatim from the JDBC integration suite -`softclient4es-jdbc/testkit/src/main/scala/app/softnetwork/elastic/jdbc/JdbcIntegrationSpec.scala`. +All examples below are transcribed verbatim from the SoftClient4ES JDBC integration test suite. They run against two fixtures: **`jdbc_join_emp`** (`emp_id`, `dept_id`, `name`, `salary` — 6 rows: Alice/1/1/6000, Bob/2/1/4000, Carol/3/1/8000, Dave/4/2/5500, Eve/5/2/3000, Orphan/6/**99**/4500) and **`jdbc_join_dept`** (`dept_id`, `dept_name` — Engineering=1, Marketing=2, Empty=9). The orphan employee (`dept_id`=99) and the empty department (`dept_id`=9) surface the outer-join NULLs. A few examples use a separate small fixture **`jdbc_test`** (`id`, `name`, `value` — a 3-column table, **not** the employees table; it has no `salary`/`dept_id`). @@ -222,8 +221,7 @@ A Row-2 operation has its **target** in a different cluster from its **source**. Cross-cluster references use **backtick-quoted catalog prefixes** — the catalog name is the Federation `servers.` alias, which Federation strips before forwarding each leg's SELECT to its source cluster. -Examples are transcribed from -`softclient4es-arrow/federation/src/test/scala/app/softnetwork/elastic/arrow/federation/FederationJoinE2ESpec.scala`. +Examples are transcribed from the SoftClient4ES Federation integration test suite. **Cross-cluster INSERT-with-JOIN** — the source SELECT runs on `prod_us`, conveyed to `prod_eu`: From 0e054a4784b05b3ebecb72b6dc6cf3f810cae543 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Manciot?= Date: Thu, 25 Jun 2026 14:28:37 +0200 Subject: [PATCH 3/6] docs(17.1): externalize diagrams to SVG, drop internal symbol, rename R1/R2x, remove story refs Co-Authored-By: Claude Opus 4.8 --- documentation/sql/diagrams/join-row1.svg | 44 +++++++++++++ documentation/sql/diagrams/join-row2.svg | 32 +++++++++ documentation/sql/diagrams/join-row3.svg | 39 +++++++++++ documentation/sql/joins.md | 84 ++++++++---------------- 4 files changed, 141 insertions(+), 58 deletions(-) create mode 100644 documentation/sql/diagrams/join-row1.svg create mode 100644 documentation/sql/diagrams/join-row2.svg create mode 100644 documentation/sql/diagrams/join-row3.svg diff --git a/documentation/sql/diagrams/join-row1.svg b/documentation/sql/diagrams/join-row1.svg new file mode 100644 index 00000000..dc8fa7c8 --- /dev/null +++ b/documentation/sql/diagrams/join-row1.svg @@ -0,0 +1,44 @@ + + + + + + + + + SQL: SELECT ... FROM emp e JOIN dept d ON e.dept_id = d.dept_id + + + + + Driver / sidecar / REPL + (embedded DuckDB engine) + + + + sub-query e + + sub-query d + + + + one cluster + + + + ES index: emp + + ES index: dept + + + + + + + + joined in DuckDB + + + + rows to client + diff --git a/documentation/sql/diagrams/join-row2.svg b/documentation/sql/diagrams/join-row2.svg new file mode 100644 index 00000000..d3f55aa2 --- /dev/null +++ b/documentation/sql/diagrams/join-row2.svg @@ -0,0 +1,32 @@ + + + + + + + + + + Federation coordinator + + + + prod_us sidecar + + + + prod_eu sidecar + (target) + + + + 1. run source SELECT + + + + 2. Arrow stream + + + + 3. bulk-load conveyor + diff --git a/documentation/sql/diagrams/join-row3.svg b/documentation/sql/diagrams/join-row3.svg new file mode 100644 index 00000000..d4def6f3 --- /dev/null +++ b/documentation/sql/diagrams/join-row3.svg @@ -0,0 +1,39 @@ + + + + + + + + + + prod_us + + + + prod_fr + + + + Federation coordinator + Parquet scratch + + per-query DuckDB view + + + + leg 1 + + + + leg 2 + + + + + coordinator-local JOIN + + + + + prod_eu (target) + diff --git a/documentation/sql/joins.md b/documentation/sql/joins.md index 0c6f73e7..fbca93b7 100644 --- a/documentation/sql/joins.md +++ b/documentation/sql/joins.md @@ -29,9 +29,9 @@ Cross-index JOIN ships in three shapes ("rows"). Pick by where your data lives: **One sentence to decide:** same cluster → **Row 1**; copy results between two clusters → **Row 2**; join across two or more clusters → **Row 3**. -The rule the engine actually applies: `QueryRouter.routeWriteWithJoin` counts the **distinct source catalogs** in the rewritten `FROM` / `JOIN` clauses versus the target catalog. Same (or no) catalog → Row 1; exactly one source catalog different from the target → Row 2; two or more source catalogs → Row 3. +The rule the engine actually applies: it counts the **distinct source catalogs** in the rewritten `FROM` / `JOIN` clauses versus the target catalog. Same (or no) catalog → Row 1; exactly one source catalog different from the target → Row 2; two or more source catalogs → Row 3. -> **What does NOT work yet:** Cross-index JOINs are first-class in R1, but two things are intentionally **not** here yet: arbitrary **subqueries / CTEs** in a JOIN query land in **R2a**, and **heterogeneous Row-3 sources** (joining ES with Postgres, MySQL, Snowflake, …) land in **R2b** — R1 Row 3 is multi-**Elasticsearch** only. See Known limitations (_pending 17.6_) for the full list. +> **What does NOT work yet:** Cross-index JOINs are first-class in this release, but two things are intentionally **not** here yet: arbitrary **subqueries / CTEs** in a JOIN query land in **the next release (Quarter 4 2026)**, and **heterogeneous Row-3 sources** (joining ES with Postgres, MySQL, Snowflake, …) land in **the upcoming release (Quarter 1 2027)** — this release's Row 3 is multi-**Elasticsearch** only. See [Known limitations](known_limitations.md) for the full list. --- @@ -39,21 +39,9 @@ The rule the engine actually applies: `QueryRouter.routeWriteWithJoin` counts th A Row-1 JOIN reaches two or more indices in **one** Elasticsearch cluster. Each table in the query becomes its own ES sub-query (with any single-table `WHERE` pushed down into it); the cross-index JOIN itself is executed in-process by an embedded **DuckDB** engine inside the driver, sidecar, or REPL. The first JOIN on a fresh process warms up the DuckDB native library once. -``` - SQL: SELECT ... FROM emp e JOIN dept d ON e.dept_id = d.dept_id - │ - ┌─────────────┴─────────────┐ - │ Driver / sidecar / REPL │ - │ (embedded DuckDB engine) │ - └─────────────┬─────────────┘ - sub-query e │ │ sub-query d - ▼ ▼ - ┌────────────┐ ┌────────────┐ - │ ES index │ │ ES index │ ← one cluster - │ emp │ │ dept │ - └────────────┘ └────────────┘ - └────► joined in DuckDB ◄────┘ → rows to client -``` +![Row 1 — same-cluster passthrough](diagrams/join-row1.svg) + +*Row 1 — same-cluster passthrough: the embedded DuckDB engine fans each table out to its own ES sub-query in one cluster, then joins the results in-process before returning rows to the client.* All examples below are transcribed verbatim from the SoftClient4ES JDBC integration test suite. They run against two fixtures: **`jdbc_join_emp`** (`emp_id`, `dept_id`, `name`, `salary` — 6 rows: Alice/1/1/6000, Bob/2/1/4000, Carol/3/1/8000, Dave/4/2/5500, Eve/5/2/3000, Orphan/6/**99**/4500) and **`jdbc_join_dept`** (`dept_id`, `dept_name` — Engineering=1, Marketing=2, Empty=9). The orphan employee (`dept_id`=99) and the empty department (`dept_id`=9) surface the outer-join NULLs. @@ -207,17 +195,9 @@ WHERE e.salary > ?; -- bind 3500.0 → more rows; bind 6500.0 → fewer (Carol A Row-2 operation has its **target** in a different cluster from its **source**. The Federation coordinator runs the source SELECT on the source cluster's sidecar, receives the result as an Arrow stream, and **conveyors** it to the target sidecar for a bulk-load. (The JOIN in 2a/2b is itself single-source — both source tables are in `prod_us` — but the *target* `prod_eu` is a different cluster, and that is what makes it Row 2. A plain catalog-to-catalog `INSERT … SELECT *` with no JOIN is also a Row-2 conveyor.) -``` - ┌──────────────────────────┐ - │ Federation coordinator │ - └───┬──────────────────▲───┘ - 1. │ run source SELECT │ 2. Arrow stream - ▼ │ - ┌────────────────────┐ │ ┌────────────────────┐ - │ prod_us sidecar │──┘ │ prod_eu sidecar │ ← target - └────────────────────┘ └─────────▲──────────┘ - └──────── 3. bulk-load conveyor ───┘ -``` +![Row 2 — cross-cluster conveyor](diagrams/join-row2.svg) + +*Row 2 — cross-cluster conveyor: the coordinator runs the source SELECT on `prod_us`, receives the result as an Arrow stream, and bulk-loads it onto the `prod_eu` target sidecar.* Cross-cluster references use **backtick-quoted catalog prefixes** — the catalog name is the Federation `servers.` alias, which Federation strips before forwarding each leg's SELECT to its source cluster. @@ -247,23 +227,11 @@ JOIN `prod_us`.customers c ON o.id = c.id; A Row-3 JOIN reaches **two or more source clusters**. The coordinator stages each leg to Parquet scratch on disk and exposes it as a per-query DuckDB view (named `q_` for isolation), then runs the JOIN **coordinator-local**. -> **R1 = multi-Elasticsearch only:** in R1, Row 3 joins across **multiple Elasticsearch clusters**. Joining Elasticsearch against heterogeneous sources (Postgres, MySQL, Snowflake, …) is **coming in R2b** — it is not promised for R1. +> **Multi-Elasticsearch only in this release:** in this release, Row 3 joins across **multiple Elasticsearch clusters**. Joining Elasticsearch against heterogeneous sources (Postgres, MySQL, Snowflake, …) is **coming in the upcoming release (Quarter 1 2027)** — it is not promised for this release. -``` - ┌──────────────┐ leg 1 ┌──────────────────────────┐ - │ prod_us │──────────▶│ Federation coordinator │ - └──────────────┘ │ │ - ┌──────────────┐ leg 2 │ Parquet scratch + a │ - │ prod_fr │──────────▶│ per-query DuckDB view │ - └──────────────┘ └────────────┬─────────────┘ - ▼ - coordinator-local JOIN - │ - ▼ - ┌──────────────────────────┐ - │ prod_eu (target) │ - └──────────────────────────┘ -``` +![Row 3 — multi-source coordinator](diagrams/join-row3.svg) + +*Row 3 — multi-source coordinator: each source leg (`prod_us`, `prod_fr`) is staged to Parquet scratch and exposed as a per-query DuckDB view; the JOIN runs coordinator-local before landing on the `prod_eu` target.* **Multi-source SELECT JOIN** (read; the headline three-cluster query — two source catalogs, joined coordinator-local): @@ -274,7 +242,7 @@ JOIN `prod_eu`.customers c ON o.id = c.id; -- two source catalogs (prod_us, prod_eu) → coordinator-local join ``` -This is the SRE wedge: **correlate logs, metrics, and traces across regional ES clusters in one SQL query.** (The full SRE story lives in the Federation operator guide (_pending 16.6/17.2 merge_) and the `three-region` example topology.) +This is the SRE wedge: **correlate logs, metrics, and traces across regional ES clusters in one SQL query.** (The full SRE story lives in the [Federation operator guide](../client/federation_operator_guide.md) and the `three-region` example topology.) **Multi-source INSERT-with-JOIN** (sources `prod_us` + `prod_fr`, target `prod_eu`): @@ -298,12 +266,12 @@ JOIN `prod_fr`.customers c ON o.customer_id = c.id; ## Performance characteristics -What to expect per row, qualitatively. (Hard latency and throughput numbers belong to the R1.1 Arrow benchmark — link forward, don't pre-empt.) +What to expect per row, qualitatively. (Hard latency and throughput numbers belong to a follow-up release's Arrow benchmark — link forward, don't pre-empt.) - **Row 1** — lowest latency of the three rows; the JOIN is in-process DuckDB over ES sub-query results, so the dominant cost is the ES sub-queries plus the cross product. The first JOIN on a fresh process pays a one-time DuckDB native-library warm-up. - **Row 2** — adds a network hop (coordinator → source SELECT) plus a bulk-load conveyor to the target; throughput is bound by the Arrow stream and target ingest, not by JSON — data crosses the wire as Arrow RecordBatches. - **Row 3** — adds per-leg Parquet staging on the coordinator before the DuckDB join; cost scales with the **largest** leg plus the join cardinality. -- **Arrow wire format** — for **Flight SQL** and **ADBC** clients specifically, results stream as Arrow RecordBatches (zero JSON on the wire). The JDBC driver surfaces JOIN rows via an in-process Arrow result set and the REPL renders to a console, so the clean zero-JSON-wire property is a Flight-SQL/ADBC trait, not a blanket one. The value here is interoperability and SQL completeness; the full zero-copy speed story is the R1.1 benchmark. +- **Arrow wire format** — for **Flight SQL** and **ADBC** clients specifically, results stream as Arrow RecordBatches (zero JSON on the wire). The JDBC driver surfaces JOIN rows via an in-process Arrow result set and the REPL renders to a console, so the clean zero-JSON-wire property is a Flight-SQL/ADBC trait, not a blanket one. The value here is interoperability and SQL completeness; the full zero-copy speed story is a follow-up release's benchmark. ### Row truncation at the result cap @@ -316,7 +284,7 @@ The **joined output** is capped at the tier's `maxQueryResults` (Community 10,00 Every tier **has every feature.** The meter gates *scale*, not an on/off switch. - **`maxJoins` — JOIN depth per query.** A 2-table JOIN counts as **1** JOIN, a 3-table JOIN as **2**, a 4-table JOIN as **3**. Community **2** (up to a 3-table JOIN), Pro **5** (up to a 6-table JOIN), Enterprise **∞**. `UNNEST` does **not** count toward `maxJoins`. -- **`maxClusters` — cross-cluster reach.** Community **1**, Pro **5**, Enterprise **∞**. Single-cluster Federation is free — the **meter**, not a feature flag, is the paywall. **ES-only at R1** (phrased "across N ES clusters"). +- **`maxClusters` — cross-cluster reach.** Community **1**, Pro **5**, Enterprise **∞**. Single-cluster Federation is free — the **meter**, not a feature flag, is the paywall. **ES-only in this release** (phrased "across N ES clusters"). - **`maxMaterializedViews` — persisted views.** Community **1**, Pro **50**, Enterprise **∞**. **Community has Federation.** It is capped at 1 cluster — the quota *is* the paywall, not a feature gate. One sidecar / one cluster boots free; a second cluster makes the Federation sidecar fail to start by design. @@ -357,23 +325,23 @@ For the full price matrix and editions, see the licensing & pricing page on the ## What does NOT work yet -- **Arbitrary subqueries and CTEs** inside a JOIN query — coming in **R2a**. -- **Heterogeneous Row-3 sources** (joining Elasticsearch with Postgres, MySQL, Snowflake, …) — coming in **R2b**; R1 Row 3 is multi-Elasticsearch only. +- **Arbitrary subqueries and CTEs** inside a JOIN query — coming in **the next release (Quarter 4 2026)**. +- **Heterogeneous Row-3 sources** (joining Elasticsearch with Postgres, MySQL, Snowflake, …) — coming in **the upcoming release (Quarter 1 2027)**; this release's Row 3 is multi-Elasticsearch only. -Full list: Known limitations (_pending 17.6_). +Full list: [Known limitations](known_limitations.md). ## Try it in 5 minutes -- JDBC quickstart — single-cluster JOIN from any JDBC tool. -- ADBC quickstart (_pending 17.3_) — in-process Arrow. -- Arrow Flight SQL quickstart — columnar streaming. -- Federation operator guide (_pending 16.6/17.2 merge_) — cross-cluster (Rows 2 & 3) with the Helm chart. -- Example topologies live in the [`softclient4es-helm`](https://github.com/SOFTNETWORK-APP/softclient4es-helm) repo: `softclient4es-federation/examples/single-cluster/` and `.../three-region/`. (_pending Epic-16 topologies_) +- [JDBC quickstart](../client/jdbc.md) — single-cluster JOIN from any JDBC tool. +- [ADBC quickstart](../client/adbc_driver.md) — in-process Arrow. +- [Arrow Flight SQL quickstart](../client/arrow_flight_sql.md) — columnar streaming. +- [Federation operator guide](../client/federation_operator_guide.md) — cross-cluster (Rows 2 & 3) with the Helm chart. +- Example topologies live in the [`softclient4es-helm`](https://github.com/SOFTNETWORK-APP/softclient4es-helm) repo: `softclient4es-federation/examples/single-cluster/` and `.../three-region/`. ## See also - [Materialized Views](materialized_views.md) — superpower #2: persisted, pre-joined data. - [DQL Support](dql_statements.md) — non-JOIN SELECT and `JOIN UNNEST` detail. - Licensing & pricing — the full edition / quota matrix (on the website). -- Known limitations (_pending 17.6_). -- Federation operator guide (_pending 16.6/17.2 merge_). +- [Known limitations](known_limitations.md). +- [Federation operator guide](../client/federation_operator_guide.md). From f58d9d9071b00989da16f7fb7c7aa48fa95ce361 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Manciot?= Date: Thu, 25 Jun 2026 15:14:23 +0200 Subject: [PATCH 4/6] docs(17.1): recolor JOIN diagrams to match architecture.svg palette (dark gradient bg, gradient-filled boxes, drop shadows) --- documentation/sql/diagrams/join-row1.svg | 77 ++++++++++++------------ documentation/sql/diagrams/join-row2.svg | 46 +++++++------- documentation/sql/diagrams/join-row3.svg | 57 +++++++++--------- 3 files changed, 87 insertions(+), 93 deletions(-) diff --git a/documentation/sql/diagrams/join-row1.svg b/documentation/sql/diagrams/join-row1.svg index dc8fa7c8..4590dac0 100644 --- a/documentation/sql/diagrams/join-row1.svg +++ b/documentation/sql/diagrams/join-row1.svg @@ -1,44 +1,41 @@ - + - - - + + + + + + - - SQL: SELECT ... FROM emp e JOIN dept d ON e.dept_id = d.dept_id - - - - - Driver / sidecar / REPL - (embedded DuckDB engine) - - - - sub-query e - - sub-query d - - - - one cluster - - - - ES index: emp - - ES index: dept - - - - - - - - joined in DuckDB - - - - rows to client + + + SELECT … FROM emp e JOIN dept d ON e.dept_id = d.dept_id + + + + Driver / sidecar / REPL + embedded DuckDB engine + + + sub-query e + + sub-query d + + + one Elasticsearch cluster + + + ES index: emp + + ES index: dept + + + + + + joined in DuckDB + + + rows to client diff --git a/documentation/sql/diagrams/join-row2.svg b/documentation/sql/diagrams/join-row2.svg index d3f55aa2..67524130 100644 --- a/documentation/sql/diagrams/join-row2.svg +++ b/documentation/sql/diagrams/join-row2.svg @@ -1,32 +1,32 @@ - + - - - + + + + + + - - - Federation coordinator + - - - prod_us sidecar + + Federation coordinator - - - prod_eu sidecar - (target) + + prod_us sidecar + source - - - 1. run source SELECT + + prod_eu sidecar + target - - - 2. Arrow stream + + 1. run source SELECT - - - 3. bulk-load conveyor + + 2. Arrow stream + + + 3. bulk-load conveyor diff --git a/documentation/sql/diagrams/join-row3.svg b/documentation/sql/diagrams/join-row3.svg index d4def6f3..7959f616 100644 --- a/documentation/sql/diagrams/join-row3.svg +++ b/documentation/sql/diagrams/join-row3.svg @@ -1,39 +1,36 @@ - + - - - + + + + + + + - - - prod_us + - - - prod_fr + + prod_us + + prod_fr - - - Federation coordinator - Parquet scratch + - per-query DuckDB view + + Federation coordinator + Parquet scratch + + per-query DuckDB view - - - leg 1 + + leg 1 + + leg 2 - - - leg 2 + + + coordinator-local JOIN - - - - coordinator-local JOIN - - - - - prod_eu (target) + + + prod_eu (target) From 89dd5fdac98f79fb4724954a7c3779e3a50b0754 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Manciot?= Date: Thu, 25 Jun 2026 15:27:08 +0200 Subject: [PATCH 5/6] =?UTF-8?q?docs(17.1):=20fix=20row2/row3=20arrows=20?= =?UTF-8?q?=E2=80=94=20grey,=20drawn=20behind=20boxes,=20heads=20stop=20ou?= =?UTF-8?q?tside=20source/target=20(no=20overlap)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- documentation/sql/diagrams/join-row2.svg | 18 ++++++++++-------- documentation/sql/diagrams/join-row3.svg | 18 +++++++++++------- 2 files changed, 21 insertions(+), 15 deletions(-) diff --git a/documentation/sql/diagrams/join-row2.svg b/documentation/sql/diagrams/join-row2.svg index 67524130..8c5ebf0c 100644 --- a/documentation/sql/diagrams/join-row2.svg +++ b/documentation/sql/diagrams/join-row2.svg @@ -10,6 +10,12 @@ + + + + + + Federation coordinator @@ -21,12 +27,8 @@ prod_eu sidecar target - - 1. run source SELECT - - - 2. Arrow stream - - - 3. bulk-load conveyor + + 1. run source SELECT + 2. Arrow stream + 3. bulk-load conveyor diff --git a/documentation/sql/diagrams/join-row3.svg b/documentation/sql/diagrams/join-row3.svg index 7959f616..5124e732 100644 --- a/documentation/sql/diagrams/join-row3.svg +++ b/documentation/sql/diagrams/join-row3.svg @@ -11,6 +11,13 @@ + + + + + + + prod_us @@ -21,16 +28,13 @@ Parquet scratch + per-query DuckDB view - - leg 1 - - leg 2 - - coordinator-local JOIN - prod_eu (target) + + + leg 1 + leg 2 From b157b09931b60f8a411a40cbf1d8867635a18174 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Manciot?= Date: Thu, 25 Jun 2026 15:40:35 +0200 Subject: [PATCH 6/6] docs(17.1): add terminal-style SQL header to each JOIN diagram (blue keywords, traffic-light bar) --- documentation/sql/diagrams/join-row1.svg | 73 ++++++++++++++---------- documentation/sql/diagrams/join-row2.svg | 50 ++++++++++------ documentation/sql/diagrams/join-row3.svg | 58 +++++++++++-------- 3 files changed, 108 insertions(+), 73 deletions(-) diff --git a/documentation/sql/diagrams/join-row1.svg b/documentation/sql/diagrams/join-row1.svg index 4590dac0..3a667a5f 100644 --- a/documentation/sql/diagrams/join-row1.svg +++ b/documentation/sql/diagrams/join-row1.svg @@ -1,4 +1,4 @@ - + @@ -8,34 +8,45 @@ - - - SELECT … FROM emp e JOIN dept d ON e.dept_id = d.dept_id - - - - Driver / sidecar / REPL - embedded DuckDB engine - - - sub-query e - - sub-query d - - - one Elasticsearch cluster - - - ES index: emp - - ES index: dept - - - - - - joined in DuckDB - - - rows to client + + + + + + + + Row 1 · same-cluster JOIN + + SELECT e.name, e.salary, d.dept_name + FROM emp e + JOIN dept d ON e.dept_id = d.dept_id; + + + + + Driver / sidecar / REPL + embedded DuckDB engine + + + sub-query e + + sub-query d + + + one Elasticsearch cluster + + + ES index: emp + + ES index: dept + + + + + + joined in DuckDB + + + rows to client + diff --git a/documentation/sql/diagrams/join-row2.svg b/documentation/sql/diagrams/join-row2.svg index 8c5ebf0c..0507ba82 100644 --- a/documentation/sql/diagrams/join-row2.svg +++ b/documentation/sql/diagrams/join-row2.svg @@ -1,4 +1,4 @@ - + @@ -8,27 +8,39 @@ - + - - - - + + + + + + Row 2 · cross-cluster conveyor + + INSERT INTO `prod_eu`.dest + SELECT o.id, c.name + FROM `prod_us`.orders o + JOIN `prod_us`.customers c ON o.id = c.id; - - - Federation coordinator + + + + + - - prod_us sidecar - source + + Federation coordinator - - prod_eu sidecar - target + + prod_us sidecar + source - - 1. run source SELECT - 2. Arrow stream - 3. bulk-load conveyor + + prod_eu sidecar + target + + 1. run source SELECT + 2. Arrow stream + 3. bulk-load conveyor + diff --git a/documentation/sql/diagrams/join-row3.svg b/documentation/sql/diagrams/join-row3.svg index 5124e732..4ce7e638 100644 --- a/documentation/sql/diagrams/join-row3.svg +++ b/documentation/sql/diagrams/join-row3.svg @@ -1,4 +1,4 @@ - + @@ -9,32 +9,44 @@ - + - - - - - + + + + + + Row 3 · multi-source coordinator + + INSERT INTO `prod_eu`.dest + SELECT o.id, c.name + FROM `prod_us`.orders o + JOIN `prod_fr`.customers c ON o.id = c.id; - - - prod_us - - prod_fr + + + + + + - - Federation coordinator - Parquet scratch + - per-query DuckDB view + + prod_us + + prod_fr - - coordinator-local JOIN + + Federation coordinator + Parquet scratch + + per-query DuckDB view - - prod_eu (target) + + coordinator-local JOIN - - leg 1 - leg 2 + + prod_eu (target) + + leg 1 + leg 2 +