From 0e2d8c6c2b166446493529df41ec26c9e4f7ad1b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Manciot?= Date: Thu, 25 Jun 2026 07:56:58 +0200 Subject: [PATCH 1/2] docs: add Known Limitations & Roadmap MD mirror (honest BI-tool gap framing) Story 17.6 (Epic 17, Layer 0). MD mirror of the canonical web /known-limitations/ page: exactly what SQL works in R1, what's coming in R2a/R2b/R3+, and the per-tool BI workaround (Tableau / Power BI / Looker / Metabase) for the subquery/CTE gap. Token-for-token parity with the web MDX. Every works/does-not-work claim verified against release-r1 source. - NEW documentation/sql/known_limitations.md - UPDATE documentation/sql/README.md: index link after DQL Support - Inbound one-liners on documentation/sql/dql_statements.md and documentation/client/arrow_flight_sql.md - Outbound links to joins.md (17.1) and ../client/federation_operator_guide.md (16.6/17.2) forward-declared until those targets land on release-r1 Closed Issue #139 Co-Authored-By: Claude Opus 4.8 (1M context) --- documentation/client/arrow_flight_sql.md | 6 ++ documentation/sql/README.md | 1 + documentation/sql/dql_statements.md | 2 + documentation/sql/known_limitations.md | 75 ++++++++++++++++++++++++ 4 files changed, 84 insertions(+) create mode 100644 documentation/sql/known_limitations.md diff --git a/documentation/client/arrow_flight_sql.md b/documentation/client/arrow_flight_sql.md index e23246d4..489176af 100644 --- a/documentation/client/arrow_flight_sql.md +++ b/documentation/client/arrow_flight_sql.md @@ -144,6 +144,12 @@ The Arrow Flight SQL sidecar sends one anonymous usage ping per day (no IP, no S --- +## Known limitations + +Subqueries, CTEs (`WITH`), and set operators beyond `UNION ALL` are not in R1 — and some BI tools auto-generate them. See [Known Limitations & Roadmap](../sql/known_limitations.md) for exactly what works today, what's coming in R2a, and the per-tool workaround. + +--- + ## License Arrow Flight SQL is licensed under the **Elastic License 2.0** — free to use, not open source. diff --git a/documentation/sql/README.md b/documentation/sql/README.md index 996339c6..a63e12dc 100644 --- a/documentation/sql/README.md +++ b/documentation/sql/README.md @@ -17,6 +17,7 @@ Welcome to the SQL Engine Documentation. Navigate through the sections below: - [DDL Support](ddl_statements.md) - [DML Support](dml_statements.md) - [DQL Support](dql_statements.md) +- [Known Limitations & Roadmap](known_limitations.md) - [Materialized Views](materialized_views.md) - [Telemetry & Privacy](../client/telemetry.md) - [Telemetry & Privacy](telemetry.md) diff --git a/documentation/sql/dql_statements.md b/documentation/sql/dql_statements.md index 9fd32d32..f4dbf115 100644 --- a/documentation/sql/dql_statements.md +++ b/documentation/sql/dql_statements.md @@ -852,6 +852,8 @@ Notes: ## Limitations +For the full picture of what works in R1, what's coming in R2a/R2b, and BI-tool workarounds, see [Known Limitations & Roadmap](known_limitations.md). + Even though the DQL engine is powerful, some SQL features are not (yet) supported: - Traditional SQL joins are supported only through the use of Materialized Views (only `JOIN UNNEST` on `ARRAY` is available natively) diff --git a/documentation/sql/known_limitations.md b/documentation/sql/known_limitations.md new file mode 100644 index 00000000..bc7b4a78 --- /dev/null +++ b/documentation/sql/known_limitations.md @@ -0,0 +1,75 @@ +[Back to index](README.md) + +# Known Limitations & Roadmap + +SoftClient4ES R1 runs a large, practical subset of ANSI SQL on Elasticsearch — including cross-index JOINs that Elasticsearch itself cannot do. A few advanced constructs (subqueries, CTEs, set operators beyond `UNION ALL`) are not in R1 yet. This page tells you exactly what works **as of R1**, what's coming, and how to get unblocked today. + +> Great for explicit JOIN SQL — full BI-tool subquery / CTE support is coming in R2a. + + + +## Using a BI tool? Read this first + +If your BI tool just failed on a subquery or a CTE, you're in the right place. Some BI tools auto-generate nested SQL (subqueries / derived tables) even when your logical query has none. Until R2a lands full subquery support, send **explicit JOIN SQL** instead of letting the tool compose nested queries: + +- **Tableau** — Tableau **live** connections can auto-generate subqueries. Use **Extract** mode (Tableau runs the extract locally, subquery-free), or write **Custom SQL** with explicit JOINs instead of letting Tableau compose the query. +- **Power BI** — DirectQuery / query folding can compose nested SQL. Prefer **Import** mode (folds locally, nothing nested is pushed), or author explicit-JOIN queries; avoid relationships that force generated subqueries until R2a. +- **Looker** — BI tools that build **derived tables / measures** (e.g. Looker) can compose subqueries. Model **explicit JOINs** in the SQL the tool sends rather than relying on tool-composed derived tables; avoid symmetric-aggregate measures that force derived tables until R2a. +- **Metabase** — the GUI Question builder can emit subqueries for multi-stage questions. Use **Native (SQL)** queries with explicit `JOIN … ON …` instead of the visual builder for any query that would otherwise nest. + +> **General rule:** prefer **explicit JOIN SQL** over tool-generated nested SQL. If you control the query, a cross-index JOIN is fully supported in R1. + +Tableau, Power BI, and Metabase are **Compatible** (work via the JDBC/ADBC spec, not formally tested by us). **Apache Superset** (dedicated dialect), **DBeaver**, and **Grafana** (via Arrow Flight SQL) are **Tested**. + +## Works in R1 + +- **Cross-index JOINs**: `INNER` / `LEFT` / `RIGHT` / `FULL` / `CROSS`, plus `JOIN UNNEST` on nested arrays — something Elasticsearch cannot do natively. (See the JOIN matrix walkthrough for the per-tier rows and worked examples.) +- **Aggregations** + `GROUP BY` / `HAVING`. +- **Analytical SQL**: `ROW_NUMBER` / `RANK` / `DENSE_RANK`; the `STDDEV` / `VARIANCE` family (`STDDEV_POP`, `STDDEV_SAMP`, `VAR_POP`, `VAR_SAMP`); `PERCENTILE_CONT` / `PERCENTILE_DISC`; window aggregates and `FIRST_VALUE` / `LAST_VALUE` / `ARRAY_AGG` over `OVER (PARTITION BY …)`. +- **Conditionals & null handling**: `CASE` / `COALESCE` / `NULLIF` / `GREATEST` / `LEAST` / `ISNULL` / `ISNOTNULL`. +- `ORDER BY … NULLS FIRST | NULLS LAST`. +- `UNION ALL` (concatenate result sets — no de-duplication). +- `SELECT * EXCEPT(col, …)` — drop named columns from `SELECT *`. This is the BigQuery-style **column-exclusion** clause. It is **not** the `EXCEPT` set operator (see below). + +## Not in R1 (coming in R2a, ~5–6 months out) + +- **Subqueries**: scalar, `IN (SELECT …)`, `EXISTS (SELECT …)`, derived tables `FROM (SELECT …)`, and correlated subqueries. +- **CTEs**: `WITH name AS (SELECT …)` — recursive and non-recursive. +- **Set operators**: `UNION` (with row de-duplication), `INTERSECT`, and the `EXCEPT` **set operator**. The `EXCEPT` set operator is **distinct from** the `SELECT * EXCEPT(cols)` column-exclusion clause above — that one works; the set operator does not. +- **Positional / tiling window functions**: `NTILE`, `LAG`, `LEAD` — not yet implemented; coming with the R2a analytical-SQL work. (Note: `PERCENTILE_CONT` / `PERCENTILE_DISC` — percentile *aggregates* — already work in R1; the positional/tiling window functions are a different family.) + +These arrive in R2a via the new **arrow-bi** module — single-cluster customers get them by upgrading the driver (JDBC / ADBC / sidecar), with no infrastructure change and no federation server required. + +### What a not-yet-supported query looks like + +A subquery in a `WHERE` clause is rejected by the parser today: + +```sql +-- Not supported in R1: subqueries are not yet implemented. +SELECT name +FROM employees +WHERE department_id IN (SELECT id FROM departments WHERE region = 'EU'); +``` + +The parser rejects this — `IN` accepts only literal value lists in R1, not a nested `SELECT`. Rewrite it as an explicit JOIN (fully supported), or wait for R2a where the subquery form lands as-is. + +## Coming in R2b (~5–6 months after R2a) + +- **Heterogeneous federation**: JOIN or correlate Elasticsearch with PostgreSQL, MySQL, ClickHouse, Snowflake, and more — plus cross-cluster subqueries (e.g. correlate one cluster's data against another's). + +## Deferred (R3+, demand-driven — tell us what you need) + +- `MERGE`, `RETURNING`, `INFORMATION_SCHEMA`, non-materialized `CREATE VIEW`, `DECIMAL`, `TIMESTAMP WITH TIME ZONE`, `INTERVAL` as a type, and `UUID`. No committed date — these are prioritised by customer demand. (R1 DML already supports `INSERT … ON CONFLICT` upsert — a different feature from `MERGE`.) + +## Roadmap timing + +We do not commit external dates. R2a is roughly **5–6 months** from R1; R2b roughly **5–6 months** after R2a; R3+ is demand-driven. Treat the R2a feature list as *planned*, not guaranteed — its scope is gated on a function-library audit. + +## See also + +- The JOIN matrix walkthrough — how the three JOIN tiers work, with worked examples. +- The federation operator guide — multi-cluster federation deployment. + +--- + +*This page describes SoftClient4ES **as of R1**. Once R2a ships, the "Not in R1" list above shrinks — verify against your installed release.* From 2200dfee77e6fbae73d41d30f0cd1519923c4354 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Manciot?= Date: Thu, 25 Jun 2026 14:21:01 +0200 Subject: [PATCH 2/2] =?UTF-8?q?docs(17.6):=20rename=20R1/R2x=E2=86=92relea?= =?UTF-8?q?se+quarter=20wording,=20remove=20story=20refs?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- documentation/sql/known_limitations.md | 42 ++++++++++++-------------- 1 file changed, 20 insertions(+), 22 deletions(-) diff --git a/documentation/sql/known_limitations.md b/documentation/sql/known_limitations.md index bc7b4a78..9dbb6a3d 100644 --- a/documentation/sql/known_limitations.md +++ b/documentation/sql/known_limitations.md @@ -2,28 +2,26 @@ # Known Limitations & Roadmap -SoftClient4ES R1 runs a large, practical subset of ANSI SQL on Elasticsearch — including cross-index JOINs that Elasticsearch itself cannot do. A few advanced constructs (subqueries, CTEs, set operators beyond `UNION ALL`) are not in R1 yet. This page tells you exactly what works **as of R1**, what's coming, and how to get unblocked today. +SoftClient4ES runs a large, practical subset of ANSI SQL on Elasticsearch — including cross-index JOINs that Elasticsearch itself cannot do. A few advanced constructs (subqueries, CTEs, set operators beyond `UNION ALL`) are not in the current release yet. This page tells you exactly what works **as of this release**, what's coming, and how to get unblocked today. -> Great for explicit JOIN SQL — full BI-tool subquery / CTE support is coming in R2a. - - +> Great for explicit JOIN SQL — full BI-tool subquery / CTE support is coming in the next release. ## Using a BI tool? Read this first -If your BI tool just failed on a subquery or a CTE, you're in the right place. Some BI tools auto-generate nested SQL (subqueries / derived tables) even when your logical query has none. Until R2a lands full subquery support, send **explicit JOIN SQL** instead of letting the tool compose nested queries: +If your BI tool just failed on a subquery or a CTE, you're in the right place. Some BI tools auto-generate nested SQL (subqueries / derived tables) even when your logical query has none. Until the next release lands full subquery support, send **explicit JOIN SQL** instead of letting the tool compose nested queries: - **Tableau** — Tableau **live** connections can auto-generate subqueries. Use **Extract** mode (Tableau runs the extract locally, subquery-free), or write **Custom SQL** with explicit JOINs instead of letting Tableau compose the query. -- **Power BI** — DirectQuery / query folding can compose nested SQL. Prefer **Import** mode (folds locally, nothing nested is pushed), or author explicit-JOIN queries; avoid relationships that force generated subqueries until R2a. -- **Looker** — BI tools that build **derived tables / measures** (e.g. Looker) can compose subqueries. Model **explicit JOINs** in the SQL the tool sends rather than relying on tool-composed derived tables; avoid symmetric-aggregate measures that force derived tables until R2a. +- **Power BI** — DirectQuery / query folding can compose nested SQL. Prefer **Import** mode (folds locally, nothing nested is pushed), or author explicit-JOIN queries; avoid relationships that force generated subqueries until the next release. +- **Looker** — BI tools that build **derived tables / measures** (e.g. Looker) can compose subqueries. Model **explicit JOINs** in the SQL the tool sends rather than relying on tool-composed derived tables; avoid symmetric-aggregate measures that force derived tables until the next release. - **Metabase** — the GUI Question builder can emit subqueries for multi-stage questions. Use **Native (SQL)** queries with explicit `JOIN … ON …` instead of the visual builder for any query that would otherwise nest. -> **General rule:** prefer **explicit JOIN SQL** over tool-generated nested SQL. If you control the query, a cross-index JOIN is fully supported in R1. +> **General rule:** prefer **explicit JOIN SQL** over tool-generated nested SQL. If you control the query, a cross-index JOIN is fully supported in the current release. Tableau, Power BI, and Metabase are **Compatible** (work via the JDBC/ADBC spec, not formally tested by us). **Apache Superset** (dedicated dialect), **DBeaver**, and **Grafana** (via Arrow Flight SQL) are **Tested**. -## Works in R1 +## Works in this release -- **Cross-index JOINs**: `INNER` / `LEFT` / `RIGHT` / `FULL` / `CROSS`, plus `JOIN UNNEST` on nested arrays — something Elasticsearch cannot do natively. (See the JOIN matrix walkthrough for the per-tier rows and worked examples.) +- **Cross-index JOINs**: `INNER` / `LEFT` / `RIGHT` / `FULL` / `CROSS`, plus `JOIN UNNEST` on nested arrays — something Elasticsearch cannot do natively. (See the [JOIN matrix walkthrough](joins.md) for the per-tier rows and worked examples.) - **Aggregations** + `GROUP BY` / `HAVING`. - **Analytical SQL**: `ROW_NUMBER` / `RANK` / `DENSE_RANK`; the `STDDEV` / `VARIANCE` family (`STDDEV_POP`, `STDDEV_SAMP`, `VAR_POP`, `VAR_SAMP`); `PERCENTILE_CONT` / `PERCENTILE_DISC`; window aggregates and `FIRST_VALUE` / `LAST_VALUE` / `ARRAY_AGG` over `OVER (PARTITION BY …)`. - **Conditionals & null handling**: `CASE` / `COALESCE` / `NULLIF` / `GREATEST` / `LEAST` / `ISNULL` / `ISNOTNULL`. @@ -31,45 +29,45 @@ Tableau, Power BI, and Metabase are **Compatible** (work via the JDBC/ADBC spec, - `UNION ALL` (concatenate result sets — no de-duplication). - `SELECT * EXCEPT(col, …)` — drop named columns from `SELECT *`. This is the BigQuery-style **column-exclusion** clause. It is **not** the `EXCEPT` set operator (see below). -## Not in R1 (coming in R2a, ~5–6 months out) +## Not in this release (coming in the next release, Quarter 4 2026) - **Subqueries**: scalar, `IN (SELECT …)`, `EXISTS (SELECT …)`, derived tables `FROM (SELECT …)`, and correlated subqueries. - **CTEs**: `WITH name AS (SELECT …)` — recursive and non-recursive. - **Set operators**: `UNION` (with row de-duplication), `INTERSECT`, and the `EXCEPT` **set operator**. The `EXCEPT` set operator is **distinct from** the `SELECT * EXCEPT(cols)` column-exclusion clause above — that one works; the set operator does not. -- **Positional / tiling window functions**: `NTILE`, `LAG`, `LEAD` — not yet implemented; coming with the R2a analytical-SQL work. (Note: `PERCENTILE_CONT` / `PERCENTILE_DISC` — percentile *aggregates* — already work in R1; the positional/tiling window functions are a different family.) +- **Positional / tiling window functions**: `NTILE`, `LAG`, `LEAD` — not yet implemented; coming with the next release's analytical-SQL work. (Note: `PERCENTILE_CONT` / `PERCENTILE_DISC` — percentile *aggregates* — already work in the current release; the positional/tiling window functions are a different family.) -These arrive in R2a via the new **arrow-bi** module — single-cluster customers get them by upgrading the driver (JDBC / ADBC / sidecar), with no infrastructure change and no federation server required. +These arrive in the next release as a driver-side enhancement — single-cluster customers get them by upgrading the driver (JDBC / ADBC / sidecar), with no infrastructure change and no federation server required. ### What a not-yet-supported query looks like A subquery in a `WHERE` clause is rejected by the parser today: ```sql --- Not supported in R1: subqueries are not yet implemented. +-- Not supported in the current release: subqueries are not yet implemented. SELECT name FROM employees WHERE department_id IN (SELECT id FROM departments WHERE region = 'EU'); ``` -The parser rejects this — `IN` accepts only literal value lists in R1, not a nested `SELECT`. Rewrite it as an explicit JOIN (fully supported), or wait for R2a where the subquery form lands as-is. +The parser rejects this — `IN` accepts only literal value lists today, not a nested `SELECT`. Rewrite it as an explicit JOIN (fully supported), or wait for the next release where the subquery form lands as-is. -## Coming in R2b (~5–6 months after R2a) +## Coming in the upcoming release (Quarter 1 2027) - **Heterogeneous federation**: JOIN or correlate Elasticsearch with PostgreSQL, MySQL, ClickHouse, Snowflake, and more — plus cross-cluster subqueries (e.g. correlate one cluster's data against another's). -## Deferred (R3+, demand-driven — tell us what you need) +## Deferred (a future release, demand-driven — tell us what you need) -- `MERGE`, `RETURNING`, `INFORMATION_SCHEMA`, non-materialized `CREATE VIEW`, `DECIMAL`, `TIMESTAMP WITH TIME ZONE`, `INTERVAL` as a type, and `UUID`. No committed date — these are prioritised by customer demand. (R1 DML already supports `INSERT … ON CONFLICT` upsert — a different feature from `MERGE`.) +- `MERGE`, `RETURNING`, `INFORMATION_SCHEMA`, non-materialized `CREATE VIEW`, `DECIMAL`, `TIMESTAMP WITH TIME ZONE`, `INTERVAL` as a type, and `UUID`. No committed date — these are prioritised by customer demand. (Current-release DML already supports `INSERT … ON CONFLICT` upsert — a different feature from `MERGE`.) ## Roadmap timing -We do not commit external dates. R2a is roughly **5–6 months** from R1; R2b roughly **5–6 months** after R2a; R3+ is demand-driven. Treat the R2a feature list as *planned*, not guaranteed — its scope is gated on a function-library audit. +We do not commit firm external dates. The next release is targeted for **Quarter 4 2026**; the upcoming release (heterogeneous federation) for **Quarter 1 2027**; the deferred items are demand-driven with no committed date. Treat the next release's feature list as *planned*, not guaranteed — its scope is gated on a function-library audit. ## See also -- The JOIN matrix walkthrough — how the three JOIN tiers work, with worked examples. -- The federation operator guide — multi-cluster federation deployment. +- The [JOIN matrix walkthrough](joins.md) — how the three JOIN tiers work, with worked examples. +- The [federation operator guide](../client/federation_operator_guide.md) — multi-cluster federation deployment. --- -*This page describes SoftClient4ES **as of R1**. Once R2a ships, the "Not in R1" list above shrinks — verify against your installed release.* +*This page describes SoftClient4ES **as of the current release**. Once the next release ships, the "Not in this release" list above shrinks — verify against your installed release.*