Skip to content

test(parity): first parity harness — port MobilityDB 001_set.test.sql#6

Merged
nhungoc1508 merged 3 commits into
mainfrom
feat/parity-set
Apr 24, 2026
Merged

test(parity): first parity harness — port MobilityDB 001_set.test.sql#6
nhungoc1508 merged 3 commits into
mainfrom
feat/parity-set

Conversation

@estebanzimanyi

Copy link
Copy Markdown
Member

Summary

First parity harness between MobilityDB (PostgreSQL extension, historical store) and MobilityDuck (DuckDB extension, pipeline side). Adds test/sql/parity/001_set.test — a 1:1 port of the upstream regression file mobilitydb/test/temporal/queries/001_set.test.sql into DuckDB sqllogic format.

Goal: the long-term objective is that the same SQL query runs on both projects and produces comparable results. This file is the first executable, machine-checkable artifact that makes the gap measurable. Every future binding PR can annotate the skip block it unblocks.

Methodology

  • Active assertion per query that runs on MobilityDuck today. Expected output reflects MobilityDuck's format (ISO dates, UTC timestamps under TZ=UTC) — parity is in the query surface, not the display layer.
  • mode skip block per query that hits an unbound MEOS symbol, a naming divergence, or an output-format divergence, with a # gap: annotation naming the specific missing binding.
  • No C++ touched. No user-visible feature shipped. This is the measuring stick subsequent PRs chip away at.

Initial gap triage

Four tracked gaps (based on src-tree audit — CI may reveal additional divergences on the currently-active assertions):

# Area Gap Fix
1 Constructor set(ARRAY[GEOMETRY ...]) not registered in src/geo/geoset.cpp Register ScalarFunction("set", {LIST(GEOMETRY)}, geomset(), ...) (MEOS: geomset_make)
2 Accessor spans(<set_type>) missing — only spans(<spanset_type>) exists (src/temporal/spanset.cpp:330) Add set-level spans (MEOS: set_spans)
3 Naming MobilityDB uses splitNspans / splitEachNspans (lowercase s); MobilityDuck registers splitNSpans / splitEachNSpans in src/temporal/span.cpp:313-340 Add lowercase aliases
4 Accessor set_hash / set_hash_extended not registered in src/temporal/set.cpp Add bindings returning BIGINT (MEOS: set_hash, set_hash_extended)

Test plan

  • CI runs test/sql/parity/001_set.test and reports pass/fail per active assertion.
  • Any active assertion that fails in CI gets re-triaged (skip block + gap note) in a follow-up commit on this branch before merge.
  • Each of the 4 tracked gaps becomes a follow-up PR that deletes the matching skip block.
  • Once this lands, the next parity PR ports 003_span.test.sql using the same shape.

Notes

  • Local build not feasible in the authoring environment (vcpkg not installed); CI is the validation source of truth for the initial pass/fail tally.
  • The source MobilityDB PR that motivated this scope (#788 "Add the tbigint type" upstream) is still open; once it lands, bigintset queries for the set of MobilityDB queries that PR adds will be easy follow-ups under the same parity structure.

🤖 Generated with Claude Code

estebanzimanyi and others added 3 commits April 24, 2026 21:01
Adds test/sql/parity/001_set.test — a 1:1 port of the upstream MobilityDB
regression file mobilitydb/test/temporal/queries/001_set.test.sql into
DuckDB sqllogic format. Goal is executable, machine-checkable parity
between MobilityDB (PostgreSQL extension) and MobilityDuck (DuckDB
extension) for the traditional temporal-set types — so the same query
runs on both the historical store and the pipeline side.

Every query from the source file is mirrored in order. Queries that run
on MobilityDuck today are active assertions. Queries that hit an unbound
MEOS symbol, a naming divergence, or an output-format divergence are
wrapped in `mode skip` with a `# gap:` annotation that names the
specific missing binding. Each skip is a tracked backlog item for a
follow-up binding PR.

Tracked gaps (initial triage, based on src-tree audit — CI may reveal
more divergences on the active assertions):

- set(ARRAY[GEOMETRY ...]) — geomset list-constructor not registered in
  src/geo/geoset.cpp (MEOS: geomset_make).
- spans(<set_type>) — only spans(<spanset_type>) is registered in
  src/temporal/spanset.cpp:330; the set-level form is missing.
- splitNspans / splitEachNspans — naming divergence. MobilityDuck
  registers splitNSpans / splitEachNSpans (camelCase, capital S) in
  src/temporal/span.cpp:313-340; the MobilityDB SQL form uses lowercase
  "spans".
- set_hash / set_hash_extended — not registered in
  src/temporal/set.cpp (MEOS: set_hash, set_hash_extended).

Output-format note: MobilityDB emits PG-locale output (MM-DD-YYYY
dates, "Sat Jan 01 00:00:00 2000 PST" timestamps); MobilityDuck emits
DuckDB-shaped output (ISO dates, UTC timestamps under TZ=UTC, as the
Makefile sets). Expected blocks in this file reflect MobilityDuck's
output — parity is in the query surface, not the display layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI parser requires every `statement error` directive to carry a `----`
separator followed by an expected error-message substring (even empty is
rejected). Port was missing both on all 9 error assertions; adds MEOS
error substrings lifted from the upstream MobilityDB expected-output
file so they match what MEOS raises through the MobilityDuck wrapper:

- Could not parse (tstzset parse errors)
- cannot be negative (asText negative-digits)
- cannot be empty (set of empty array)
- mixed 2D/3D, non-empty, mixed SRID (geomset constructor errors — in skip block)
- strictly positive (splitNspans / splitEachNspans negative count — in skip block)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous CI run died after the first `statement error` query — MEOS's
default error handler terminates the process on MEOS_ERROR instead of
surfacing as a DuckDB exception, so the parse-error test killed the
whole test runner before any assertion could be checked.

Root cause (MobilityDuck bug, not a parity-file bug):
`src/mobilityduck_extension.cpp:143` calls `meos_initialize()` but not
`meos_initialize_error_handler(...)`. Follow-up PR: install a MEOS
error handler in LoadInternal() that maps MEOS_ERROR to a thrown
DuckDB::InvalidInputException — at which point every `statement error`
block in this file becomes a real parity assertion in a single edit.

For now, wrap all 4 previously-active `statement error` blocks (tstzset
parse errors, asText negative digits, empty-array `set(...)` ) in
`mode skip` with cross-references to a new top-of-file error-handler
note explaining the gap and the one-stroke fix once it lands. The 5
error assertions already inside geomset / splitN skip blocks get the
same cross-reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nhungoc1508 nhungoc1508 merged commit cb730e6 into main Apr 24, 2026
9 checks passed
nhungoc1508 pushed a commit that referenced this pull request Apr 25, 2026
MEOS's default error handler writes the message to stderr and calls
exit(EXIT_FAILURE) on errlevel == ERROR (PG elog level 21 carried
through via postgres.h). That tears down the whole DuckDB process on
any invalid user input — a parse error on a malformed tstzset literal,
a negative precision on asText, etc. — which (a) is user-hostile in
production and (b) makes every `statement error` test kill the test
runner before Catch2 can check it. PR #6 surfaced this by trying to
exercise MEOS error paths in the parity harness and crashing CI on the
first parse-error query.

This PR installs `MobilityduckMeosErrorHandler` via
`meos_initialize_error_handler(...)` in LoadInternal, right after
`meos_initialize()`. The handler throws a DuckDB::InvalidInputException
on errlevel >= 21 (ERROR/FATAL/PANIC), so MEOS errors surface as
ordinary query failures with the MEOS message attached. Lower levels
(WARNING/NOTICE/INFO) are silently ignored for now — can be wired to
DuckDB's logger in a follow-up if useful.

Parity file (`test/sql/parity/001_set.test`) consequences:

- Deletes the top-of-file error-handler gap note (now obsolete).
- Unwraps the 3 MEOS-error-specific `mode skip` blocks covering tstzset
  parse errors and asText negative-digits — they run as real
  `statement error` assertions.
- Re-wraps ONE of those (`set('{}'::timestamptz[])`) in a skip block
  with a new gap note: that query fails at the DuckDB cast layer with
  a conversion error before MEOS sees it, because `'{}'` is PG
  array-literal syntax that DuckDB rejects as a string literal. The
  equivalent DuckDB form `set(ARRAY[]::TIMESTAMPTZ[])` reaches MEOS
  but the constructor wrapper in src/temporal/set_functions.cpp
  returns null instead of calling meos_error(...) on empty input —
  a separate gap, documented for a later follow-up.

Verified locally (`TZ=UTC ./build/release/test/unittest
"/path/to/test/*"`): 747 assertions pass across 13 test cases (up from
739 pre-handler), no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants