61 changes: 61 additions & 0 deletions .github/workflows/integration-train-validate.yml
@@ -0,0 +1,61 @@
name: Integration train validate

# Validates the integration-train manifest (`meta/integration-train.json`)
# is a well-formed instance of its declared schema, and that `verify-train.sh`
# stays syntactically valid (so the wave-by-wave verifier never silently
# breaks).
#
# Mirrors the OpenAPI-validate workflow pattern from PR #5 — fail fast on
# any structural drift to the parity-proof artifact.

on:
  pull_request:
    paths:
      - 'meta/integration-train.json'
      - 'docs/integration-train.md'
      - 'verify-train.sh'
      - '.github/workflows/integration-train-validate.yml'
  push:
    branches: [master]
  workflow_dispatch:

jobs:
  validate-manifest:
    name: Validate integration-train manifest + verifier
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Validate manifest is valid JSON
        run: |
          python3 -c "
          import json, sys
          d = json.load(open('meta/integration-train.json'))
          assert 'schema' in d, 'missing schema field'
          assert d['schema'].startswith('integration-train/'), \
              f\"unexpected schema: {d['schema']}\"
          assert 'waves' in d, 'missing waves field'
          assert isinstance(d['waves'], list) and len(d['waves']) > 0, \
              'waves must be a non-empty list'
          for w in d['waves']:
              assert 'id' in w, f'wave missing id: {w}'
              assert 'name' in w, f\"wave {w['id']} missing name\"
              assert 'status' in w, f\"wave {w['id']} missing status\"
              assert w['status'] in {
                  'PROVEN', 'IN_FLIGHT', 'BLOCKING_ALL',
                  'GREEN_IN_ISOLATION', 'ANCHOR_DEFINED',
              }, f\"wave {w['id']} bad status: {w['status']}\"
          ids = [w['id'] for w in d['waves']]
          assert ids == sorted(ids), f'wave ids not sorted: {ids}'
          print(f'OK schema={d[\"schema\"]} waves={len(d[\"waves\"])} ids={ids}')
          "

      - name: Validate verify-train.sh is syntactically valid
        run: bash -n verify-train.sh

      - name: Validate verify-train.sh is executable
        run: |
          if [ ! -x verify-train.sh ]; then
            echo "::error::verify-train.sh must be executable"
            exit 1
          fi
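
To run the same manifest check locally before pushing, the inline script could be lifted into a function that collects every problem instead of stopping at the first failed assertion. The sketch below is not part of the PR: the field names and status set are taken from the workflow step, while the function name `validate_manifest` and the `integration-train/1` schema string in the sample are illustrative assumptions.

```python
# Statuses accepted by the workflow's inline check.
ALLOWED_STATUSES = {
    "PROVEN", "IN_FLIGHT", "BLOCKING_ALL",
    "GREEN_IN_ISOLATION", "ANCHOR_DEFINED",
}


def validate_manifest(manifest: dict) -> list:
    """Return a list of problems; an empty list means the manifest passes."""
    problems = []

    schema = manifest.get("schema")
    if not isinstance(schema, str) or not schema.startswith("integration-train/"):
        problems.append(f"unexpected schema: {schema!r}")

    waves = manifest.get("waves")
    if not isinstance(waves, list) or not waves:
        problems.append("waves must be a non-empty list")
        return problems

    for wave in waves:
        if "id" not in wave:
            problems.append(f"wave missing id: {wave}")
            continue
        for field in ("name", "status"):
            if field not in wave:
                problems.append(f"wave {wave['id']} missing {field}")
        if "status" in wave and wave["status"] not in ALLOWED_STATUSES:
            problems.append(f"wave {wave['id']} bad status: {wave['status']}")

    ids = [wave["id"] for wave in waves if "id" in wave]
    if ids != sorted(ids):
        problems.append(f"wave ids not sorted: {ids}")
    return problems


# Hypothetical sample; the real manifest lives in meta/integration-train.json.
sample = {
    "schema": "integration-train/1",
    "waves": [
        {"id": 0, "name": "C surface", "status": "PROVEN"},
        {"id": 1, "name": "CFFI substrate", "status": "IN_FLIGHT"},
    ],
}
print(validate_manifest(sample) or "OK")
```

Collecting problems rather than asserting makes a failing manifest report all of its defects in one run, which is more convenient locally than in CI, where failing fast is the goal.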
37 changes: 35 additions & 2 deletions README.md
@@ -10,13 +10,15 @@ This catalog is the foundation for generating language bindings (Python, Java, R
- [Getting started](#getting-started)
- [Output format](#output-format)
- [Adding metadata](#adding-metadata)
- [The object model](#the-object-model)

## How it works

The pipeline runs in two steps:
The pipeline runs in three steps:

1. **Parser** — scans the MEOS `.h` header files using libclang and extracts every function signature, struct, and enum into structured JSON.
2. **Merger** — enriches the parser output with manual annotations from `meta/meos-meta.json`, such as documentation and memory ownership rules.
3. **Object model** — makes the *implicit* MEOS class hierarchy explicit: it derives the class lattice and assigns every function to the class it is a method of, from the canonical mapping in `meta/object-model.json`. See [The object model](#the-object-model).

## Getting started

@@ -51,14 +53,23 @@ python setup.py --branch v1.2.0
python run.py
```

The result is written to `output/meos-api.json`.
The result is written to `output/meos-idl.json`.

You can also point the tool at a different headers directory:

```bash
python run.py /path/to/custom/include
```

The object-model step also derives the per-function error contract by
scanning the MobilityDB C sources (`_mobilitydb/meos/src`, fetched by
`setup.py`). To audit the derived lattice against the most mature
hand-built model (PyMEOS):

```bash
python object_model_parity.py # -> output/meos-object-model-parity.json
```

## Output format

`meos-api.json` contains 3 top-level arrays: `functions`, `structs`, and `enums`.
@@ -80,6 +91,28 @@ A typical function entry looks like this:
}
```

In addition, `meos-idl.json` carries an `objectModel` block: the explicit
class lattice (`classes`, `lattice`), the reverse index assigning each
function to the class it is a method of (`functionToClass`), the
closed-algebra companion hierarchies (`companions`), the error contract
(`errors`), and the irregularity worklist (`corrections`).
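
As a rough illustration of how a binding generator might consume the reverse index, the sketch below groups functions by their class. It assumes `functionToClass` is a JSON object mapping each function name to a class name; the README does not show its exact shape, and the function and class names in the sample are placeholders rather than real catalog entries.

```python
import json
from collections import defaultdict


def methods_by_class(catalog: dict) -> dict:
    """Invert objectModel.functionToClass into {class name: sorted function names}."""
    function_to_class = catalog["objectModel"]["functionToClass"]
    grouped = defaultdict(list)
    for function_name, class_name in function_to_class.items():
        grouped[class_name].append(function_name)
    return {cls: sorted(names) for cls, names in grouped.items()}


# Placeholder catalog; a real run would load output/meos-idl.json instead.
sample = json.loads("""
{
  "objectModel": {
    "functionToClass": {
      "temporal_foo": "Temporal",
      "tint_foo": "TInt",
      "tint_bar": "TInt"
    }
  }
}
""")

for class_name, functions in methods_by_class(sample).items():
    print(class_name, functions)
```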

## Adding metadata

Manual annotations (ownership rules, additional documentation, deprecation flags, etc.) live in `meta/meos-meta.json`. The merger applies them on top of the libclang-parsed structure when generating the final catalog.

## The object model

MEOS is C — it has no classes. The object model is encoded by convention
in the `Temporal`/`TInstant`/`TSequence`/`TSequenceSet` struct family (the
template axis), the `temptype` discriminator whose base type is the
missing template parameter (the type-family axis), and the function-name
prefixes that bind a function to the class it is a method of
(`temporal_*` = the late-bound superclass; `tnumber_*`/`tspatial_*`/
`tpoint_*`/`tgeo_*` = abstract families; `tbool_*`/`tint_*`/… = exact
types). `meta/object-model.json` makes that lattice explicit so every
binding/engine derives the **same** classes and methods from one mapping.
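
The prefix convention described above can be sketched as a longest-prefix lookup. This is only an illustration: the canonical mapping lives in `meta/object-model.json`, and the class names (`TNumber`, `TSpatial`, and so on) and the table layout here are assumptions chosen to mirror the prefixes.

```python
# Assumed prefix table; the prefixes come from the README text,
# the class names and kinds are illustrative labels.
PREFIX_TO_CLASS = {
    "temporal_": ("Temporal", "superclass"),
    "tnumber_": ("TNumber", "abstract family"),
    "tspatial_": ("TSpatial", "abstract family"),
    "tpoint_": ("TPoint", "abstract family"),
    "tgeo_": ("TGeo", "abstract family"),
    "tbool_": ("TBool", "exact type"),
    "tint_": ("TInt", "exact type"),
}


def classify(function_name: str):
    """Return (class name, kind) for the longest matching prefix, or None."""
    matches = [p for p in PREFIX_TO_CLASS if function_name.startswith(p)]
    if not matches:
        return None
    return PREFIX_TO_CLASS[max(matches, key=len)]


# Placeholder function names, not real MEOS symbols.
for name in ("temporal_foo", "tnumber_foo", "tint_foo", "unrelated_foo"):
    print(name, "->", classify(name))
```

Taking the longest match keeps the lookup well defined if one prefix ever extends another. Functions that break the naming convention would fall through to `None`, which is presumably the kind of case the irregularity worklist exists to record.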

See [docs/object-model.md](docs/object-model.md) for the full
specification, the closed-algebra companion hierarchies, the error
contract, the parity audit, and the irregularity worklist.
84 changes: 84 additions & 0 deletions docs/integration-train.md
@@ -0,0 +1,84 @@
# Integration train — making ecosystem-wide 100% parity provable

## Why this exists

The MobilityDB ecosystem (MobilityDB · MEOS-API · PyMEOS-CFFI · PyMEOS ·
MobilityDuck · MobilitySpark · MobilityAPI · JMEOS · GoMEOS · MEOS.NET ·
MEOS.js · MobilityDB-BerlinMOD · the stream-side platforms MobilityFlink
· MobilityKafka · MobilityNebula) carries a fan of individually-correct
**open** PRs. Each is green *in isolation*, but:

- every PR's CI builds against a `master` that lacks the *other* PRs'
content (PyMEOS CI builds MEOS from MobilityDB master, which lacks the
extended-type C surface, and PyMEOS code is in turn broken against MEOS
master: the rename skew);
- the maintainer is the only merge gate — no automated merge;
- per-PR independence means **nobody assembles and verifies the
integrated whole**.

So "100% parity" is true per-branch yet **unprovable as a system**. This
train operationalizes MobilityDB discussion **#895 (wave-based merge
plan)**: a dependency-ordered manifest + a one-command verifier so parity
is demonstrated *at one point*, and the maintainer gets an ordered,
de-risked merge sequence instead of N cross-dependent PRs reading red.
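
To make "dependency-ordered" concrete, here is a minimal sketch that derives a merge order from wave dependencies with the standard-library `graphlib`. The dependency edges are one illustrative reading of the wave table in this document, and the dict layout is an assumption; it is not the actual structure of `meta/integration-train.json`.

```python
from graphlib import TopologicalSorter


def merge_order(dependencies: dict) -> list:
    """Order wave ids so that every wave comes after the waves it depends on.

    `dependencies` maps a wave id to the set of wave ids that must land first.
    TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Illustrative edges only (assumed, not read from the manifest).
deps = {
    0: set(),
    1: {0},   # CFFI substrate is regenerated against Wave 0
    2: {1},   # PyMEOS bump
    3: {2},   # PyMEOS features are gated on Wave 2
    4: {0},   # downstream bindings consume the Wave-0 C surface
    5: {4},   # service/data-lake/stream layers build on Wave 4
}
print(merge_order(deps))
```

Several valid orders can exist for the same graph (Wave 4 only needs Wave 0 here), so a consumer should rely on the ordering constraint rather than on one specific sequence.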

## Artifacts

- [`meta/integration-train.json`](../meta/integration-train.json) — the
PR dependency DAG, per-wave compose recipe, gates, owners, merge order.
- [`verify-train.sh`](../verify-train.sh) — composes the train and runs
each wave's gate. Honesty contract: a wave reports `PASS` only when its
gate has just been run and passed in this run; otherwise it reports
`BLOCKED:<reason>`, naming the exact gate it needs. Nothing is faked or
silently skipped.
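
The honesty contract can be sketched as a small decision function. `verify-train.sh` itself is a shell script whose internals are not shown here, so the Python below is only an illustration of the reporting rule; the function name, parameters, and reason strings are hypothetical.

```python
import subprocess
import sys


def run_gate(command, prerequisite_met=True, needed_gate=""):
    """Report PASS only for a gate that was actually run and exited 0."""
    if not prerequisite_met:
        # The gate cannot run yet: name what is needed instead of skipping silently.
        return f"BLOCKED:{needed_gate}"
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return "PASS"
    return f"BLOCKED:gate exited with {result.returncode}"


# Stand-in gate commands that just exit with a chosen status.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(3)"]

print(run_gate(passing))
print(run_gate(failing))
print(run_gate(passing, prerequisite_met=False, needed_gate="Wave 2 bump"))
```

The point of the design is that there is no third outcome: a gate that did not run can never be reported as passing.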

## The waves

| Wave | Content | Status |
|---|---|---|
| **0** | MobilityDB extended-type C surface (stack #1081→#1085, then #1051→#951) | **PROVEN** — 2699 fns, MEOS-API PR #10 21/21, `from_mfjson`/ctors uniform |
| **1** | PyMEOS-CFFI MEOS-1.4 substrate (regenerate vs Wave-0) | IN_FLIGHT (PyMEOS-CFFI #18, #19) |
| **2** | **CRITICAL PATH** — PyMEOS MEOS-1.4 bump (#81/#82): kills the rename skew | BLOCKING_ALL |
| **3** | PyMEOS features: #85, #87, #88, #89→#90→#91, #84 (+ MobilityDuck #146/#147 for #84 interop) | GREEN_IN_ISOLATION, gated on Wave 2 |
| **4** | Downstream bindings (MobilityDuck 47 / MobilitySpark 10 / MobilityAPI 6 / JMEOS 6 / GoMEOS 4 / MEOS.NET 3 open PRs) | GREEN_IN_ISOLATION; JMEOS is the lone repo under structural-migration pressure (5/6 CONFL) |
| **5** | Service-agent + data-lake + stream layers (built on Wave-4 anchor) | ANCHOR_DEFINED (MEOS-API #4-7, #12-13; PyMEOS #84 + MobilityDuck #146/#147/#158; stream-layer planned-band) |

**Wave 2 is the single universal unblock.** Every PyMEOS parity claim is
downstream of it; nothing else accelerates 100% parity more. Build the
bump against the composed Wave 0 (not bare master) so it is done once,
correctly, against the final C surface.

**Waves 4 and 5** consume Wave 0's MEOS-1.4 C surface via the MEOS-API
`meos-idl.json` catalog. Each Wave-4 binding is bump-independent within
its own repo; the cross-binding gate is that the regenerated
`meos-idl.json` is byte-identical across them (proves single SoT).
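
A byte-identity gate of this kind could be checked by hashing each binding's regenerated catalog and comparing digests. The sketch below assumes each binding checkout exposes its own copy of `meos-idl.json` at some path; the helper names and the paths passed in are hypothetical.

```python
import hashlib
from pathlib import Path


def catalog_digest(path) -> str:
    """SHA-256 of the file's raw bytes, so whitespace and key order count too."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def catalogs_identical(paths) -> bool:
    """True only if at least one path is given and all files hash the same."""
    digests = {catalog_digest(p) for p in paths}
    return len(digests) == 1
```

Hashing raw bytes rather than comparing parsed JSON is deliberate: two catalogs that parse to the same data but differ in formatting would still fail, which matches the "byte-identical" wording of the gate.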

## Branch base

This branch is **stacked on `feat/object-model` (MEOS-API PR #10)** — the
Wave-0 gate asserts the object-model classification (`from_mfjson` →
TCbuffer/TNpoint/TPose, concrete `*inst_make` constructors), which is
PR #10's pipeline. PR #10 (object model) + PR #8 (portable-aliases SoT)
are the catalog anchor of the train; see
`meta/integration-train.json#/catalog_anchor`.

## Running it

```bash
python3 setup.py # one-time: fetch the MobilityDB sources
./verify-train.sh # Wave 0 fully verified here; Waves 1-3 report
# honest BLOCKED + the exact gate each needs
PYMEOS_ENV=/path/to/bump-ready-pymeos-clone ./verify-train.sh # post-Wave-2
```

## Current status

Wave 0 is **proven**. Waves 1–3 are entirely gated on Wave 2 (the
MEOS-1.4 bump). Wave 4 (downstream bindings) is green-in-isolation
across MobilityDuck / MobilitySpark / MobilityAPI / GoMEOS / MEOS.NET;
JMEOS (5/6 CONFL) is under structural-migration pressure after the
multi-module restructure. Wave 5 (service-agent + data-lake + stream
layers) is anchor-defined and downstream of Wave 4. There is **no
remaining correctness work** — every deliverable is verified correct
in isolation; the gap is purely integration ordering plus the
maintainer-only merge gate, which this train reduces to: *merge in
wave order; CI turns green at each wave.*