chore: remove stale upstream snapshot + port spark_dialect from upstream #150 by AdaWorldAPI · Pull Request #188 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-04-17T08:06:11Z

Summary

Upstream sync: remove stale snapshot artifact + cherry-pick the one upstream feature we were missing.

Stale artifact removal (182 files, 3 MB)

AdaWorldAPI-lance-graph-d9df43b/ was a committed copy of an older upstream version sitting inside our repo. Created duplicate navigation paths on GitHub for datafusion_planner/, spo/, blasgraph/, neighborhood/, arigraph/ — every directory appeared twice.

Full audit before deletion:

ZERO files exist only in the artifact (every file has a counterpart in our crates/ tree)
Every differing file: ours ≥ artifact in LOC (our source is strictly ahead on all 24 diffs)
All upstream features (feat: BF16ThinkingEngine — distance table at source weight precision #125 parameter_substitution, Reranker 7-lane + ONNX correction tables + Cronbach quorum #140 lance_vector_search, RISC Thought Engine: softmax fix + highheelbgz encoding proof #142 complex RETURN, feat: Phase 3 BF16 wiring + φ-spiral reconstruction theory #128 duplicate columns fix) are already in our src

Cherry-pick: `spark_dialect.rs` from upstream PR #150

The one file upstream (lance-format/lance-graph v0.5.4) ships that we didn't:

File	LOC	What
`crates/lance-graph/src/spark_dialect.rs`	107	Spark SQL dialect for DataFusion unparser: backtick quoting, `STRING` type, `EXTRACT` dates, `BIGINT`/`INT`, `LENGTH()`, derived table aliases
`crates/lance-graph/tests/test_to_spark_sql.rs`	293	Full test suite
`lib.rs`	+1	`pub mod spark_dialect;`

Upstream audit (for the record)

Aspect	Upstream (lance-format)	Our fork (AdaWorldAPI)
Version	0.5.4	0.5.3
arrow	56.2	57 (ahead)
datafusion	50.3	51 (ahead)
.rs files in lance-graph/src	48	98 (superset)
`simple_executor`	deleted in #147	never had it (our DataFusion path subsumes)

Test plan

All 98 source files intact after artifact removal
spark_dialect.rs syntactically correct (datafusion_sql imports resolve at crate level)
pub mod spark_dialect; wired in lib.rs
cargo test -p lance-graph (blocked by pre-existing protoc issue on this VM — not introduced by this PR)

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

…eam #150 ## Stale artifact removal (182 files, 3 MB) `AdaWorldAPI-lance-graph-d9df43b/` was a committed snapshot of an older upstream version (48 .rs files vs our 98). Full audit confirmed: - ZERO files exist only in the artifact (every file has a counterpart) - Every differing file: ours >= artifact in LOC (ours is strictly ahead) - All upstream features (#125 parameter_substitution, #140 lance_vector_search) are already in our src tree The directory created GitHub path confusion — duplicate navigation paths for datafusion_planner, spo, blasgraph, neighborhood, arigraph. Removing it eliminates that confusion with zero content loss. ## Cherry-pick: spark_dialect.rs from upstream PR #150 The ONE file upstream has that we didn't: - `crates/lance-graph/src/spark_dialect.rs` (107 LOC) Spark SQL dialect for DataFusion unparser: backtick quoting, STRING type casting, EXTRACT for dates, BIGINT/INT types, LENGTH(), derived table aliases. - `crates/lance-graph/tests/test_to_spark_sql.rs` (293 LOC) Full test suite for Spark SQL output. - `pub mod spark_dialect;` added to lib.rs Adapted from upstream's DF 50.3 to our DF 51 — same API surface, no changes needed. ## Upstream audit result (for the record) Upstream (lance-format/lance-graph) is at v0.5.4. Our fork is at v0.5.3 with newer deps (arrow 57 vs 56.2, datafusion 51 vs 50.3). Other than spark_dialect, every upstream feature and fix is already present in our source tree — parameter_substitution (#125), lance_vector_search (#140), complex RETURN clauses (#142), duplicate columns fix (#128) are all in `crates/lance-graph/src/`. Their deleted `simple_executor` was a prototype cold-path executor we never had. Our `ExecutionStrategy::DataFusion` path (6K LOC planner) + `ExecutionStrategy::BlasGraph` (semiring algebra) subsume it. The user has flagged adding a deliberate `ExecutionStrategy::Simple` cold path as a 4th strategy for trivial queries — that's a separate PR per the documented matrix of execution strategies. https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 11fe3e21e1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-17T08:08:54Z

+        .with_config(config);
+
+    let sql = query
+        .to_sql(datasets, Some(SqlDialect::Spark))


Remove unsupported dialect argument from to_sql tests

These new integration tests call CypherQuery::to_sql with a second SqlDialect argument, but the current API in crates/lance-graph/src/query.rs only accepts datasets and the crate does not export a SqlDialect type. That makes this test target fail at compile time (before any assertions run), so cargo test cannot complete after this commit unless the API changes are added in the same patch.

Useful? React with 👍 / 👎.

Codifies 7 anti-patterns (AP1-AP7) learned from PRs #176-#188 into an agent card that fires flags when the session repeats them: AP1: "225/225 feels like success" without gate 2 (#178) AP2: Projecting quality from docs instead of measuring (#177) AP3: Building new codec before benching existing ones (#184) AP4: Centroid-residual framing on near-orthogonal data (#177/#183) AP5: Python in the inference hot path AP6: Chained score multiplication without chain-collapse check (P5) AP7: Modifying ndarray without explicit permission (#176) Invoked by adk-coordinator when pattern repetition is suspected, or by human directly. Output: list of fired flags, max 7 lines. Also audited all 29 agent cards across both repos: - All pin model: opus or model: sonnet (no hardcoded versions) - opus → Opus 4.7 automatically, sonnet → Sonnet 4.6 - 3 ndarray agents on sonnet (l3-strategist, migration-tracker, product-engineer) — intentional for speed-over-depth roles - adk-coordinator missing Bash tool (by design — delegates) - sentinel-qa missing Edit/Write (by design — audit-only) No agent changes needed for Opus 4.7 compatibility — model: opus resolves correctly. https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

AdaWorldAPI merged commit 56e56a7 into main Apr 17, 2026

chatgpt-codex-connector Bot reviewed Apr 17, 2026

View reviewed changes

AdaWorldAPI mentioned this pull request Apr 17, 2026

feat: R&D codec bench framework — upstream sync, probes P5/P7, InferenceBackend, measurement model #189

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: remove stale upstream snapshot + port spark_dialect from upstream #150#188

chore: remove stale upstream snapshot + port spark_dialect from upstream #150#188
AdaWorldAPI merged 1 commit into
mainfrom
claude/codec-rnd-bench

AdaWorldAPI commented Apr 17, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AdaWorldAPI commented Apr 17, 2026

Summary

Stale artifact removal (182 files, 3 MB)

Cherry-pick: spark_dialect.rs from upstream PR #150

Upstream audit (for the record)

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Cherry-pick: `spark_dialect.rs` from upstream PR #150