Pluggable graph interface for Hoptimator topology + Mermaid renderer#222
Draft
jogrogan wants to merge 25 commits into
Conversation
Refuses a DROP TABLE while an active Pipeline still references the resource (as either source or sink), so dropping the underlying Kafka topic / Venice store / MySQL table can't silently orphan a downstream pipeline.
Validator framework, made Connection-aware:
- Validated.validate(Issues, Connection) (was: validate(Issues))
- ValidatorProvider.validators(T, Connection) (was: validators(T))
- ValidationService.validate(T, Issues, Connection)
- ValidationService.validateOrThrow(T, Connection)
- ValidationService.validateOrThrow(Collection<T>, Connection)
- ValidationService.validators(T, Connection)
PendingDelete<T> wrapper (hoptimator-api):
- Explicit "this is being deleted" signal so unrelated callers of validateOrThrow(source, connection) don't accidentally trigger pre-delete checks.
- Carries an optional selfOwnerUid so cascade-deleted children can be excluded from the dependent set.
K8s indexed lookup:
- PipelineDependencyLabels stamps `depends-on-<slug>` labels on every Pipeline CRD at create time, naming each source/sink. The slug is a 16-char SHA-256 prefix of `<database>_<dot-joined-path>`; an annotation lists the full identifiers so a slug collision can be detected at check time.
- PipelineDependencyChecker uses a server-indexed label-selector list + annotation cross-check + selfOwnerUid filter.
- K8sPipelineDeployer threads sources/sink through and calls PipelineDependencyLabels.labelsFor / annotationFor at toK8sObject(). K8sPipelineBundle and K8sMaterializedViewDeployer pass the data through.
Dispatch:
- K8sValidatorProvider returns a K8sPipelineDependencyValidator for PendingDelete<Source>; registered via META-INF/services/com.linkedin.hoptimator.ValidatorProvider.
- K8sPipelineDependencyValidator wraps PipelineDependencyChecker as a Validator.
DROP TABLE wiring:
- HoptimatorDdlExecutor calls ValidationService.validateOrThrow(new PendingDelete<>(source), connection) before DeploymentService.delete in the table branch.
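The slug scheme above — a 16-char SHA-256 hex prefix of `<database>_<dot-joined-path>` — can be sketched as follows. Class and method names here are illustrative stand-ins, not the actual PipelineDependencyLabels API:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.List;

public final class DependencySlug {

    // Illustrative sketch: label-safe slug = first 16 hex chars of
    // SHA-256("<database>_<dot-joined-path>"). Deterministic per identifier,
    // fixed-length, valid as a K8s label-name suffix.
    public static String slugFor(String database, List<String> path) throws Exception {
        String key = database + "_" + String.join(".", path);
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(key.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest).substring(0, 16);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical identifiers — prefix collisions are possible at 16 chars,
        // which is why the annotation carries the full identifiers as a cross-check.
        System.out.println("depends-on-" + slugFor("kafka", List.of("KAFKA", "topic1")));
    }
}
```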
HoptimatorDdlUtils.removeTableFromSchema() is the symmetric inverse of registerTemporaryTableInSchema() for cleanup.
Implementor side-effects (no behavior change):
- KafkaDeployer / VeniceDeployer / MySqlDeployer no longer need a declarative DependencyGuarded marker — the guard fires from the validator framework before delete() is reached.
- All existing Validated implementors (DefaultValidator, CompatibilityValidatorBase, AvroTableValidator, K8sViewTable) and ValidatorProvider implementors (DefaultValidatorProvider, CompatibilityValidatorProvider, AvroValidatorProvider) updated to the new signatures.
Tests: PipelineDependencyLabelsTest, PipelineDependencyCheckerTest, K8sPipelineDeployerTest assertions for stamping, validator-framework test updates throughout.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LogicalTableDeployer.delete() previously threw SQLFeatureNotSupported. Now implemented end-to-end as a per-tier sequence that mirrors what running DROP TABLE on each tier independently would do, plus the LogicalTable CRD removal at the top.
Flow:
1. Per-tier pre-flight via the validator framework: ValidationService.validateOrThrow(new PendingDelete<>(tierSource, logicalTableUid), connection) — refuses the drop if any active external pipeline still references a tier resource. The selfOwnerUid is the LogicalTable CRD's UID so the implicit inter-tier pipelines (owned by the CRD, cascade-deleted with it) don't self-block.
2. Delete the LogicalTable CRD. K8s owner-ref cascade removes its owned Pipeline and TableTrigger CRDs.
3. Best-effort physical cleanup of each tier resource (Kafka topic, Venice store, ...). A failed tier delete logs a warning but does not abort — a stranded tier is recoverable; aborting mid-DROP isn't.
4. Per-tier schema cleanup: deregister the TemporaryTable entry in each tier schema only when its physical delete succeeded.
Tests:
- LogicalTableDeployerTest deleteRemovesCrdAndCleansUpTierResources, deletePropagatesCrdDeletionFailure, deleteSwallowsTierCleanupFailures.
- logical-ddl.id integration test: DROP TABLE LOGICAL.testevent now succeeds and cascades the implicit nearline-to-online pipeline.
- logical-offline-ddl.id companion update.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
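The best-effort cleanup in step 3 feeding step 4 follows a standard pattern: attempt each tier delete independently, warn and continue on failure, and record which tiers succeeded so only those get schema cleanup. A minimal sketch under that assumption — the types and names here are hypothetical stand-ins, not the LogicalTableDeployer code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public final class BestEffortCleanup {

    // Hypothetical stand-in for one tier's physical resource and its delete action.
    public record Tier(String name, Runnable physicalDelete) {}

    // Deletes every tier, swallowing per-tier failures; returns the tiers whose
    // physical delete succeeded, so only those are deregistered from the schema.
    public static List<Tier> deleteTiers(List<Tier> tiers, Consumer<String> warn) {
        List<Tier> succeeded = new ArrayList<>();
        for (Tier tier : tiers) {
            try {
                tier.physicalDelete().run();
                succeeded.add(tier);
            } catch (RuntimeException e) {
                // A stranded tier is recoverable; aborting mid-DROP isn't.
                warn.accept("failed to delete tier " + tier.name() + ": " + e.getMessage());
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        List<Tier> tiers = List.of(
                new Tier("kafka", () -> {}),
                new Tier("venice", () -> { throw new RuntimeException("store busy"); }));
        // Only the kafka tier survives into the schema-cleanup step.
        System.out.println(deleteTiers(tiers, System.err::println).size());
    }
}
```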
kafka-ddl-create-table.id: cross-driver dependency-guard scenarios exercising the new pre-delete check end-to-end through the kafka driver — drop-table-while-pipeline-depends-on-it (source side and partial-view sink side).
The bulk of the file count is mechanical noise reduction across existing test files: dropped unused imports, tightened generics on @SuppressWarnings, etc. — fallout from the warning_cleanup pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pipelines previously stamped a single `depends-on` annotation listing every source and sink undifferentiated. The dep-guard collision check worked on this, but it loses the source/sink direction information needed for visualization.
Replace the unified annotation with two directional annotations:
  hoptimator.linkedin.com/depends-on-sources: <s1>,<s2>,...
  hoptimator.linkedin.com/depends-on-sink: <single sink>
The dep-guard's annotationConfirms now reads sources or sink as the collision-guard set — same correctness guarantee, no semantic change for the dep-guard. The split unlocks directional rendering for the upcoming pipeline visualizer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
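For illustration, a Pipeline CRD's metadata would then carry both the indexed labels and the directional annotations roughly like this. The slug values and resource identifiers below are made up, and the exact label-value format is an assumption:

```yaml
metadata:
  labels:
    # one depends-on-<slug> label per source/sink; server-indexable via label selector
    depends-on-a1b2c3d4e5f60718: kafka_KAFKA.topic1
    depends-on-0f1e2d3c4b5a6978: kafka_KAFKA.topic2
  annotations:
    # full identifiers, split by direction; also the slug-collision cross-check
    hoptimator.linkedin.com/depends-on-sources: kafka_KAFKA.topic1
    hoptimator.linkedin.com/depends-on-sink: kafka_KAFKA.topic2
```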
Two related changes that bring triggers into the depends-on label
index so the pre-delete dep-guard can find them.
Trigger API:
- Replace the old `(String database, List<String> path)` pair with
a typed `Source source` field (existing hoptimator-api type).
Drop the convenience getters database() / path() / table() /
schema() — callers go through trigger.source().X() for symmetry
with the new Sink.
- Add an optional `Sink sink` field for bridging-tier triggers
(LogicalTableDeployer's offline → online reverse-ETL flow), so
the deployer can stamp a depends-on-sink annotation in addition
to the source side.
- Source is nullable for DROP / PAUSE / RESUME paths that only
need the trigger name.
K8sTriggerDeployer stamping:
- On both the toK8sObject and the partial-update paths, stamp:
depends-on-<sourceSlug>: <sourceIdentifier>
annotation depends-on-sources: <sourceIdentifier>
and when sink is set:
depends-on-<sinkSlug>: <sinkIdentifier>
annotation depends-on-sink: <sinkIdentifier>
LogicalTableDeployer wiring:
- Pass `offlineSource` directly as the trigger's Source and a Sink
derived from `onlineSource` (when present) as the trigger's Sink.
HoptimatorDdlExecutor:
- Resolve the target table's database name the same way DROP TABLE
resolves it (HoptimatorJdbcTable / TemporaryTable unwrap), so
user-created `CREATE TRIGGER ... ON <schema>.<table>` triggers
participate in the dep-guard. Without this, Trigger.source was
null and K8sTriggerDeployer skipped label stamping — a trigger
could outlive its source silently.
Tests:
- K8sTriggerDeployerTest gains updateStampsSinkLabelWhenTriggerCarriesASink
pinning that a Trigger carrying a Sink stamps both source-side and
sink-side depends-on labels on the partial-update path.
- TriggerTest covers the new accessor shape (source(), sink(), null
source for the bare-name case).
- LogicalTableDeployerTest asserts trigger.source() / trigger.sink()
instead of the removed convenience getters.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Triggers carry the same depends-on-<slug> labels Pipelines do (stamped by K8sTriggerDeployer in the previous commit), but the dep-guard's PipelineDependencyChecker only consulted the Pipeline CRD list. That left a hole: a user could DROP TABLE on a source still referenced by a live trigger.
PipelineDependencyChecker now runs the same label-selector + annotation-confirmation logic against TableTrigger CRDs as well. The inner loop is genericized over KubernetesObject; each blocker is tagged with its CRD kind (pipeline/foo, trigger/bar) so the error message points the user at what to unhook. Self-owner exclusion still applies — LogicalTable-owned triggers don't block their parent's cascade-delete.
Coverage:
- PipelineDependencyCheckerTest gets paired blocksOnExternalTrigger, skipsSelfOwnedTrigger, and errorMessageListsAllBlockersAcrossKinds cases that prove triggers participate alongside pipelines.
- New k8s-trigger-validation.id integration scenario: CREATE TABLE → CREATE TRIGGER → DROP TABLE blocked → DROP TRIGGER → DROP TABLE succeeds. Mirrors the MV pattern the existing k8s-validation.id scenarios use.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
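The genericized inner loop plus self-owner exclusion reduces to a small shape. A sketch with a stand-in record in place of the real io.kubernetes.client KubernetesObject metadata (types and kind tags here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public final class BlockerCheck {

    // Minimal stand-in for a label-matched CRD: its kind tag, name, and owner UID.
    public record Crd(String kind, String name, String ownerUid) {}

    // Generic inner loop: any CRD kind wearing the depends-on labels can block,
    // and each blocker is tagged "<kind>/<name>" for the error message. CRDs
    // owned by selfOwnerUid are excluded — they cascade-delete with the parent.
    public static List<String> blockers(List<Crd> matches, String selfOwnerUid) {
        List<String> out = new ArrayList<>();
        for (Crd crd : matches) {
            if (selfOwnerUid != null && selfOwnerUid.equals(crd.ownerUid())) {
                continue;  // self-owned child: doesn't block its parent's delete
            }
            out.add(crd.kind() + "/" + crd.name());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Crd> matches = List.of(
                new Crd("pipeline", "foo", "uid-1"),
                new Crd("trigger", "bar", "uid-2"));
        // The trigger is owned by uid-2 (the thing being deleted), so only
        // the external pipeline blocks.
        System.out.println(blockers(matches, "uid-2"));
    }
}
```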
PipelineDependency{Labels,Checker} and K8sPipelineDependencyValidator
were specific to Pipeline CRDs in name only — TableTrigger CRDs now
wear the same depends-on labels and annotations. Renamed to
DependencyLabels, DependencyChecker, and K8sDependencyValidator.
Collapsed the labels API: DependencyLabels now exposes a single
stamp(V1ObjectMeta, Collection<Source>, Sink) that writes both the
depends-on labels and the directional annotations. K8sTriggerDeployer
had reimplemented this inline in two places; both call sites now
collapse to one stamp() call.
Also dropped the 5-arg Trigger constructor; callers must pass a sink
(or null) explicitly.
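The collapsed stamp() entry point can be sketched against simple maps in place of the real V1ObjectMeta, with sources/sink reduced to identifier strings and a placeholder slug function. None of this is the actual DependencyLabels code — it is a shape sketch of the single-call API described above:

```java
import java.util.Collection;
import java.util.Map;

public final class StampSketch {

    // Stand-in for V1ObjectMeta: just its label and annotation maps.
    public record Meta(Map<String, String> labels, Map<String, String> annotations) {}

    // One call writes everything: a depends-on-<slug> label per source/sink
    // plus the two directional annotations. Both K8sTriggerDeployer call sites
    // collapse to this.
    public static void stamp(Meta meta, Collection<String> sources, String sink) {
        for (String source : sources) {
            meta.labels().put("depends-on-" + slugOf(source), source);
        }
        meta.annotations().put("hoptimator.linkedin.com/depends-on-sources",
                String.join(",", sources));
        if (sink != null) {
            meta.labels().put("depends-on-" + slugOf(sink), sink);
            meta.annotations().put("hoptimator.linkedin.com/depends-on-sink", sink);
        }
    }

    private static String slugOf(String identifier) {
        // Placeholder for the real 16-char SHA-256 prefix slug.
        return Integer.toHexString(identifier.hashCode());
    }
}
```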
Two pieces:
- hoptimator-api: pure-data types describing a pipeline visualization graph. PipelineGraph (root + nodes + edges), GraphNode hierarchy (External, Pipeline, View, LogicalTable, Trigger), GraphEdge with directional types (DEPENDS_ON_SOURCE, DEPENDS_ON_SINK, OWNER_OF, TRIGGERS), GraphTarget for lookup keys.
- hoptimator-k8s: PipelineGraphBuilder, the K8s-state-backed materializer. Three entry points — forView, forLogicalTable, forResource — each producing a PipelineGraph the renderer consumes downstream. forView/forLogicalTable scan owner-references to find realizing pipelines; forResource does a label-selector reverse lookup via depends-on-<slug> labels.
No renderers or CLI wiring yet — those come in following commits.
- MermaidRenderer (in hoptimator-util) — flowchart serializer with shape-per-kind (cylinder for externals, parallelogram for pipelines, rectangle for views, subgraph wrappers for owned children).
- sqlline `!graph <view|logical|table> <name> [--depth N]` command in hoptimator-cli. Default depth 2, cap 5.
- Quidem `!graph` hook (initially in K8sQuidemTestBase; promoted to QuidemTestBase in a subsequent commit) so .id integration scripts can exercise the CLI surface against a real cluster.
No SPI yet — the renderer/builder are wired directly. The module extraction comes in a later commit.
The visualization layer was hard-wired to MermaidRenderer in hoptimator-util and PipelineGraphBuilder in hoptimator-k8s. Split both behind ServiceLoader-discovered SPIs so non-K8s backends and alternative renderers (DOT, JSON, etc.) can plug in without touching the CLI.
New SPIs in hoptimator-api:
- GraphProvider — forTarget(GraphTarget, depth, Connection).
- GraphRenderer — render(PipelineGraph), format().
- GraphTarget — abstract base + nested final subclasses (View, LogicalTable, Resource), one per CLI mode.
New module hoptimator-graph: holds MermaidRenderer (moved from hoptimator-util) registered via META-INF/services. K8sGraphProvider in hoptimator-k8s wraps the existing builder behind the SPI, also registered via META-INF/services. GraphService dispatcher moves into hoptimator-util alongside the other *Service classes (DeploymentService, ValidationService).
The Quidem `!graph` command lifts from K8sQuidemTestBase into QuidemTestBase itself — it only goes through GraphService + the SPI, so nothing about it is K8s-specific. K8sQuidemTestBase had no other content and is removed.
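The SPI shapes and the ServiceLoader dispatch can be sketched as below. The interface signatures follow the ones named above; the nested record stand-ins for PipelineGraph/GraphTarget and the dispatch helper are simplifications, not the real com.linkedin.hoptimator.graph types:

```java
import java.sql.Connection;
import java.util.ServiceLoader;

public final class GraphSpiSketch {

    // Simplified stand-ins for the real data-model types.
    public record PipelineGraph(String root) {}
    public record GraphTarget(String name) {}

    // Backend SPI: materializes a graph from some deployment substrate.
    public interface GraphProvider {
        PipelineGraph forTarget(GraphTarget target, int depth, Connection connection);
    }

    // Renderer SPI: serializes a graph into one named format.
    public interface GraphRenderer {
        String render(PipelineGraph graph);
        String format();  // e.g. "mermaid"
    }

    // GraphService-style dispatch: first renderer registered (via
    // META-INF/services) for the requested format wins.
    public static GraphRenderer rendererFor(String format) {
        for (GraphRenderer renderer : ServiceLoader.load(GraphRenderer.class)) {
            if (renderer.format().equals(format)) {
                return renderer;
            }
        }
        throw new IllegalArgumentException("no renderer for format: " + format);
    }
}
```

A new renderer (DOT, JSON, ...) only needs to implement GraphRenderer and add a one-line META-INF/services registration; the CLI never changes.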
Two trigger-related additions:
- Cron humanization in MermaidRenderer. Trigger labels were rendering raw cron strings (e.g. "0 */6 * * *") which people consistently mis-read. CronHumanizer delegates to cron-utils' CronDescriptor with the same CronDefinition the trigger reconciler uses, so whatever the operator can fire, the visualizer can describe. Unparseable inputs fall through to the raw cron string.
- Pre-delete dep-guard extension to TableTrigger CRDs. PipelineDependencyChecker now consults both Pipeline and Trigger CRDs via the same label/annotation scheme, so a DROP TABLE on a source referenced only by a live trigger is blocked the same way pipeline references block it. HoptimatorDdlExecutor's CREATE TRIGGER path resolves the target's database from the resolved table (same convention as DROP TABLE) so user-created triggers participate in the dep-guard.
Two cleanups to the rendered output:
- Subgraph wrappers for View owners drop the resource name from the
title — the inner pipeline node already shows the name, so the
wrapper carries only the kind ("Materialized View" / "View").
- displayName() returns bare names across GraphNode kinds. The
Mermaid shape conveys the kind (rhombus for triggers, rectangle
for views, etc.), so the "Trigger foo" / "Materialized View foo"
prefixes were redundant. LogicalTable subgraph wrappers carry
"LogicalTable <name>" since no inner node carries the name.
Also: scaffold `!graph` integration scenarios in k8s-graph.id with
TODO stubs for cases worth pinning end-to-end (filled in by the
integration-tests commit).
Two correctness/UX improvements to the visualizer that landed
iteratively:
- Direction-aware traversal in PipelineGraphBuilder. The old
recursion walked both directions from every external it visited,
so `!graph view` would pull in downstream consumers of its sink.
New Direction enum (UPSTREAM / DOWNSTREAM) splits the walk: from
a resource, UPSTREAM finds producer pipelines (where the resource
is the sink); DOWNSTREAM finds consumers (where it's a source).
Recursion preserves direction so siblings of intermediates don't
leak in via the opposite side.
!graph view / !graph logical are scoped to the entity's
neighborhood (depth=0 beyond the owned pipeline) — they answer
"what does this view do", not "what's its data lineage". The
chain view belongs to !graph table, which walks bidirectionally
from the root with depth-controlled recursion.
- SQL-side identifier resolution in K8sGraphProvider. Pipelines
stamp depends-on labels using the K8s-side form (Database CRD
name + qualified path including schema), but the CLI accepted
SQL-side input. K8sGraphProvider.resolveResource() now bridges:
* Exact CRD-name match passes through.
* Schema-name match (`ADS.AD_CLICKS`) substitutes the CRD name
and prepends the schema.
* Catalog-name match (`MYSQL.testdb.orders`, 3-level) substitutes
the CRD name; canonical path is `[catalog, schema, ...rest]`.
* Schema-skipped catalog input (`MYSQL.orders`) inserts the
schema. Catalog-skipped schema input (`testdb.orders`) prepends
the catalog.
Path tail preserved verbatim — Calcite-normalized MV source paths
are upper case but LogicalTable inter-tier pipelines are mixed.
Auto-canonicalization would help one and break the other; users
copy the canonical form from `!graph view` / `!graph logical`
output.
Trigger label shrink: drop redundant `kind: Job` and verbose `job:`
lines from the trigger rhombus — the rhombus shape already conveys
kind, and the job name is just trigger name + template + "-job".
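The direction-preserving traversal described above can be sketched over a minimal pipeline model. The record and method names are hypothetical, not the PipelineGraphBuilder code; the point is that recursing in the same direction keeps sibling branches of intermediates out of the result:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public final class DirectionWalk {

    public enum Direction { UPSTREAM, DOWNSTREAM }

    // Hypothetical stand-in for a pipeline: many sources, one sink.
    public record Pipeline(String name, List<String> sources, String sink) {}

    // From a resource, UPSTREAM follows pipelines producing it (resource is
    // the sink); DOWNSTREAM follows pipelines consuming it (resource is a
    // source). The recursion never switches direction.
    public static void walk(String resource, Direction dir, List<Pipeline> all,
                            int depth, Set<String> visited) {
        if (depth < 0 || !visited.add(resource)) {
            return;
        }
        for (Pipeline p : all) {
            if (dir == Direction.UPSTREAM && p.sink().equals(resource)) {
                p.sources().forEach(s -> walk(s, dir, all, depth - 1, visited));
            } else if (dir == Direction.DOWNSTREAM && p.sources().contains(resource)) {
                walk(p.sink(), dir, all, depth - 1, visited);
            }
        }
    }

    public static void main(String[] args) {
        List<Pipeline> all = List.of(
                new Pipeline("p1", List.of("a"), "b"),   // a -> b
                new Pipeline("p2", List.of("b"), "c"),   // b -> c
                new Pipeline("p3", List.of("b"), "d"));  // b -> d (sibling branch)
        Set<String> seen = new LinkedHashSet<>();
        walk("c", Direction.UPSTREAM, all, 5, seen);
        System.out.println(seen);  // upstream of c: never pulls in sibling d
    }
}
```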
End-to-end coverage across drivers, one TestSqlScripts test per .id file:
- hoptimator-k8s: k8s-graph.id (MV cases — single source, multi-source join, MV-on-MV inlining producing fan-out), k8s-trigger-graph.id (standalone triggers with cron + paused state), k8s-trigger-validation.id (dep-guard for triggers).
- hoptimator-logical: logical-graph.id (LogicalTable with nearline+online tiers; LT with offline tier + auto-spawned bridging trigger).
- hoptimator-mysql: mysql-graph.id (3-level catalog+schema+table identifiers via the JDBC-backed driver).
Pins scenarios that are easy to silently regress in a builder refactor: empty / unknown resource warning, multi-source join shape, trigger source/sink wiring, LogicalTable tier subgraphs + bridging trigger, !graph table reverse lookup with direction preservation, 3-level path resolution via catalog match.
Also drops PipelineGraph.incoming() / outgoing() — convenience accessors with no callers in the codebase.
The base `com.linkedin.hoptimator` package was carrying six visualization types alongside the unrelated core types (Deployer, Source, Sink, etc.). Pulling them into a `graph` subpackage gives them a clean namespace:
- com.linkedin.hoptimator.graph (api) — data model + SPIs
- com.linkedin.hoptimator.graph.mermaid (graph) — Mermaid renderer
The two modules contribute to different sub-packages of the same root, not the same exact package, so there's no split-package concern.
Moved: GraphNode, GraphEdge, PipelineGraph, GraphProvider, GraphRenderer, GraphTarget. Updated imports across consumer modules (cli, util, k8s, jdbc, graph) and renamed the two META-INF/services registration files to match the new fully-qualified names. No behavior change.
Collaborator
Can we see example input/output in the PR description? GH will render mermaid!
Summary
Adds a generic graph interface to Hoptimator — a pluggable model for representing the topology of pipelines, triggers, and logical tables — together with an initial Mermaid renderer that's wired into the sqlline !graph command. The interface is the point; Mermaid is one rendering of it.
Stacked on jogrogan/trigger-dep-guard — relies on the depends-on-sources / depends-on-sink annotations that branch lands.
The graph interface (hoptimator-api)
The data model and SPI live where the rest of Hoptimator's public surface lives, so a renderer or backend can depend on them without pulling K8s, util, or anything else:
The dispatcher (hoptimator-util)
GraphService in hoptimator-util is the runtime entry point — buildGraph(target, depth, conn), render(graph, format), availableFormats(). It sits alongside the other service classes (DeploymentService, ValidationService). It doesn't know about K8s or Mermaid; it just ServiceLoader-discovers providers and renderers and dispatches.
A future deployment substrate (non-K8s, RPC-backed, in-process) or a different rendering target (D3 JSON for an interactive web view, DOT for graphviz, etc.) plugs in without touching anything in this PR.
The K8s provider (hoptimator-k8s)
K8sGraphProvider is the only provider in this PR. It uses PipelineGraphBuilder to traverse the K8s state: starts at the target, uses the depends-on-<slug> labels to find Pipelines and TableTriggers in O(matches) on the server, parses the
directional source/sink annotations to draw edges, and recurses up to a configurable depth.
The Mermaid renderer (hoptimator-graph)
New module — exists solely to hold the Mermaid renderer (and any future renderers we don't want bloating hoptimator-util). MermaidRenderer in com.linkedin.hoptimator.graph.mermaid is registered as the default for format "mermaid". Produces a
flowchart with subgraphs for LogicalTable tiers, distinct node shapes per type, and dotted edges for trigger flows. Other renderers can register against the same SPI without changing this one.
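For illustration (GitHub renders mermaid fences natively), a hypothetical single-source materialized view with a cron trigger might serialize roughly like this — the node names are made up, and the exact shapes follow the per-kind mapping described above (cylinder for externals, parallelogram for pipelines, rectangle for views, rhombus for triggers, dotted edges for trigger flows):

```mermaid
flowchart LR
  src[("kafka.topic1")]
  subgraph mv ["Materialized View"]
    p[/"pipeline-my-view"/]
    v["MY_VIEW"]
  end
  t{"Every 6 hours"}
  src --> p --> v
  t -.-> p
```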
Test plan
Known limitations