
Pluggable graph interface for Hoptimator topology + Mermaid renderer #222

Draft

jogrogan wants to merge 25 commits into main from jogrogan/visualizer

Conversation

@jogrogan (Collaborator)

Summary

Adds a generic graph interface to Hoptimator — a pluggable model for representing the topology of pipelines, triggers, and logical tables — together with an initial Mermaid renderer that's wired into the sqlline !graph command. The interface is the point; Mermaid is one rendering of it.

Stacked on jogrogan/trigger-dep-guard; relies on the depends-on-sources / depends-on-sink annotations that branch lands.

The graph interface (hoptimator-api)

The data model and SPI live where the rest of Hoptimator's public surface lives, so a renderer or backend can depend on them without pulling K8s, util, or anything else:

  • PipelineGraph / GraphNode / GraphEdge — the data model. Nodes are typed (External, Pipeline, Trigger, LogicalTable, View); edges carry direction + kind (READS, WRITES, OWNS, DEPENDS_ON_SOURCE, DEPENDS_ON_SINK, TRIGGERS).
  • GraphTarget — the entry point. Three concrete kinds: View(namespace, name), LogicalTable(namespace, name), Resource(database, path).
  • GraphProvider — SPI that turns a GraphTarget + depth + JDBC connection into a PipelineGraph. Registered via META-INF/services. Mirrors the existing DeployerProvider shape.
  • GraphRenderer — SPI that turns a PipelineGraph into a string in some format. Also ServiceLoader-discovered.
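
A rough sketch of the two SPIs, assuming the shapes named above (the package comes from the later refactor commit; modifiers and the exception signature are assumptions):

```java
// com.linkedin.hoptimator.graph (hoptimator-api) -- sketch only, one file per type.
import java.sql.Connection;
import java.sql.SQLException;

public interface GraphProvider {
  // Builds a graph for a supported target. Returning null so the dispatcher can
  // try the next provider is an assumed contract, mirroring DeployerProvider.
  PipelineGraph forTarget(GraphTarget target, int depth, Connection connection) throws SQLException;
}

public interface GraphRenderer {
  String format();                     // e.g. "mermaid"
  String render(PipelineGraph graph);  // serialize the whole graph in that format
}
```

Each implementation registers by listing its class name in a META-INF/services file keyed by the interface's fully-qualified name, standard ServiceLoader convention.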

The dispatcher (hoptimator-util)

GraphService in hoptimator-util is the runtime entry point: buildGraph(target, depth, conn), render(graph, format), availableFormats(). It sits alongside the other service classes (DeploymentService, ValidationService) and knows nothing about K8s or Mermaid; it discovers providers and renderers via ServiceLoader and dispatches.

A future deployment substrate (non-K8s, RPC-backed, in-process) or a different rendering target (D3 JSON for an interactive web view, DOT for graphviz, etc.) plugs in without touching anything in this PR.
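
A minimal sketch of that dispatch pattern, assuming plain ServiceLoader discovery (method bodies and the exact error wording are assumptions; the public shape is from this PR):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public final class GraphService {
  private GraphService() {}

  public static PipelineGraph buildGraph(GraphTarget target, int depth, Connection conn)
      throws SQLException {
    for (GraphProvider provider : ServiceLoader.load(GraphProvider.class)) {
      PipelineGraph graph = provider.forTarget(target, depth, conn);
      if (graph != null) {
        return graph; // first provider that supports the target wins
      }
    }
    throw new SQLException("No GraphProvider supports target: " + target);
  }

  public static String render(PipelineGraph graph, String format) {
    for (GraphRenderer renderer : ServiceLoader.load(GraphRenderer.class)) {
      if (renderer.format().equals(format)) {
        return renderer.render(graph);
      }
    }
    throw new IllegalArgumentException("No renderer registered for format: " + format);
  }

  public static List<String> availableFormats() {
    List<String> formats = new ArrayList<>();
    ServiceLoader.load(GraphRenderer.class).forEach(r -> formats.add(r.format()));
    return formats;
  }
}
```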

The K8s provider (hoptimator-k8s)

K8sGraphProvider is the only provider in this PR. It uses PipelineGraphBuilder to traverse the K8s state: starts at the target, uses the depends-on-<slug> labels to find Pipelines and TableTriggers in O(matches) on the server, parses the directional source/sink annotations to draw edges, and recurses up to a configurable depth.

  • Direction-aware: UPSTREAM follows producers, DOWNSTREAM follows consumers. Self-loops and cycles are pinned with a visited set plus a unit test (traversal sketched after this list).
  • SQL-side identifier resolution for !graph table foo.bar — resolves against the Calcite catalog so users don't have to know the underlying <database>_<path> slug shape.
  • Trigger node detail comes from the rendered Job's YAML (container name, JobTemplate name derived from metadata.name's <trigger>-<template> shape) and the spec (cron schedule humanized, paused state). No extra annotation needed.
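
The cycle/depth handling reduces to a standard guarded recursion. A minimal sketch, where the adjacency map stands in for the real builder's label-selector lookups against K8s, and direction handling is omitted:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TraversalSketch {
  static void visit(String node, int depth, Map<String, List<String>> adjacency,
      Set<String> visited, List<String> edges) {
    if (depth < 0 || !visited.add(node)) {
      return; // depth exhausted, or node already expanded: cycles and self-loops stop here
    }
    for (String next : adjacency.getOrDefault(node, List.of())) {
      edges.add(node + " -> " + next);
      visit(next, depth - 1, adjacency, visited, edges);
    }
  }
}
```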

The Mermaid renderer (hoptimator-graph)

New module that exists solely to hold the Mermaid renderer (and any future renderers we don't want bloating hoptimator-util). MermaidRenderer in com.linkedin.hoptimator.graph.mermaid is registered as the default for format "mermaid". It produces a flowchart with subgraphs for LogicalTable tiers, distinct node shapes per type, and dotted edges for trigger flows. Other renderers can register against the same SPI without changing this one.
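
For flavor, a hand-written sketch of the output style (subgraph wrapper, cylinder externals, parallelogram pipeline, rhombus trigger, dotted trigger edge). Node names are hypothetical; this is not actual renderer output:

```mermaid
flowchart LR
  subgraph LT["LogicalTable clicks"]
    nearline[("clicks nearline")]
    online[("clicks online")]
  end
  pipe[/"clicks-pipeline"/]
  trig{"clicks-refresh"}
  nearline --> pipe --> online
  trig -.-> pipe
```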

Test plan

  • gradle test green across hoptimator-api, hoptimator-k8s, hoptimator-graph, hoptimator-logical, hoptimator-jdbc, hoptimator-util.
  • PipelineGraphBuilderTest — view / MV-on-MV / LogicalTable / external-resource entry points, direction-aware traversal, depth-limit termination, cycle handling, stale-label rejection via annotation cross-check, YAML-derivation helpers.
  • MermaidRendererTest — Mermaid output shape for each node/edge type.
  • GraphServiceTest — SPI dispatch happy path + the "no provider supports target" error message.
  • Quidem integration scripts (k8s-trigger-graph.id, logical-graph.id, etc.) assert on the exact rendered Mermaid output for canonical scenarios end-to-end against a real K8s cluster.

Known limitations

  • This visualizer relies on depends-on labels that will not be present on pipelines created before this change, so pre-existing pipelines won't be discoverable via the label-based reverse lookup until they are redeployed.

jogrogan and others added 25 commits May 12, 2026 11:35

Refuses a DROP TABLE while an active Pipeline still references the
resource (as either source or sink), so dropping the underlying Kafka
topic / Venice store / MySQL table can't silently orphan a downstream
pipeline.

Validator framework, made Connection-aware:
- Validated.validate(Issues, Connection)              (was: validate(Issues))
- ValidatorProvider.validators(T, Connection)         (was: validators(T))
- ValidationService.validate(T, Issues, Connection)
- ValidationService.validateOrThrow(T, Connection)
- ValidationService.validateOrThrow(Collection<T>, Connection)
- ValidationService.validators(T, Connection)

PendingDelete<T> wrapper (hoptimator-api):
- Explicit "this is being deleted" signal so unrelated callers of
  validateOrThrow(source, connection) don't accidentally trigger
  pre-delete checks.
- Carries an optional selfOwnerUid so cascade-deleted children can be
  excluded from the dependent set.
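
A sketch of the wrapper's shape, assuming a plain two-field carrier (field and accessor names are assumptions; the two-constructor shape matches the usages in this stack):

```java
// hoptimator-api sketch: wraps a value to mark "this is being deleted".
public final class PendingDelete<T> {
  private final T target;
  private final String selfOwnerUid; // nullable: owner UID whose cascade-deleted children are excluded

  public PendingDelete(T target) {
    this(target, null);
  }

  public PendingDelete(T target, String selfOwnerUid) {
    this.target = target;
    this.selfOwnerUid = selfOwnerUid;
  }

  public T target() {
    return target;
  }

  public String selfOwnerUid() {
    return selfOwnerUid;
  }
}
```

The table branch passes new PendingDelete<>(source); LogicalTableDeployer passes new PendingDelete<>(tierSource, logicalTableUid) so CRD-owned inter-tier pipelines don't self-block.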

K8s indexed lookup:
- PipelineDependencyLabels stamps `depends-on-<slug>` labels on every
  Pipeline CRD at create time, naming each source/sink. The slug is a
  16-char SHA-256 prefix of `<database>_<dot-joined-path>`; an
  annotation lists the full identifiers so a slug collision can be
  detected at check time.
- PipelineDependencyChecker uses a server-indexed label-selector list
  + annotation cross-check + selfOwnerUid filter.
- K8sPipelineDeployer threads sources/sink through and calls
  PipelineDependencyLabels.labelsFor / annotationFor at toK8sObject().
  K8sPipelineBundle and K8sMaterializedViewDeployer pass the data
  through.
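
For concreteness, a self-contained sketch of the slug recipe (class and method names are hypothetical; the recipe, a 16-char SHA-256 hex prefix of `<database>_<dot-joined-path>`, is from this commit):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

final class SlugSketch {
  static String dependsOnLabelKey(String database, List<String> path) {
    return "depends-on-" + sha256Hex(database + "_" + String.join(".", path)).substring(0, 16);
  }

  private static String sha256Hex(String s) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // SHA-256 is guaranteed on every JVM
    }
  }
}
```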

Dispatch:
- K8sValidatorProvider returns a K8sPipelineDependencyValidator for
  PendingDelete<Source>; registered via
  META-INF/services/com.linkedin.hoptimator.ValidatorProvider.
- K8sPipelineDependencyValidator wraps PipelineDependencyChecker as a
  Validator.

DROP TABLE wiring:
- HoptimatorDdlExecutor calls
  ValidationService.validateOrThrow(new PendingDelete<>(source),
  connection) before DeploymentService.delete in the table branch.
  HoptimatorDdlUtils.removeTableFromSchema() is the symmetric inverse
  of registerTemporaryTableInSchema() for cleanup.
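
The ordering, sketched (call shapes beyond validateOrThrow are assumptions; variable names are hypothetical):

```java
ValidationService.validateOrThrow(new PendingDelete<>(source), connection); // dep-guard first
DeploymentService.delete(source, connection);                               // only reached when the guard passes
HoptimatorDdlUtils.removeTableFromSchema(schemaPlus, name);                 // inverse of registerTemporaryTableInSchema
```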

Implementor side-effects (no behavior change):
- KafkaDeployer / VeniceDeployer / MySqlDeployer no longer need a
  declarative DependencyGuarded marker — the guard fires from the
  validator framework before delete() is reached.
- All existing Validated implementors (DefaultValidator,
  CompatibilityValidatorBase, AvroTableValidator, K8sViewTable) and
  ValidatorProvider implementors (DefaultValidatorProvider,
  CompatibilityValidatorProvider, AvroValidatorProvider) updated to
  the new signatures.

Tests: PipelineDependencyLabelsTest, PipelineDependencyCheckerTest,
K8sPipelineDeployerTest assertions for stamping, validator-framework
test updates throughout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

LogicalTableDeployer.delete() previously threw SQLFeatureNotSupported.
Now implemented end-to-end as a per-tier sequence that mirrors what
running DROP TABLE on each tier independently would do, plus the
LogicalTable CRD removal at the top.

Flow:
1. Per-tier pre-flight via the validator framework:
   ValidationService.validateOrThrow(new PendingDelete<>(tierSource,
   logicalTableUid), connection) — refuses the drop if any active
   external pipeline still references a tier resource. The
   selfOwnerUid is the LogicalTable CRD's UID so the implicit
   inter-tier pipelines (owned by the CRD, cascade-deleted with it)
   don't self-block.
2. Delete the LogicalTable CRD. K8s owner-ref cascade removes its
   owned Pipeline and TableTrigger CRDs.
3. Best-effort physical cleanup of each tier resource (Kafka topic,
   Venice store, ...). A failed tier delete logs a warning but does
   not abort — a stranded tier is recoverable; aborting mid-DROP
   isn't.
4. Per-tier schema cleanup: deregister the TemporaryTable entry in
   each tier schema only when its physical delete succeeded.
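
A condensed sketch of that sequence; every helper here (tiers, deleteLogicalTableCrd, deletePhysicalResource, deregisterTierSchema) is a hypothetical stand-in for the deployer's internals:

```java
public void delete() throws SQLException {
  for (Source tier : tiers()) {
    // 1. Pre-flight: refuse if an external pipeline still references this tier.
    //    selfOwnerUid = the LogicalTable CRD's UID, so its own cascade-deleted
    //    inter-tier pipelines don't block the drop.
    ValidationService.validateOrThrow(new PendingDelete<>(tier, logicalTableUid), connection);
  }
  deleteLogicalTableCrd(); // 2. Owner-ref cascade removes owned Pipeline/TableTrigger CRDs.
  for (Source tier : tiers()) {
    try {
      deletePhysicalResource(tier); // 3. Best-effort: Kafka topic, Venice store, ...
      deregisterTierSchema(tier);   // 4. Only after the physical delete succeeded.
    } catch (Exception e) {
      // Log-and-continue: a stranded tier is recoverable; aborting mid-DROP isn't.
    }
  }
}
```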

Tests:
- LogicalTableDeployerTest deleteRemovesCrdAndCleansUpTierResources,
  deletePropagatesCrdDeletionFailure, deleteSwallowsTierCleanupFailures.
- logical-ddl.id integration test: DROP TABLE LOGICAL.testevent now
  succeeds and cascades the implicit nearline-to-online pipeline.
- logical-offline-ddl.id companion update.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kafka-ddl-create-table.id: cross-driver dependency-guard scenarios
exercising the new pre-delete check end-to-end through the kafka
driver — drop-table-while-pipeline-depends-on-it (source side and
partial-view sink side).

The bulk of the file count is mechanical noise reduction across
existing test files: dropped unused imports, tightened generics on
@SuppressWarnings, etc. — fallout from the warning_cleanup pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pipelines previously stamped a single `depends-on` annotation listing
every source and sink undifferentiated. The dep-guard collision check
worked on this, but it loses the source/sink direction information
needed for visualization.

Replace the unified annotation with two directional annotations:

  hoptimator.linkedin.com/depends-on-sources: <s1>,<s2>,...
  hoptimator.linkedin.com/depends-on-sink:    <single sink>

The dep-guard's annotationConfirms now reads sources or sink as the
collision-guard set — same correctness guarantee, no semantic change
for the dep-guard. The split unlocks directional rendering for the
upcoming pipeline visualizer.
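
What the stamped metadata amounts to, via the Kubernetes Java client's V1ObjectMeta (the slug and identifier values are made-up examples; the annotation keys are the ones introduced above):

```java
import io.kubernetes.client.openapi.models.V1ObjectMeta;

V1ObjectMeta meta = new V1ObjectMeta()
    .putLabelsItem("depends-on-1a2b3c4d5e6f7081", "KAFKA_ads.ad_clicks")     // one slug label per source/sink
    .putLabelsItem("depends-on-90ab12cd34ef5678", "VENICE_ads.clicks_store")
    .putAnnotationsItem("hoptimator.linkedin.com/depends-on-sources", "KAFKA_ads.ad_clicks")
    .putAnnotationsItem("hoptimator.linkedin.com/depends-on-sink", "VENICE_ads.clicks_store");
```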

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two related changes that bring triggers into the depends-on label
index so the pre-delete dep-guard can find them.

Trigger API:

  - Replace the old `(String database, List<String> path)` pair with
    a typed `Source source` field (existing hoptimator-api type).
    Drop the convenience getters database() / path() / table() /
    schema() — callers go through trigger.source().X() for symmetry
    with the new Sink.
  - Add an optional `Sink sink` field for bridging-tier triggers
    (LogicalTableDeployer's offline → online reverse-ETL flow), so
    the deployer can stamp a depends-on-sink annotation in addition
    to the source side.
  - Source is nullable for DROP / PAUSE / RESUME paths that only
    need the trigger name.

K8sTriggerDeployer stamping:

  - On both the toK8sObject and the partial-update paths, stamp:
      depends-on-<sourceSlug>: <sourceIdentifier>
      annotation depends-on-sources: <sourceIdentifier>
    and when sink is set:
      depends-on-<sinkSlug>: <sinkIdentifier>
      annotation depends-on-sink: <sinkIdentifier>

LogicalTableDeployer wiring:

  - Pass `offlineSource` directly as the trigger's Source and a Sink
    derived from `onlineSource` (when present) as the trigger's Sink.

HoptimatorDdlExecutor:

  - Resolve the target table's database name the same way DROP TABLE
    resolves it (HoptimatorJdbcTable / TemporaryTable unwrap), so
    user-created `CREATE TRIGGER ... ON <schema>.<table>` triggers
    participate in the dep-guard. Without this, Trigger.source was
    null and K8sTriggerDeployer skipped label stamping — a trigger
    could outlive its source silently.

Tests:

  - K8sTriggerDeployerTest gains updateStampsSinkLabelWhenTriggerCarriesASink
    pinning that a Trigger carrying a Sink stamps both source-side and
    sink-side depends-on labels on the partial-update path.
  - TriggerTest covers the new accessor shape (source(), sink(), null
    source for the bare-name case).
  - LogicalTableDeployerTest asserts trigger.source() / trigger.sink()
    instead of the removed convenience getters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Triggers carry the same depends-on-<slug> labels Pipelines do (stamped
by K8sTriggerDeployer in the previous commit), but the dep-guard's
PipelineDependencyChecker only consulted the Pipeline CRD list. That
left a hole: a user could DROP TABLE on a source still referenced by
a live trigger.

PipelineDependencyChecker now runs the same label-selector +
annotation-confirmation logic against TableTrigger CRDs as well. The
inner loop is genericized over KubernetesObject; each blocker is
tagged with its CRD kind (pipeline/foo, trigger/bar) so the error
message points the user at what to unhook. Self-owner exclusion still
applies — LogicalTable-owned triggers don't block their parent's
cascade-delete.
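
A sketch of the genericized inner loop; annotationConfirms and isSelfOwned are hypothetical stand-ins for the checker's label/annotation cross-check and selfOwnerUid filter:

```java
import java.util.List;
import io.kubernetes.client.common.KubernetesObject;

static <T extends KubernetesObject> void collectBlockers(
    String kind, List<T> labelMatches, List<String> blockers) {
  for (T obj : labelMatches) {
    if (annotationConfirms(obj) && !isSelfOwned(obj)) {
      blockers.add(kind + "/" + obj.getMetadata().getName()); // "pipeline/foo", "trigger/bar"
    }
  }
}
```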

Coverage:
- PipelineDependencyCheckerTest gets paired blocksOnExternalTrigger,
  skipsSelfOwnedTrigger, and errorMessageListsAllBlockersAcrossKinds
  cases that prove triggers participate alongside pipelines.
- New k8s-trigger-validation.id integration scenario:
  CREATE TABLE → CREATE TRIGGER → DROP TABLE blocked → DROP TRIGGER →
  DROP TABLE succeeds. Mirrors the MV pattern the existing
  k8s-validation.id scenarios use.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PipelineDependency{Labels,Checker} and K8sPipelineDependencyValidator
were specific to Pipeline CRDs in name only — TableTrigger CRDs now
wear the same depends-on labels and annotations. Renamed to
DependencyLabels, DependencyChecker, and K8sDependencyValidator.

Collapsed the labels API: DependencyLabels now exposes a single
stamp(V1ObjectMeta, Collection<Source>, Sink) that writes both the
depends-on labels and the directional annotations. K8sTriggerDeployer
had reimplemented this inline in two places; both call sites now
collapse to one stamp() call.

Also dropped the 5-arg Trigger constructor; callers must pass a sink
(or null) explicitly.

Two pieces:

- hoptimator-api: pure-data types describing a pipeline visualization
  graph. PipelineGraph (root + nodes + edges), GraphNode hierarchy
  (External, Pipeline, View, LogicalTable, Trigger), GraphEdge with
  directional types (DEPENDS_ON_SOURCE, DEPENDS_ON_SINK, OWNER_OF,
  TRIGGERS), GraphTarget for lookup keys.

- hoptimator-k8s: PipelineGraphBuilder, the K8s-state-backed
  materializer. Three entry points — forView, forLogicalTable,
  forResource — each producing a PipelineGraph the renderer consumes
  downstream. forView/forLogicalTable scan owner-references to find
  realizing pipelines; forResource does a label-selector reverse
  lookup via depends-on-<slug> labels.

No renderers or CLI wiring yet — those come in following commits.

- MermaidRenderer (in hoptimator-util) — flowchart serializer with
  shape-per-kind (cylinder for externals, parallelogram for pipelines,
  rectangle for views, subgraph wrappers for owned children).
- sqlline `!graph <view|logical|table> <name> [--depth N]` command
  in hoptimator-cli. Default depth 2, cap 5.
- Quidem `!graph` hook (initially in K8sQuidemTestBase; promoted to
  QuidemTestBase in a subsequent commit) so .id integration scripts
  can exercise the CLI surface against a real cluster.

No SPI yet — the renderer/builder are wired directly. The module
extraction comes in a later commit.
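
For reference, the command shape (the first line is the usage string; `foo.bar` and the 3-level MYSQL path are the forms discussed elsewhere in this stack; depth defaults to 2, capped at 5):

```
!graph <view|logical|table> <name> [--depth N]
!graph table foo.bar
!graph table MYSQL.testdb.orders --depth 3
```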
The visualization layer was hard-wired to MermaidRenderer in
hoptimator-util and PipelineGraphBuilder in hoptimator-k8s. Split
both behind ServiceLoader-discovered SPIs so non-K8s backends and
alternative renderers (DOT, JSON, etc.) can plug in without touching
the CLI.

New SPIs in hoptimator-api:
- GraphProvider — forTarget(GraphTarget, depth, Connection).
- GraphRenderer — render(PipelineGraph), format().
- GraphTarget — abstract base + nested final subclasses
  (View, LogicalTable, Resource), one per CLI mode.

New module hoptimator-graph: holds MermaidRenderer (moved from
hoptimator-util) registered via META-INF/services.

K8sGraphProvider in hoptimator-k8s wraps the existing builder behind
the SPI, also registered via META-INF/services.

GraphService dispatcher moves into hoptimator-util alongside the
other *Service classes (DeploymentService, ValidationService). The
Quidem `!graph` command lifts from K8sQuidemTestBase into
QuidemTestBase itself — it only goes through GraphService + the SPI,
so nothing about it is K8s-specific. K8sQuidemTestBase had no other
content and is removed.

Two trigger-related additions:

- Cron humanization in MermaidRenderer. Trigger labels were
  rendering raw cron strings (e.g. "0 */6 * * *") which people
  consistently mis-read. CronHumanizer delegates to cron-utils'
  CronDescriptor with the same CronDefinition the trigger reconciler
  uses, so what the operator can fire the visualizer can describe.
  Unparseable inputs fall through to the raw cron string (see the
  sketch after this list).

- Pre-delete dep-guard extension to TableTrigger CRDs.
  PipelineDependencyChecker now consults both Pipeline and Trigger
  CRDs via the same label/annotation scheme, so a DROP TABLE on a
  source referenced only by a live trigger is blocked the same way
  pipeline references block it. HoptimatorDdlExecutor's CREATE
  TRIGGER path resolves the target's database from the resolved
  table (same convention as DROP TABLE) so user-created triggers
  participate in the dep-guard.
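
A minimal sketch of the humanizer using cron-utils (the real CronHumanizer shares the trigger reconciler's CronDefinition; CronType.UNIX and the class name here are assumptions):

```java
import java.util.Locale;

import com.cronutils.descriptor.CronDescriptor;
import com.cronutils.model.CronType;
import com.cronutils.model.definition.CronDefinitionBuilder;
import com.cronutils.parser.CronParser;

final class CronHumanizerSketch {
  private static final CronParser PARSER =
      new CronParser(CronDefinitionBuilder.instanceDefinitionFor(CronType.UNIX));

  static String humanize(String cron) {
    try {
      // e.g. "0 */6 * * *" describes as roughly "every 6 hours" (phrasing is cron-utils')
      return CronDescriptor.instance(Locale.US).describe(PARSER.parse(cron));
    } catch (RuntimeException e) {
      return cron; // unparseable input falls through to the raw string
    }
  }
}
```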
Two cleanups to the rendered output:

- Subgraph wrappers for View owners drop the resource name from the
  title — the inner pipeline node already shows the name, so the
  wrapper carries only the kind ("Materialized View" / "View").
- displayName() returns bare names across GraphNode kinds. The
  Mermaid shape conveys the kind (rhombus for triggers, rectangle
  for views, etc.), so the "Trigger foo" / "Materialized View foo"
  prefixes were redundant. LogicalTable subgraph wrappers carry
  "LogicalTable <name>" since no inner node carries the name.

Also: scaffold `!graph` integration scenarios in k8s-graph.id with
TODO stubs for cases worth pinning end-to-end (filled in by the
integration-tests commit).

Two correctness/UX improvements to the visualizer that landed
iteratively:

- Direction-aware traversal in PipelineGraphBuilder. The old
  recursion walked both directions from every external it visited,
  so `!graph view` would pull in downstream consumers of its sink.
  New Direction enum (UPSTREAM / DOWNSTREAM) splits the walk: from
  a resource, UPSTREAM finds producer pipelines (where the resource
  is the sink); DOWNSTREAM finds consumers (where it's a source).
  Recursion preserves direction so siblings of intermediates don't
  leak in via the opposite side.

  !graph view / !graph logical are scoped to the entity's
  neighborhood (depth=0 beyond the owned pipeline) — they answer
  "what does this view do", not "what's its data lineage". The
  chain view belongs to !graph table, which walks bidirectionally
  from the root with depth-controlled recursion.

- SQL-side identifier resolution in K8sGraphProvider. Pipelines
  stamp depends-on labels using the K8s-side form (Database CRD
  name + qualified path including schema), but the CLI accepted
  SQL-side input. K8sGraphProvider.resolveResource() now bridges:

    * Exact CRD-name match passes through.
    * Schema-name match (`ADS.AD_CLICKS`) substitutes the CRD name
      and prepends the schema.
    * Catalog-name match (`MYSQL.testdb.orders`, 3-level) substitutes
      the CRD name; canonical path is `[catalog, schema, ...rest]`.
    * Schema-skipped catalog input (`MYSQL.orders`) inserts the
      schema. Catalog-skipped schema input (`testdb.orders`) prepends
      the catalog.

  Path tail preserved verbatim — Calcite-normalized MV source paths
  are upper case but LogicalTable inter-tier pipelines are mixed.
  Auto-canonicalization would help one and break the other; users
  copy the canonical form from `!graph view` / `!graph logical`
  output.

Trigger label shrink: drop redundant `kind: Job` and verbose `job:`
lines from the trigger rhombus — the rhombus shape already conveys
kind, and the job name is just trigger name + template + "-job".

End-to-end coverage across drivers, one TestSqlScripts test per
.id file:

- hoptimator-k8s: k8s-graph.id (MV cases — single source, multi-
  source join, MV-on-MV inlining producing fan-out), k8s-trigger-
  graph.id (standalone triggers with cron + paused state),
  k8s-trigger-validation.id (dep-guard for triggers).
- hoptimator-logical: logical-graph.id (LogicalTable with nearline+
  online tiers; LT with offline tier + auto-spawned bridging trigger).
- hoptimator-mysql: mysql-graph.id (3-level catalog+schema+table
  identifiers via the JDBC-backed driver).

Pins scenarios that are easy to silently regress in a builder
refactor: empty / unknown resource warning, multi-source join shape,
trigger source/sink wiring, LogicalTable tier subgraphs + bridging
trigger, !graph table reverse lookup with direction preservation,
3-level path resolution via catalog match.

Also drops PipelineGraph.incoming() / outgoing() — convenience
accessors with no callers in the codebase.

The base `com.linkedin.hoptimator` package was carrying six visualization
types alongside the unrelated core types (Deployer, Source, Sink, etc.).
Pulling them into a `graph` subpackage gives them a clean namespace:

  com.linkedin.hoptimator.graph          (api)    — data model + SPIs
  com.linkedin.hoptimator.graph.mermaid  (graph)  — Mermaid renderer

The two modules contribute to different sub-packages of the same root,
not the same exact package, so there's no split-package concern.

Moved: GraphNode, GraphEdge, PipelineGraph, GraphProvider, GraphRenderer,
GraphTarget. Updated imports across consumer modules (cli, util, k8s,
jdbc, graph) and renamed the two META-INF/services registration files
to match the new fully-qualified names.

No behavior change.

@jogrogan changed the title from "Pipeline visualizer: !graph CLI + Mermaid renderer + SPI seam" to "Pluggable graph interface for Hoptimator topology + Mermaid renderer" on May 12, 2026.

@ryannedolan (Collaborator)

Can we see example input/output in the PR description? GH will render mermaid!

Base automatically changed from jogrogan/trigger-dep-guard to main May 13, 2026 03:42