Pluggable graph interface for Hoptimator topology + Mermaid renderer#222
Draft
jogrogan wants to merge 25 commits into
Conversation
Refuses a DROP TABLE while an active Pipeline still references the resource (as either source or sink), so dropping the underlying Kafka topic / Venice store / MySQL table can't silently orphan a downstream pipeline.
Validator framework, made Connection-aware:
- Validated.validate(Issues, Connection) (was: validate(Issues))
- ValidatorProvider.validators(T, Connection) (was: validators(T))
- ValidationService.validate(T, Issues, Connection)
- ValidationService.validateOrThrow(T, Connection)
- ValidationService.validateOrThrow(Collection<T>, Connection)
- ValidationService.validators(T, Connection)
PendingDelete<T> wrapper (hoptimator-api):
- Explicit "this is being deleted" signal so unrelated callers of validateOrThrow(source, connection) don't accidentally trigger pre-delete checks.
- Carries an optional selfOwnerUid so cascade-deleted children can be excluded from the dependent set.
K8s indexed lookup:
- PipelineDependencyLabels stamps `depends-on-<slug>` labels on every Pipeline CRD at create time, naming each source/sink. The slug is a 16-char SHA-256 prefix of `<database>_<dot-joined-path>`; an annotation lists the full identifiers so a slug collision can be detected at check time.
- PipelineDependencyChecker uses a server-indexed label-selector list + annotation cross-check + selfOwnerUid filter.
- K8sPipelineDeployer threads sources/sink through and calls PipelineDependencyLabels.labelsFor / annotationFor at toK8sObject(). K8sPipelineBundle and K8sMaterializedViewDeployer pass the data through.
Dispatch:
- K8sValidatorProvider returns a K8sPipelineDependencyValidator for PendingDelete<Source>; registered via META-INF/services/com.linkedin.hoptimator.ValidatorProvider.
- K8sPipelineDependencyValidator wraps PipelineDependencyChecker as a Validator.
DROP TABLE wiring:
- HoptimatorDdlExecutor calls ValidationService.validateOrThrow(new PendingDelete<>(source), connection) before DeploymentService.delete in the table branch.
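The slug scheme above — a 16-char SHA-256 hex prefix of `<database>_<dot-joined-path>` — can be sketched as follows. Class and method names here are illustrative stand-ins, not the actual PipelineDependencyLabels API:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.List;

public final class DependencySlug {

    // Illustrative sketch: label-safe slug = first 16 hex chars of
    // SHA-256("<database>_<dot-joined-path>"). Deterministic per identifier,
    // fixed-length, valid as a K8s label-name suffix.
    public static String slugFor(String database, List<String> path) throws Exception {
        String key = database + "_" + String.join(".", path);
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(key.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest).substring(0, 16);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical identifiers — prefix collisions are possible at 16 chars,
        // which is why the annotation carries the full identifiers as a cross-check.
        System.out.println("depends-on-" + slugFor("kafka", List.of("KAFKA", "topic1")));
    }
}
```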
HoptimatorDdlUtils.removeTableFromSchema() is the symmetric inverse of registerTemporaryTableInSchema() for cleanup.
Implementor side-effects (no behavior change):
- KafkaDeployer / VeniceDeployer / MySqlDeployer no longer need a declarative DependencyGuarded marker — the guard fires from the validator framework before delete() is reached.
- All existing Validated implementors (DefaultValidator, CompatibilityValidatorBase, AvroTableValidator, K8sViewTable) and ValidatorProvider implementors (DefaultValidatorProvider, CompatibilityValidatorProvider, AvroValidatorProvider) updated to the new signatures.
Tests: PipelineDependencyLabelsTest, PipelineDependencyCheckerTest, K8sPipelineDeployerTest assertions for stamping, validator-framework test updates throughout.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LogicalTableDeployer.delete() previously threw SQLFeatureNotSupported. Now implemented end-to-end as a per-tier sequence that mirrors what running DROP TABLE on each tier independently would do, plus the LogicalTable CRD removal at the top.
Flow:
1. Per-tier pre-flight via the validator framework: ValidationService.validateOrThrow(new PendingDelete<>(tierSource, logicalTableUid), connection) — refuses the drop if any active external pipeline still references a tier resource. The selfOwnerUid is the LogicalTable CRD's UID so the implicit inter-tier pipelines (owned by the CRD, cascade-deleted with it) don't self-block.
2. Delete the LogicalTable CRD. K8s owner-ref cascade removes its owned Pipeline and TableTrigger CRDs.
3. Best-effort physical cleanup of each tier resource (Kafka topic, Venice store, ...). A failed tier delete logs a warning but does not abort — a stranded tier is recoverable; aborting mid-DROP isn't.
4. Per-tier schema cleanup: deregister the TemporaryTable entry in each tier schema only when its physical delete succeeded.
Tests:
- LogicalTableDeployerTest deleteRemovesCrdAndCleansUpTierResources, deletePropagatesCrdDeletionFailure, deleteSwallowsTierCleanupFailures.
- logical-ddl.id integration test: DROP TABLE LOGICAL.testevent now succeeds and cascades the implicit nearline-to-online pipeline.
- logical-offline-ddl.id companion update.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
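The best-effort cleanup in step 3 feeding step 4 follows a standard pattern: attempt each tier delete independently, warn and continue on failure, and record which tiers succeeded so only those get schema cleanup. A minimal sketch under that assumption — the types and names here are hypothetical stand-ins, not the LogicalTableDeployer code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public final class BestEffortCleanup {

    // Hypothetical stand-in for one tier's physical resource and its delete action.
    public record Tier(String name, Runnable physicalDelete) {}

    // Deletes every tier, swallowing per-tier failures; returns the tiers whose
    // physical delete succeeded, so only those are deregistered from the schema.
    public static List<Tier> deleteTiers(List<Tier> tiers, Consumer<String> warn) {
        List<Tier> succeeded = new ArrayList<>();
        for (Tier tier : tiers) {
            try {
                tier.physicalDelete().run();
                succeeded.add(tier);
            } catch (RuntimeException e) {
                // A stranded tier is recoverable; aborting mid-DROP isn't.
                warn.accept("failed to delete tier " + tier.name() + ": " + e.getMessage());
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        List<Tier> tiers = List.of(
                new Tier("kafka", () -> {}),
                new Tier("venice", () -> { throw new RuntimeException("store busy"); }));
        // Only the kafka tier survives into the schema-cleanup step.
        System.out.println(deleteTiers(tiers, System.err::println).size());
    }
}
```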
kafka-ddl-create-table.id: cross-driver dependency-guard scenarios exercising the new pre-delete check end-to-end through the kafka driver — drop-table-while-pipeline-depends-on-it (source side and partial-view sink side).
The bulk of the file count is mechanical noise reduction across existing test files: dropped unused imports, tightened generics on @SuppressWarnings, etc. — fallout from the warning_cleanup pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pipelines previously stamped a single `depends-on` annotation listing every source and sink undifferentiated. The dep-guard collision check worked on this, but it loses the source/sink direction information needed for visualization.
Replace the unified annotation with two directional annotations:
  hoptimator.linkedin.com/depends-on-sources: <s1>,<s2>,...
  hoptimator.linkedin.com/depends-on-sink: <single sink>
The dep-guard's annotationConfirms now reads sources or sink as the collision-guard set — same correctness guarantee, no semantic change for the dep-guard. The split unlocks directional rendering for the upcoming pipeline visualizer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
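For illustration, a Pipeline CRD's metadata would then carry both the indexed labels and the directional annotations roughly like this. The slug values and resource identifiers below are made up, and the exact label-value format is an assumption:

```yaml
metadata:
  labels:
    # one depends-on-<slug> label per source/sink; server-indexable via label selector
    depends-on-a1b2c3d4e5f60718: kafka_KAFKA.topic1
    depends-on-0f1e2d3c4b5a6978: kafka_KAFKA.topic2
  annotations:
    # full identifiers, split by direction; also the slug-collision cross-check
    hoptimator.linkedin.com/depends-on-sources: kafka_KAFKA.topic1
    hoptimator.linkedin.com/depends-on-sink: kafka_KAFKA.topic2
```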
Two related changes that bring triggers into the depends-on label
index so the pre-delete dep-guard can find them.
Trigger API:
- Replace the old `(String database, List<String> path)` pair with
a typed `Source source` field (existing hoptimator-api type).
Drop the convenience getters database() / path() / table() /
schema() — callers go through trigger.source().X() for symmetry
with the new Sink.
- Add an optional `Sink sink` field for bridging-tier triggers
(LogicalTableDeployer's offline → online reverse-ETL flow), so
the deployer can stamp a depends-on-sink annotation in addition
to the source side.
- Source is nullable for DROP / PAUSE / RESUME paths that only
need the trigger name.
K8sTriggerDeployer stamping:
- On both the toK8sObject and the partial-update paths, stamp:
depends-on-<sourceSlug>: <sourceIdentifier>
annotation depends-on-sources: <sourceIdentifier>
and when sink is set:
depends-on-<sinkSlug>: <sinkIdentifier>
annotation depends-on-sink: <sinkIdentifier>
LogicalTableDeployer wiring:
- Pass `offlineSource` directly as the trigger's Source and a Sink
derived from `onlineSource` (when present) as the trigger's Sink.
HoptimatorDdlExecutor:
- Resolve the target table's database name the same way DROP TABLE
resolves it (HoptimatorJdbcTable / TemporaryTable unwrap), so
user-created `CREATE TRIGGER ... ON <schema>.<table>` triggers
participate in the dep-guard. Without this, Trigger.source was
null and K8sTriggerDeployer skipped label stamping — a trigger
could outlive its source silently.
Tests:
- K8sTriggerDeployerTest gains updateStampsSinkLabelWhenTriggerCarriesASink
pinning that a Trigger carrying a Sink stamps both source-side and
sink-side depends-on labels on the partial-update path.
- TriggerTest covers the new accessor shape (source(), sink(), null
source for the bare-name case).
- LogicalTableDeployerTest asserts trigger.source() / trigger.sink()
instead of the removed convenience getters.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Triggers carry the same depends-on-<slug> labels Pipelines do (stamped by K8sTriggerDeployer in the previous commit), but the dep-guard's PipelineDependencyChecker only consulted the Pipeline CRD list. That left a hole: a user could DROP TABLE on a source still referenced by a live trigger.
PipelineDependencyChecker now runs the same label-selector + annotation-confirmation logic against TableTrigger CRDs as well. The inner loop is genericized over KubernetesObject; each blocker is tagged with its CRD kind (pipeline/foo, trigger/bar) so the error message points the user at what to unhook. Self-owner exclusion still applies — LogicalTable-owned triggers don't block their parent's cascade-delete.
Coverage:
- PipelineDependencyCheckerTest gets paired blocksOnExternalTrigger, skipsSelfOwnedTrigger, and errorMessageListsAllBlockersAcrossKinds cases that prove triggers participate alongside pipelines.
- New k8s-trigger-validation.id integration scenario: CREATE TABLE → CREATE TRIGGER → DROP TABLE blocked → DROP TRIGGER → DROP TABLE succeeds. Mirrors the MV pattern the existing k8s-validation.id scenarios use.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
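The genericized inner loop plus self-owner exclusion reduces to a small shape. A sketch with a stand-in record in place of the real io.kubernetes.client KubernetesObject metadata (types and kind tags here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public final class BlockerCheck {

    // Minimal stand-in for a label-matched CRD: its kind tag, name, and owner UID.
    public record Crd(String kind, String name, String ownerUid) {}

    // Generic inner loop: any CRD kind wearing the depends-on labels can block,
    // and each blocker is tagged "<kind>/<name>" for the error message. CRDs
    // owned by selfOwnerUid are excluded — they cascade-delete with the parent.
    public static List<String> blockers(List<Crd> matches, String selfOwnerUid) {
        List<String> out = new ArrayList<>();
        for (Crd crd : matches) {
            if (selfOwnerUid != null && selfOwnerUid.equals(crd.ownerUid())) {
                continue;  // self-owned child: doesn't block its parent's delete
            }
            out.add(crd.kind() + "/" + crd.name());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Crd> matches = List.of(
                new Crd("pipeline", "foo", "uid-1"),
                new Crd("trigger", "bar", "uid-2"));
        // The trigger is owned by uid-2 (the thing being deleted), so only
        // the external pipeline blocks.
        System.out.println(blockers(matches, "uid-2"));
    }
}
```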
PipelineDependency{Labels,Checker} and K8sPipelineDependencyValidator
were specific to Pipeline CRDs in name only — TableTrigger CRDs now
wear the same depends-on labels and annotations. Renamed to
DependencyLabels, DependencyChecker, and K8sDependencyValidator.
Collapsed the labels API: DependencyLabels now exposes a single
stamp(V1ObjectMeta, Collection<Source>, Sink) that writes both the
depends-on labels and the directional annotations. K8sTriggerDeployer
had reimplemented this inline in two places; both call sites now
collapse to one stamp() call.
Also dropped the 5-arg Trigger constructor; callers must pass a sink
(or null) explicitly.
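The collapsed stamp() entry point can be sketched against simple maps in place of the real V1ObjectMeta, with sources/sink reduced to identifier strings and a placeholder slug function. None of this is the actual DependencyLabels code — it is a shape sketch of the single-call API described above:

```java
import java.util.Collection;
import java.util.Map;

public final class StampSketch {

    // Stand-in for V1ObjectMeta: just its label and annotation maps.
    public record Meta(Map<String, String> labels, Map<String, String> annotations) {}

    // One call writes everything: a depends-on-<slug> label per source/sink
    // plus the two directional annotations. Both K8sTriggerDeployer call sites
    // collapse to this.
    public static void stamp(Meta meta, Collection<String> sources, String sink) {
        for (String source : sources) {
            meta.labels().put("depends-on-" + slugOf(source), source);
        }
        meta.annotations().put("hoptimator.linkedin.com/depends-on-sources",
                String.join(",", sources));
        if (sink != null) {
            meta.labels().put("depends-on-" + slugOf(sink), sink);
            meta.annotations().put("hoptimator.linkedin.com/depends-on-sink", sink);
        }
    }

    private static String slugOf(String identifier) {
        // Placeholder for the real 16-char SHA-256 prefix slug.
        return Integer.toHexString(identifier.hashCode());
    }
}
```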
Two pieces:
- hoptimator-api: pure-data types describing a pipeline visualization graph. PipelineGraph (root + nodes + edges), GraphNode hierarchy (External, Pipeline, View, LogicalTable, Trigger), GraphEdge with directional types (DEPENDS_ON_SOURCE, DEPENDS_ON_SINK, OWNER_OF, TRIGGERS), GraphTarget for lookup keys.
- hoptimator-k8s: PipelineGraphBuilder, the K8s-state-backed materializer. Three entry points — forView, forLogicalTable, forResource — each producing a PipelineGraph the renderer consumes downstream. forView/forLogicalTable scan owner-references to find realizing pipelines; forResource does a label-selector reverse lookup via depends-on-<slug> labels.
No renderers or CLI wiring yet — those come in following commits.
- MermaidRenderer (in hoptimator-util) — flowchart serializer with shape-per-kind (cylinder for externals, parallelogram for pipelines, rectangle for views, subgraph wrappers for owned children).
- sqlline `!graph <view|logical|table> <name> [--depth N]` command in hoptimator-cli. Default depth 2, cap 5.
- Quidem `!graph` hook (initially in K8sQuidemTestBase; promoted to QuidemTestBase in a subsequent commit) so .id integration scripts can exercise the CLI surface against a real cluster.
No SPI yet — the renderer/builder are wired directly. The module extraction comes in a later commit.
The visualization layer was hard-wired to MermaidRenderer in hoptimator-util and PipelineGraphBuilder in hoptimator-k8s. Split both behind ServiceLoader-discovered SPIs so non-K8s backends and alternative renderers (DOT, JSON, etc.) can plug in without touching the CLI.
New SPIs in hoptimator-api:
- GraphProvider — forTarget(GraphTarget, depth, Connection).
- GraphRenderer — render(PipelineGraph), format().
- GraphTarget — abstract base + nested final subclasses (View, LogicalTable, Resource), one per CLI mode.
New module hoptimator-graph: holds MermaidRenderer (moved from hoptimator-util) registered via META-INF/services. K8sGraphProvider in hoptimator-k8s wraps the existing builder behind the SPI, also registered via META-INF/services. GraphService dispatcher moves into hoptimator-util alongside the other *Service classes (DeploymentService, ValidationService).
The Quidem `!graph` command lifts from K8sQuidemTestBase into QuidemTestBase itself — it only goes through GraphService + the SPI, so nothing about it is K8s-specific. K8sQuidemTestBase had no other content and is removed.
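The SPI shapes and the ServiceLoader dispatch can be sketched as below. The interface signatures follow the ones named above; the nested record stand-ins for PipelineGraph/GraphTarget and the dispatch helper are simplifications, not the real com.linkedin.hoptimator.graph types:

```java
import java.sql.Connection;
import java.util.ServiceLoader;

public final class GraphSpiSketch {

    // Simplified stand-ins for the real data-model types.
    public record PipelineGraph(String root) {}
    public record GraphTarget(String name) {}

    // Backend SPI: materializes a graph from some deployment substrate.
    public interface GraphProvider {
        PipelineGraph forTarget(GraphTarget target, int depth, Connection connection);
    }

    // Renderer SPI: serializes a graph into one named format.
    public interface GraphRenderer {
        String render(PipelineGraph graph);
        String format();  // e.g. "mermaid"
    }

    // GraphService-style dispatch: first renderer registered (via
    // META-INF/services) for the requested format wins.
    public static GraphRenderer rendererFor(String format) {
        for (GraphRenderer renderer : ServiceLoader.load(GraphRenderer.class)) {
            if (renderer.format().equals(format)) {
                return renderer;
            }
        }
        throw new IllegalArgumentException("no renderer for format: " + format);
    }
}
```

A new renderer (DOT, JSON, ...) only needs to implement GraphRenderer and add a one-line META-INF/services registration; the CLI never changes.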
Two trigger-related additions:
- Cron humanization in MermaidRenderer. Trigger labels were rendering raw cron strings (e.g. "0 */6 * * *") which people consistently mis-read. CronHumanizer delegates to cron-utils' CronDescriptor with the same CronDefinition the trigger reconciler uses, so whatever the operator can fire, the visualizer can describe. Unparseable inputs fall through to the raw cron string.
- Pre-delete dep-guard extension to TableTrigger CRDs. PipelineDependencyChecker now consults both Pipeline and Trigger CRDs via the same label/annotation scheme, so a DROP TABLE on a source referenced only by a live trigger is blocked the same way pipeline references block it. HoptimatorDdlExecutor's CREATE TRIGGER path resolves the target's database from the resolved table (same convention as DROP TABLE) so user-created triggers participate in the dep-guard.
Two cleanups to the rendered output:
- Subgraph wrappers for View owners drop the resource name from the
title — the inner pipeline node already shows the name, so the
wrapper carries only the kind ("Materialized View" / "View").
- displayName() returns bare names across GraphNode kinds. The
Mermaid shape conveys the kind (rhombus for triggers, rectangle
for views, etc.), so the "Trigger foo" / "Materialized View foo"
prefixes were redundant. LogicalTable subgraph wrappers carry
"LogicalTable <name>" since no inner node carries the name.
Also: scaffold `!graph` integration scenarios in k8s-graph.id with
TODO stubs for cases worth pinning end-to-end (filled in by the
integration-tests commit).
Two correctness/UX improvements to the visualizer that landed
iteratively:
- Direction-aware traversal in PipelineGraphBuilder. The old
recursion walked both directions from every external it visited,
so `!graph view` would pull in downstream consumers of its sink.
New Direction enum (UPSTREAM / DOWNSTREAM) splits the walk: from
a resource, UPSTREAM finds producer pipelines (where the resource
is the sink); DOWNSTREAM finds consumers (where it's a source).
Recursion preserves direction so siblings of intermediates don't
leak in via the opposite side.
!graph view / !graph logical are scoped to the entity's
neighborhood (depth=0 beyond the owned pipeline) — they answer
"what does this view do", not "what's its data lineage". The
chain view belongs to !graph table, which walks bidirectionally
from the root with depth-controlled recursion.
- SQL-side identifier resolution in K8sGraphProvider. Pipelines
stamp depends-on labels using the K8s-side form (Database CRD
name + qualified path including schema), but the CLI accepted
SQL-side input. K8sGraphProvider.resolveResource() now bridges:
* Exact CRD-name match passes through.
* Schema-name match (`ADS.AD_CLICKS`) substitutes the CRD name
and prepends the schema.
* Catalog-name match (`MYSQL.testdb.orders`, 3-level) substitutes
the CRD name; canonical path is `[catalog, schema, ...rest]`.
* Schema-skipped catalog input (`MYSQL.orders`) inserts the
schema. Catalog-skipped schema input (`testdb.orders`) prepends
the catalog.
Path tail preserved verbatim — Calcite-normalized MV source paths
are upper case but LogicalTable inter-tier pipelines are mixed.
Auto-canonicalization would help one and break the other; users
copy the canonical form from `!graph view` / `!graph logical`
output.
Trigger label shrink: drop redundant `kind: Job` and verbose `job:`
lines from the trigger rhombus — the rhombus shape already conveys
kind, and the job name is just trigger name + template + "-job".
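The direction-preserving traversal described above can be sketched over a minimal pipeline model. The record and method names are hypothetical, not the PipelineGraphBuilder code; the point is that recursing in the same direction keeps sibling branches of intermediates out of the result:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public final class DirectionWalk {

    public enum Direction { UPSTREAM, DOWNSTREAM }

    // Hypothetical stand-in for a pipeline: many sources, one sink.
    public record Pipeline(String name, List<String> sources, String sink) {}

    // From a resource, UPSTREAM follows pipelines producing it (resource is
    // the sink); DOWNSTREAM follows pipelines consuming it (resource is a
    // source). The recursion never switches direction.
    public static void walk(String resource, Direction dir, List<Pipeline> all,
                            int depth, Set<String> visited) {
        if (depth < 0 || !visited.add(resource)) {
            return;
        }
        for (Pipeline p : all) {
            if (dir == Direction.UPSTREAM && p.sink().equals(resource)) {
                p.sources().forEach(s -> walk(s, dir, all, depth - 1, visited));
            } else if (dir == Direction.DOWNSTREAM && p.sources().contains(resource)) {
                walk(p.sink(), dir, all, depth - 1, visited);
            }
        }
    }

    public static void main(String[] args) {
        List<Pipeline> all = List.of(
                new Pipeline("p1", List.of("a"), "b"),   // a -> b
                new Pipeline("p2", List.of("b"), "c"),   // b -> c
                new Pipeline("p3", List.of("b"), "d"));  // b -> d (sibling branch)
        Set<String> seen = new LinkedHashSet<>();
        walk("c", Direction.UPSTREAM, all, 5, seen);
        System.out.println(seen);  // upstream of c: never pulls in sibling d
    }
}
```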
End-to-end coverage across drivers, one TestSqlScripts test per .id file:
- hoptimator-k8s: k8s-graph.id (MV cases — single source, multi-source join, MV-on-MV inlining producing fan-out), k8s-trigger-graph.id (standalone triggers with cron + paused state), k8s-trigger-validation.id (dep-guard for triggers).
- hoptimator-logical: logical-graph.id (LogicalTable with nearline+online tiers; LT with offline tier + auto-spawned bridging trigger).
- hoptimator-mysql: mysql-graph.id (3-level catalog+schema+table identifiers via the JDBC-backed driver).
Pins scenarios that are easy to silently regress in a builder refactor: empty / unknown resource warning, multi-source join shape, trigger source/sink wiring, LogicalTable tier subgraphs + bridging trigger, !graph table reverse lookup with direction preservation, 3-level path resolution via catalog match.
Also drops PipelineGraph.incoming() / outgoing() — convenience accessors with no callers in the codebase.
The base `com.linkedin.hoptimator` package was carrying six visualization types alongside the unrelated core types (Deployer, Source, Sink, etc.). Pulling them into a `graph` subpackage gives them a clean namespace:
- com.linkedin.hoptimator.graph (api) — data model + SPIs
- com.linkedin.hoptimator.graph.mermaid (graph) — Mermaid renderer
The two modules contribute to different sub-packages of the same root, not the same exact package, so there's no split-package concern.
Moved: GraphNode, GraphEdge, PipelineGraph, GraphProvider, GraphRenderer, GraphTarget. Updated imports across consumer modules (cli, util, k8s, jdbc, graph) and renamed the two META-INF/services registration files to match the new fully-qualified names. No behavior change.
Collaborator
Can we see example input/output in the PR description? GH will render mermaid!
Summary
Adds a generic graph interface to Hoptimator — a pluggable model for representing the topology of pipelines, triggers, and logical tables — together with an initial Mermaid renderer that's wired into the sqlline !graph command. The interface is the point; Mermaid is one rendering of it.
Stacked on jogrogan/trigger-dep-guard — relies on the depends-on-sources / depends-on-sink annotations that branch lands.
The graph interface (hoptimator-api)
The data model and SPI live where the rest of Hoptimator's public surface lives, so a renderer or backend can depend on them without pulling K8s, util, or anything else:
The dispatcher (hoptimator-util)
GraphService in hoptimator-util is the runtime entry point — buildGraph(target, depth, conn), render(graph, format), availableFormats(). It sits alongside the other service classes (DeploymentService, ValidationService). It doesn't know about K8s or Mermaid; it just ServiceLoader-discovers providers and renderers and dispatches.
A future deployment substrate (non-K8s, RPC-backed, in-process) or a different rendering target (D3 JSON for an interactive web view, DOT for graphviz, etc.) plugs in without touching anything in this PR.
The K8s provider (hoptimator-k8s)
K8sGraphProvider is the only provider in this PR. It uses PipelineGraphBuilder to traverse the K8s state: starts at the target, uses the depends-on-<slug> labels to find Pipelines and TableTriggers in O(matches) on the server, parses the
directional source/sink annotations to draw edges, and recurses up to a configurable depth.
The Mermaid renderer (hoptimator-graph)
New module — exists solely to hold the Mermaid renderer (and any future renderers we don't want bloating hoptimator-util). MermaidRenderer in com.linkedin.hoptimator.graph.mermaid is registered as the default for format "mermaid". Produces a
flowchart with subgraphs for LogicalTable tiers, distinct node shapes per type, and dotted edges for trigger flows. Other renderers can register against the same SPI without changing this one.
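For illustration (GitHub renders mermaid fences natively), a hypothetical single-source materialized view with a cron trigger might serialize roughly like this — the node names are made up, and the exact shapes follow the per-kind mapping described above (cylinder for externals, parallelogram for pipelines, rectangle for views, rhombus for triggers, dotted edges for trigger flows):

```mermaid
flowchart LR
  src[("kafka.topic1")]
  subgraph mv ["Materialized View"]
    p[/"pipeline-my-view"/]
    v["MY_VIEW"]
  end
  t{"Every 6 hours"}
  src --> p --> v
  t -.-> p
```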
Test plan
Known limitations