Merged
content/posts/2026-06-26-apache-iceberg-optimization-skill.md: 16 changes (8 additions, 8 deletions)
@@ -8,7 +8,7 @@ image: /img/iceberg-optimizer/social.png

![Claude Code and Apache Iceberg icons: the iceberg optimization skill codifies deployment patterns from large-scale real-world usage.](/img/iceberg-optimizer/social.png)

I've spent the last couple of years working closely with one of the most data-intensive organizations in Israel, deploying Apache Iceberg at scale. Petabytes of data, multiple ingestion pipelines, constant schema evolution, several query engines reading the same tables. Exactly the kind of environment that stress-tests every assumption you had about the format.
I've been working with Apache Iceberg for several years. In the last year I've been helping one of the most data-heavy organizations in Israel deploy it at scale: petabytes of data, multiple ingestion pipelines, constant schema evolution, several query engines reading the same tables. Exactly the kind of environment that stress-tests every assumption you had about the format.

Most of the problems weren't about scale. They were about **not knowing what Iceberg is actually doing under the hood**.

@@ -18,7 +18,7 @@ Here's how it goes. A team discovers Iceberg, reads the getting-started docs, wi

The root cause is almost always the same: **Iceberg has a lot of knobs, and the defaults were chosen for correctness, not for your workload**. Partition specs, sort orders, file size targets, snapshot retention policies, manifest sizing, V1 vs V2 table format: most teams never touch any of them. They accept the defaults, the problems accumulate invisibly, and the first sign that anything is wrong is a data engineer firefighting at 2am.
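To make those knobs concrete, here is a minimal sketch that lists which of a few well-known Iceberg table properties a table has never set, meaning it is still running on the library default. The property names and default values in the comments reflect the Iceberg table-properties documentation as I understand it (verify them against the Iceberg version you run); the selection of properties and the `untouched_knobs` helper are hypothetical and are not the skill's own logic. The input is assumed to be the table's properties as a dict, for example collected from Spark's `SHOW TBLPROPERTIES`.

```python
# Hypothetical helper: which common Iceberg knobs has this table never set?
# Defaults below reflect the Iceberg table-properties docs; verify them
# against the Iceberg version you actually run.
COMMON_KNOBS = {
    "write.target-file-size-bytes": "536870912",        # 512 MB
    "commit.manifest.target-size-bytes": "8388608",     # 8 MB
    "history.expire.max-snapshot-age-ms": "432000000",  # 5 days
    "history.expire.min-snapshots-to-keep": "1",
    "write.delete.mode": "copy-on-write",
    "write.update.mode": "copy-on-write",
    "write.merge.mode": "copy-on-write",
}


def untouched_knobs(table_props: dict[str, str]) -> list[str]:
    """Return the knobs the table leaves unset (i.e. still at the default), sorted."""
    return sorted(k for k in COMMON_KNOBS if k not in table_props)


if __name__ == "__main__":
    props = {"format-version": "2", "write.target-file-size-bytes": "134217728"}
    for key in untouched_knobs(props):
        print(f"{key} not set (default {COMMON_KNOBS[key]})")
```

A property that has been set explicitly, even to its default value, counts as "touched" here; the point is only to surface settings nobody ever made a decision about.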

What makes it worse is that the wrong choices interact. A poor partition strategy amplifies the cost of unoptimized file sizes. Unbounded snapshot accumulation slows partition pruning. Too many small files and the wrong delete mode turn a trivially fast CDC table into a read-time disaster. The problems compound before they're visible.
The deeper problem is that these mistakes don't fail independently: they stack. A poor partition strategy amplifies the cost of unoptimized file sizes. Unbounded snapshot accumulation slows partition pruning. Too many small files and the wrong delete mode turn a trivially fast CDC table into a read-time disaster. Each bad choice makes the next one worse, and the compounding is invisible until it isn't.
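How a partitioning choice multiplies the small-file problem can be shown with back-of-the-envelope arithmetic. The sketch below is a deliberately simplified toy model, assuming every commit writes exactly one file per partition it touches and that no compaction runs. The workload numbers (100 MB per commit, one commit a minute, 500 customer partitions) are hypothetical. Under those assumptions the same ingestion volume yields 1,440 files a day of about 100 MB with a plain daily partition, but 720,000 files of about 0.2 MB once each commit fans out across 500 partitions.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class WritePattern:
    bytes_per_commit: int       # data written by one ingestion commit
    commits_per_day: int        # e.g. 1440 for a once-a-minute streaming writer
    partitions_per_commit: int  # distinct partitions one commit writes into


def daily_file_stats(p: WritePattern) -> tuple[int, float]:
    """Toy model: each commit writes exactly one file per partition it touches.

    Returns (new data files per day, average file size in bytes).
    """
    files_per_day = p.commits_per_day * p.partitions_per_commit
    avg_file_bytes = p.bytes_per_commit / p.partitions_per_commit
    return files_per_day, avg_file_bytes


if __name__ == "__main__":
    coarse = WritePattern(100 * MB, 1440, partitions_per_commit=1)
    fine = WritePattern(100 * MB, 1440, partitions_per_commit=500)
    for name, p in [("daily partitions", coarse), ("daily x 500 customers", fine)]:
        files, avg = daily_file_stats(p)
        print(f"{name}: {files:,} files/day, avg {avg / MB:.1f} MB")
```

Every one of those files also becomes a manifest entry that query planning has to read, which is how a partitioning decision turns into a metadata problem as well as a small-file problem.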

## What I kept doing over and over

@@ -28,9 +28,9 @@ This is specialized knowledge. It took time to build, and it isn't well-document

## The skill

I codified everything I kept doing into a **[Claude Code skill](https://github.com/itamarwe/iceberg-optimizer-skill)**: a reusable, promptable assistant that knows the bits and bytes of Iceberg and guides you through the decisions that actually matter for your workload.
I codified everything I kept doing into a **[Claude Code skill](https://github.com/itamarwe/iceberg-optimizer-skill)** (available on GitHub): a reusable, promptable assistant that knows the bits and bytes of Iceberg and guides you through the decisions that actually matter for your workload.

The design principle is: *observe before you ask, ask before you decide, simulate before you recommend*. Rather than firing generic best-practice advice, the skill runs a structured diagnostic before it tells you anything. It works on exported metadata tables and query logs (it never connects directly to your warehouse), and it stays read-only until you explicitly approve Phase 5's commands.
The design principle is: *observe before you ask, ask before you decide, simulate before you recommend*. Rather than firing generic best-practice advice, the skill runs a structured diagnostic before it tells you anything. It can connect directly to your table, ingestion pipeline, and query engine. If you prefer, you can instead export the metadata yourself and paste back the output, or supply pre-exported files. Either way, it stays read-only until you explicitly approve Phase 5's commands.
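For the export route, a natural input is Iceberg's `files` metadata table. The sketch below is a hypothetical stand-in for that kind of offline diagnostic, not the skill's code. It assumes a CSV export with a header row containing the metadata table's `content` column (0 = data, 1 = position deletes, 2 = equality deletes) and its `file_size_in_bytes` column. The 32 MB small-file threshold is an assumption to tune per workload.

```python
import csv
import io

SMALL_FILE_BYTES = 32 * 1024 * 1024  # assumed "small file" threshold; tune per workload

# `content` codes used by Iceberg's `files` metadata table.
DATA, POSITION_DELETES, EQUALITY_DELETES = 0, 1, 2


def summarize_files_export(csv_text: str) -> dict[str, float]:
    """Summarize a CSV export containing `content` and `file_size_in_bytes` columns."""
    data_sizes: list[int] = []
    delete_files = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        content = int(row["content"])
        if content == DATA:
            data_sizes.append(int(row["file_size_in_bytes"]))
        elif content in (POSITION_DELETES, EQUALITY_DELETES):
            delete_files += 1
    small = sum(1 for s in data_sizes if s < SMALL_FILE_BYTES)
    n = len(data_sizes)
    return {
        "data_files": n,
        "delete_files": delete_files,
        "small_file_ratio": small / n if n else 0.0,
        "avg_data_file_mb": sum(data_sizes) / n / 2**20 if n else 0.0,
    }


if __name__ == "__main__":
    # e.g. exported from Spark with: SELECT content, file_size_in_bytes FROM db.events.files
    export = "content,file_size_in_bytes\n0,1048576\n0,536870912\n1,2048\n"
    print(summarize_files_export(export))
```

Working from an export like this keeps the diagnostic read-only by construction: the script only ever sees a snapshot of metadata, never the table itself.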

## The six-phase flow

@@ -52,7 +52,7 @@ The skill handles Spark, Trino, AWS Glue/EMR, Snowflake, and Flink/Kafka Connect

## Benchmarks

Any optimization advisor is only as useful as its ability to handle the edge cases: the failure modes that only show up in production, under specific combinations of write pattern, engine, and table shape. We benchmarked the skill against 22 scenarios built from real failure patterns.
Any optimization advisor is only as useful as its ability to handle the edge cases: the failure modes that only show up in production, under specific combinations of write pattern, engine, and table shape. I benchmarked the skill against 22 scenarios built from real failure patterns.

![22 benchmark scenarios across 7 failure-mode categories. Every scenario is a distinct real-world failure pattern: no duplicates, no synthetic toy tables.](/img/iceberg-optimizer/benchmark_coverage.png)

@@ -66,15 +66,15 @@ The scenarios cover seven categories of failure:
- **Indexes**: bloom filters on the wrong columns (low-cardinality or range-queried columns where min/max statistics already do the job), and Z-ordering over too many columns, which reduces locality rather than improving it.
- **Cost & Lifecycle**: cold archives where the compute cost of maintenance exceeds any query savings, and the query-cost vs maintenance-cost tradeoff where the right answer is to do less, not more.
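The two index anti-patterns in the list can be expressed as simple checks. This is a minimal sketch. The `ColumnProfile` type, the 1,000-distinct-value cut-off, and the four-column Z-order limit are assumptions for illustration, not values taken from the skill or from Iceberg. In Iceberg, Parquet bloom filters are enabled per column through the `write.parquet.bloom-filter-enabled.column.<name>` table property, so the list of bloom-filtered columns would typically be read from those properties.

```python
from dataclasses import dataclass, field

# Illustrative thresholds (assumptions, not Iceberg or skill defaults).
MIN_BLOOM_DISTINCT_VALUES = 1_000  # below this, min/max stats usually already do the job
MAX_ZORDER_COLUMNS = 4             # beyond this, each column gets too little locality


@dataclass
class ColumnProfile:
    name: str
    distinct_count: int
    predicates: set[str] = field(default_factory=set)  # "eq" and/or "range"


def lint_indexes(bloom_columns: list[ColumnProfile], zorder_columns: list[str]) -> list[str]:
    """Flag bloom filters and Z-order specs that are unlikely to pay off."""
    warnings = []
    for col in bloom_columns:
        if col.distinct_count < MIN_BLOOM_DISTINCT_VALUES:
            warnings.append(f"{col.name}: low cardinality, bloom filter adds little")
        if "eq" not in col.predicates:
            # Bloom filters only answer "value definitely absent", so ranges can't use them.
            warnings.append(f"{col.name}: never filtered by equality, bloom filter cannot prune")
    if len(zorder_columns) > MAX_ZORDER_COLUMNS:
        warnings.append(f"Z-order over {len(zorder_columns)} columns dilutes locality")
    return warnings


if __name__ == "__main__":
    cols = [
        ColumnProfile("country", 40, {"eq"}),
        ColumnProfile("event_ts", 10**9, {"range"}),
        ColumnProfile("user_id", 10**8, {"eq"}),
    ]
    for w in lint_indexes(cols, ["a", "b", "c", "d", "e"]):
        print(w)
```

In the example, only `user_id` (high cardinality, queried by equality) passes. The other two bloom filters and the five-column Z-order are flagged.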

The benchmark scores each plan with an LLM judge evaluating correctness, specificity, and safety. **All 22 passed with a perfect 5.0/5 average.**
The benchmark scores each plan with an LLM judge evaluating correctness, specificity, and safety. **All 22 passed with a perfect 5/5 average.**

Two things are worth noting about the benchmark design. First, every scenario is a distinct failure pattern; we didn't generate synthetic variations of the same problem. Second, the score checks not just whether the skill recommends the right action, but whether it recommends it *for the right reason* and with the right caveats. A correct answer for the wrong reason scores lower.
Two things are worth noting about the benchmark design. First, every scenario is a distinct failure pattern; I didn't generate synthetic variations of the same problem. Second, the score checks not just whether the skill recommends the right action, but whether it recommends it *for the right reason* and with the right caveats. A correct answer for the wrong reason scores lower.
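One way to encode "a correct answer for the wrong reason scores lower" is a rubric that gates the score on the reasoning, not just on the recommended action. The benchmark's actual judge prompt and scale are not given here. The three criteria and the point values below are hypothetical and chosen only to show the shape of such a rubric.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    right_action: bool     # recommended the correct fix
    right_reason: bool     # justified it with the actual root cause
    caveats_covered: bool  # flagged the relevant risks (e.g. rewrite cost, engine limits)


def rubric_score(j: Judgement) -> int:
    """Hypothetical 1-5 rubric: a correct action with the wrong reasoning caps at 3."""
    if not j.right_action:
        return 1
    if not j.right_reason:
        return 3 if j.caveats_covered else 2
    return 5 if j.caveats_covered else 4
```

Gating on `right_reason` before `caveats_covered` means no amount of hedging can lift a plan whose diagnosis is wrong into the top scores.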

## This is v0.1

All five engines are supported. The 22 failure modes above are covered. Twenty-nine unit tests pass across the profiler and query-log parser.

What's missing: deeper multi-engine write coordination, large-scale migration scenarios (Hudi-to-Iceberg, Delta-to-Iceberg), Z-ordering tradeoffs at very high cardinalities, and more efficient token usage as the prompt structure matures. **This is a starting point**, not a complete reference.
What's missing: support for other query engines such as DuckDB, deeper multi-engine write coordination, large-scale migration scenarios (Hudi-to-Iceberg, Delta-to-Iceberg), Z-ordering tradeoffs at very high cardinalities, and more efficient token usage as the prompt structure matures. **This is a starting point**, not a complete reference.

As the skill gets used on more real deployments, the patterns will sharpen and coverage will expand.
