Reframe post opening: several years of Iceberg, last year at data-heavy org #71

Merged: itamarwe merged 3 commits into master from claude/iceberg-post-opening-tweak on Jun 28, 2026

Conversation

@itamarwe (Owner)

One-line opening edit on the Iceberg Optimizer Skill post.

Before:

I've spent the last couple of years working closely with one of the most data-intensive organizations in Israel, deploying Apache Iceberg at scale.

After:

I've been working with Apache Iceberg for several years. In the last year I've been helping one of the most data-heavy organizations in Israel deploy it at scale — petabytes of data, multiple ingestion pipelines, constant schema evolution, several query engines reading the same tables.

This separates the broader experience claim (several years with Iceberg) from the specific recent engagement (last year, one org), which is more accurate and sets up the post's authority more clearly.

🤖 Generated with Claude Code

https://claude.ai/code/session_014sy3CvoMeptEkgif3MM7Jh


…ta-heavy org

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vercel (bot) commented Jun 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| itamarwe-github-io | Ready | Preview, Comment | Jun 28, 2026 2:57pm |


We? Claude and yourself?


'the wrong choices interact' - rephrase pls


is this on any catalog? if not, specify.


**Phase 5 — Plan.** The winning scenario becomes a concrete plan: engine-specific commands with exact parameters, an execution order (ingestion tuning first, then layout, then maintenance), a schedule cadence, and monitoring thresholds that tell you when the next optimization cycle is due.

The skill handles Spark, Trino, AWS Glue/EMR, Snowflake, and Flink/Kafka Connect — each with engine-specific syntax, because the same compaction operation looks very different across these engines.
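As a rough illustration of that last point, here is a minimal sketch of how one compaction request maps to different statements in two of the listed engines. It is not the skill's actual output: the `compaction_command` helper, the default catalog name `my_catalog`, and the example table name are hypothetical. The two statement shapes follow the documented Iceberg Spark procedure `rewrite_data_files` and the Trino Iceberg connector's `optimize` table procedure.

```python
def compaction_command(engine: str, table: str, catalog: str = "my_catalog") -> str:
    """Build a basic compaction statement for one engine (illustrative helper)."""
    if engine == "spark":
        # Iceberg's Spark stored procedure, called through the catalog.
        return f"CALL {catalog}.system.rewrite_data_files(table => '{table}')"
    if engine == "trino":
        # The Trino Iceberg connector exposes compaction as a table procedure.
        return f"ALTER TABLE {catalog}.{table} EXECUTE optimize"
    raise ValueError(f"no compaction template for engine: {engine}")


if __name__ == "__main__":
    for engine in ("spark", "trino"):
        print(compaction_command(engine, "analytics.events"))
```

Real plans would also carry engine-specific parameters such as target file sizes, which this sketch leaves out.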


Perhaps use DuckDB as well: it's good for local work and has new Iceberg compatibility.

Owner, Author

Great for the upcoming versions.


Suggested change
The benchmark scores each plan with an LLM judge evaluating correctness, specificity, and safety. **All 22 passed with a perfect 5/5 average.**


a bit worrying that it passed all tests with flying colours. would check for over/under fit to skill code. did you generate this from a different session / model?

Owner, Author

The benchmark runner spins up a fresh session for each test.


> it never connects directly to your warehouse

Then how does it fetch the metadata?

Owner, Author

Theoretically, you could pull out the metadata and query log and give it to the model. But you are right that ideally it connects directly to your table, ingestion pipeline and query engine and pulls the metadata independently.

- Rephrase 'wrong choices interact' → 'mistakes don't fail independently'
- Clarify metadata access: you export it yourself; direct connectivity is roadmap
- Note skill is available on GitHub
- We→I (personal blog voice) in benchmarks section
- 5.0/5 → 5/5 per reviewer suggestion
- Add DuckDB + direct connectivity to v0.1 roadmap

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Clarify that the skill supports both direct connectivity to the table/pipeline/engine and manual metadata export
- Remove direct connectivity from the "what's missing" list (it's already available)
- Drop DuckDB-specific callout; generalize to "other query engines such as DuckDB"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014sy3CvoMeptEkgif3MM7Jh
@itamarwe merged commit 7bc8a01 into master on Jun 28, 2026
3 checks passed
3 participants