chore: Setup project layout by Xuanwo · Pull Request #1 · apache/iceberg-rust

Xuanwo · 2023-07-21T06:41:28Z

This PR will set up the basic project layout.

Please use squash merge.

Signed-off-by: Xuanwo <github@xuanwo.io>

Xuanwo · 2023-07-21T06:45:03Z

cc @JanKaul, @liurenjie1024 for review.

Signed-off-by: Xuanwo <github@xuanwo.io>

liurenjie1024

Others LGTM

liurenjie1024

Thanks!

… not REE

The RecordBatchTransformer back-fills identity-partition columns missing from a data file (Iceberg column-projection rule apache#1, commit 663d7a3) and materialized those constants as RunEndEncoded "for efficiency". But files that physically carry the column emit the flat type via PassThrough, so the per-file output schema diverged: `Utf8` for present files vs `RunEndEncoded(Utf8)` for back-filled ones.

When a DataFusion compaction scans a mix of both (a table with schema evolution / partial column presence), it concatenates batches across files and panics:

Arrow error: It is not possible to concatenate arrays of different data types (Utf8, RunEndEncoded("run_ends": Int32, "values": Utf8))

Observed live on golden `dev_golden_ingest_e1_search_events_cisco_asa` (plan-46 Task 7): the first real `dryRun=false` compaction failed before commit (no data harm). The existing REE->flat decodes (ad707b3, 7134847) only run at the WRITE boundary, downstream of this read/merge concat, so they never see it.

Fix: an identity-partition constant exists in the table schema, so emit it in the canonical schema type (e.g. `Utf8`) for both the output schema field and the `ColumnSource::Add` op. Back-filled and read-through files then share an identical output schema and concat cleanly. Virtual/metadata constants (`_file`, never in the schema, always back-filled -> already self-consistent) keep RunEndEncoded, which is a genuine memory win there.

Tests: new `test_partition_constant_schema_matches_passthrough` asserts that a back-filled file and a read-through file produce identical output schemas (the precondition `concat_batches` enforces); updated `test_virtual_partition_column_uses_manifest_value` to assert flat `Utf8`. All 19 record_batch_transformer tests pass.
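The type divergence described in this commit message can be illustrated with a small std-only model. This is a sketch under stated assumptions, not the iceberg-rust implementation: `DataType`, `ColumnOrigin`, `output_type`, and `check_concat` are hypothetical stand-ins for the real Arrow types, the transformer's column sources, and the type check that batch concatenation performs.

```rust
// Hypothetical, simplified stand-in for Arrow data types
// (not the real `arrow_schema::DataType`).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    RunEndEncoded(Box<DataType>),
}

// Hypothetical model of how a column reaches one file's output batch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnOrigin {
    // The data file physically carries the column.
    PassThrough,
    // Identity-partition constant back-filled from the manifest;
    // the column exists in the table schema.
    PartitionConstant,
    // Virtual/metadata constant such as `_file`; never in the table schema.
    VirtualConstant,
}

// Output type per column after the fix described above: anything that is
// part of the table schema uses the canonical schema type, while purely
// virtual constants keep the run-end-encoded representation.
fn output_type(origin: ColumnOrigin, schema_type: &DataType) -> DataType {
    match origin {
        ColumnOrigin::PassThrough | ColumnOrigin::PartitionConstant => schema_type.clone(),
        ColumnOrigin::VirtualConstant => {
            DataType::RunEndEncoded(Box::new(schema_type.clone()))
        }
    }
}

// Models the precondition of concatenating one column across files:
// every per-file type must be identical.
fn check_concat(types: &[DataType]) -> Result<DataType, String> {
    let first = types.first().ok_or_else(|| "no batches".to_string())?;
    for t in &types[1..] {
        if t != first {
            return Err(format!(
                "cannot concatenate arrays of different data types ({:?}, {:?})",
                first, t
            ));
        }
    }
    Ok(first.clone())
}
```

In this model, calling `check_concat` on the types of a read-through file and a back-filled file succeeds once both resolve to `Utf8`, whereas mixing `Utf8` with `RunEndEncoded(Utf8)` returns an error analogous to the Arrow panic quoted above.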

Data files lacking embedded Iceberg field ids were assigned field ids positionally (physical column N -> field-id N+1) and projected positionally. That is only correct when a file's physical column order matches field-id order. A schema-evolving writer that omits a column mid-schema and appends later ones violates it: on golden cisco_asa, files omit product_name (field-id 22) and append auguria_event_timestamp, so physical slot 21 (where field-id 22 falls positionally) holds auguria_event_timestamp -- epoch-millis were served as the product_name identity-partition value. Lossless but silently corrupt partitioning (compaction split one table into ~9,787 bogus product_name=<epoch-millis> partitions). Plan-46 Task 14.

Fix (crates/iceberg/src/arrow/reader.rs):

- assign_field_ids_by_name: for files without embedded ids and without an explicit schema.name-mapping.default, assign ids by NAME from the task schema, recursing through struct/list/map. Replaces the positional add_fallback_field_ids_to_arrow_schema (removed).
- Projection is now always field-id-based: ids are present either embedded (Branch 1) or name-assigned (Branch 2/3), so the position-based projection mask is no longer used in the read path.
- with_partition back-fill present-set is derived from the name-mapped arrow schema when ids aren't embedded, so omitted identity-partition columns stay eligible for manifest back-fill (rule apache#1) instead of being falsely reported present at an appended column's slot.
- New regression test test_read_parquet_without_field_ids_omitted_identity_partition_backfills_from_manifest.

Name-based resolution degrades to the same result as positional when order does match field-id order, so nothing is lost. All 81 arrow:: unit tests pass; clippy clean.

Validated end-to-end on golden cisco_asa (465->1, lossless 1,717,975, product_name=cisco_asa) and cisco_asa_firewall (464->1, lossless 1,710,524, product_name=cisco_asa_firewall).
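The difference between positional and name-based id assignment can be sketched with a flat, std-only model. Everything here is illustrative: `assign_ids_positionally` and `assign_ids_by_name` are hypothetical helpers, not the functions in `crates/iceberg/src/arrow/reader.rs`, and the sketch ignores nested struct/list/map types that the real change recurses through.

```rust
use std::collections::HashMap;

// Positional fallback as described above: physical column N (0-based)
// is assumed to carry field id N + 1.
fn assign_ids_positionally(file_columns: &[&str]) -> Vec<(String, i32)> {
    file_columns
        .iter()
        .enumerate()
        .map(|(i, name)| (name.to_string(), i as i32 + 1))
        .collect()
}

// Name-based assignment: each physical column is looked up by name in the
// table schema (name -> field id). Columns unknown to the schema get `None`.
fn assign_ids_by_name(
    file_columns: &[&str],
    schema: &HashMap<&str, i32>,
) -> Vec<(String, Option<i32>)> {
    file_columns
        .iter()
        .map(|name| (name.to_string(), schema.get(name).copied()))
        .collect()
}
```

As a usage example with made-up names and ids, take a schema `{id: 1, product_name: 2, event_ts: 3}` and a file whose physical columns are `["id", "event_ts"]`. Positional assignment gives `event_ts` the id 2, so its values would be read as `product_name`. Name-based assignment gives it id 3 and leaves id 2 absent from the file, which keeps `product_name` eligible for back-fill from the manifest.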

Xuanwo added 3 commits July 21, 2023 14:40

chore: Setup project layout

a073dcb

Signed-off-by: Xuanwo <github@xuanwo.io>

Update

314c996

Signed-off-by: Xuanwo <github@xuanwo.io>

Update

c720845

Signed-off-by: Xuanwo <github@xuanwo.io>

Xuanwo added 2 commits July 21, 2023 14:50

Add readme

4664afb

Signed-off-by: Xuanwo <github@xuanwo.io>

fix typo

53446a5

Signed-off-by: Xuanwo <github@xuanwo.io>

liurenjie1024 reviewed Jul 21, 2023

Comment thread .asf.yaml

Add protected_branches

030cf64

Signed-off-by: Xuanwo <github@xuanwo.io>

liurenjie1024 approved these changes Jul 21, 2023

Fokko approved these changes Jul 21, 2023

Comment thread .asf.yaml

Comment thread .github/dependabot.yml Outdated

Use weekly instead

ef0f988

Signed-off-by: Xuanwo <github@xuanwo.io>

Fokko merged commit bd435b2 into apache:main Jul 21, 2023

Xuanwo deleted the setup branch July 21, 2023 08:47

himadripal pushed a commit to himadripal/iceberg-rust that referenced this pull request Apr 17, 2024

Merge pull request apache#1 from himadripal/sql-catalog-conn-pool-fix

f98fde3

fix connection pool issue for sql catalog

hareshkh pushed a commit to hareshkh/iceberg-rust that referenced this pull request Feb 17, 2026

try publish (apache#1)

a36c98e

* try publish * try * fix * use make * try docker up * machine exec * shit

mbutrovich mentioned this pull request Mar 11, 2026

Tracking Issue of Iceberg Rust 0.9 Release #2213

Closed

15 tasks

mbutrovich mentioned this pull request May 5, 2026

feat(encryption) [5/N] Support encryption: Encryption Manager #2383

Merged

Fokko merged 7 commits into apache:main from Xuanwo:setup
