[Bug] MV refresh drops Iceberg VERSION AS OF semantics and refreshes latest snapshot #71968

@HangyuanLiu

Bug summary

CREATE MATERIALIZED VIEW ... AS SELECT ... FROM <iceberg_table> VERSION AS OF <snapshot_id> is accepted, and SHOW CREATE MATERIALIZED VIEW preserves the VERSION AS OF clause.

However, REFRESH MATERIALIZED VIEW ... WITH SYNC MODE does not refresh from that pinned snapshot. It refreshes from the latest Iceberg snapshot instead.

This creates a silent semantic mismatch:

  • the MV definition shown to the user still contains VERSION AS OF <snapshot_id>
  • but the refresh task actually reads the latest snapshot

Reproduction

create database repro_ice.tt_mv_bug;
create table repro_ice.tt_mv_bug.t1 (
  id int,
  dt date,
  val int
) partition by (dt);

insert into repro_ice.tt_mv_bug.t1 values (1, '2024-01-01', 10);
-- record the current snapshot id of t1 here; it is used as <snapshot_A> below

insert into repro_ice.tt_mv_bug.t1 values (2, '2024-01-02', 20);

create database mv_bug_db;
use mv_bug_db;

create materialized view test_mv_snap
refresh deferred manual
properties ("replication_num" = "1")
as
select dt, sum(val) as sv
from repro_ice.tt_mv_bug.t1 version as of <snapshot_A>
group by dt;

refresh materialized view test_mv_snap with sync mode;
select dt, sv from test_mv_snap order by dt;
show create materialized view test_mv_snap;

Expected behavior

If the MV definition contains VERSION AS OF <snapshot_A>, refresh should read from that snapshot and return:

2024-01-01  10

Actual behavior

After the refresh, the MV contains data from the latest snapshot instead:

2024-01-01  10
2024-01-02  20

At the same time, SHOW CREATE MATERIALIZED VIEW still shows the original definition with VERSION AS OF <snapshot_A>.

Code-level finding

This appears to be caused by refresh using a different SQL representation from the one shown by SHOW CREATE:

  • SHOW CREATE MATERIALIZED VIEW prefers originalViewDefineSql
  • MV refresh uses MaterializedView.getTaskDefinition(), which is built from getMVQueryDefinedSql()
  • getMVQueryDefinedSql() uses viewDefineSql / simpleViewDef
  • simpleViewDef is generated from AstToSQLBuilder.buildSimple(queryStatement)
  • AST2SQLVisitor.visitTable() does not serialize TableRelation.queryPeriod

So the refresh task SQL loses the VERSION AS OF clause before it is reparsed for execution.
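
The loss can be illustrated with a toy model. This is a sketch, not StarRocks code: `TableRef`, `buildSimple`, and `buildOriginal` are hypothetical stand-ins for TableRelation, the simplified serializer, and the original define SQL, and the snapshot id is a placeholder. It only shows how a serializer that skips the time-travel field yields SQL that, once reparsed, targets the latest snapshot.

```java
import java.util.Optional;

public class LossySerializationSketch {
    /** Hypothetical stand-in for a table reference with an optional pinned snapshot. */
    static final class TableRef {
        final String name;
        final Optional<Long> versionAsOf;

        TableRef(String name, Optional<Long> versionAsOf) {
            this.name = name;
            this.versionAsOf = versionAsOf;
        }
    }

    /** Mimics the lossy path: only the table name is emitted, the pin is dropped. */
    static String buildSimple(TableRef t) {
        return "SELECT dt, sum(val) AS sv FROM " + t.name + " GROUP BY dt";
    }

    /** Mimics the SQL that SHOW CREATE displays: the clause is kept. */
    static String buildOriginal(TableRef t) {
        String period = t.versionAsOf.map(v -> " VERSION AS OF " + v).orElse("");
        return "SELECT dt, sum(val) AS sv FROM " + t.name + period + " GROUP BY dt";
    }

    public static void main(String[] args) {
        // 123 is a placeholder snapshot id, not a value from the reproduction.
        TableRef t = new TableRef("repro_ice.tt_mv_bug.t1", Optional.of(123L));
        System.out.println("shown to user  : " + buildOriginal(t));
        System.out.println("used by refresh: " + buildSimple(t));
    }
}
```

The two strings differ only in the VERSION AS OF clause, which matches the observed mismatch between SHOW CREATE and the refresh result.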

Impact

This is a silent correctness/semantics bug:

  • users believe they created a snapshot-pinned MV
  • refresh actually tracks the latest base snapshot
  • SHOW CREATE is misleading because it preserves the original clause

Suggested fix direction

At minimum, MV refresh should preserve time-travel clauses in its executable definition.

Possible options:

  1. Serialize TableRelation.queryPeriod in AST2SQLVisitor.visitTable() so viewDefineSql / task definition keep VERSION AS OF
  2. Or make refresh build its executable AST from the original define SQL / parsed define query instead of the lossy simplified SQL
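
Option 1 could look roughly like the sketch below. All names here (`QueryPeriod`, `TableNode`, this `visitTable`) are hypothetical simplifications, not the real StarRocks classes or signatures. The only point is that the table serializer appends the time-travel clause when one is present and emits nothing extra otherwise.

```java
public class PreserveQueryPeriodSketch {
    /** Hypothetical time-travel clause pinned to a snapshot id. */
    static final class QueryPeriod {
        final long snapshotId;

        QueryPeriod(long snapshotId) {
            this.snapshotId = snapshotId;
        }

        String toSql() {
            return " VERSION AS OF " + snapshotId;
        }
    }

    /** Hypothetical table node; period is null when the query reads the latest snapshot. */
    static final class TableNode {
        final String name;
        final QueryPeriod period;

        TableNode(String name, QueryPeriod period) {
            this.name = name;
            this.period = period;
        }
    }

    /** Serializes the table name and, if present, the time-travel clause. */
    static String visitTable(TableNode node) {
        StringBuilder sb = new StringBuilder(node.name);
        if (node.period != null) {
            sb.append(node.period.toSql());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 123 is a placeholder snapshot id.
        TableNode pinned = new TableNode("repro_ice.tt_mv_bug.t1", new QueryPeriod(123L));
        TableNode latest = new TableNode("repro_ice.tt_mv_bug.t1", null);
        System.out.println(visitTable(pinned));
        System.out.println(visitTable(latest));
    }
}
```

A regression test along these lines would assert that the task definition of a pinned MV still contains VERSION AS OF after serialization and reparse.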

Additional note

During debugging, direct Iceberg queries with VERSION AS OF <snapshot_id> worked correctly when MV rewrite was disabled, which suggests the core Iceberg time-travel scan path is working and the bug is in MV definition / refresh SQL construction.
