Bug summary
CREATE MATERIALIZED VIEW ... AS SELECT ... FROM <iceberg_table> VERSION AS OF <snapshot_id> is accepted, and SHOW CREATE MATERIALIZED VIEW preserves the VERSION AS OF clause.
However, REFRESH MATERIALIZED VIEW ... WITH SYNC MODE does not refresh from that pinned snapshot. It refreshes from the latest Iceberg snapshot instead.
This creates a silent semantic mismatch:
- the MV definition shown to the user still contains VERSION AS OF <snapshot_id>
- but the refresh task actually reads the latest snapshot
Reproduction
create database repro_ice.tt_mv_bug;
create table repro_ice.tt_mv_bug.t1 (
id int,
dt date,
val int
) partition by (dt);
insert into repro_ice.tt_mv_bug.t1 values (1, '2024-01-01', 10);
-- capture snapshot A here
insert into repro_ice.tt_mv_bug.t1 values (2, '2024-01-02', 20);
create database mv_bug_db;
use mv_bug_db;
create materialized view test_mv_snap
refresh deferred manual
properties ("replication_num" = "1")
as
select dt, sum(val) as sv
from repro_ice.tt_mv_bug.t1 version as of <snapshot_A>
group by dt;
refresh materialized view test_mv_snap with sync mode;
select dt, sv from test_mv_snap order by dt;
show create materialized view test_mv_snap;
Expected behavior
If the MV definition contains VERSION AS OF <snapshot_A>, refresh should read from that snapshot and return only the row that existed when snapshot A was captured:
2024-01-01 10
Actual behavior
Refresh returns data from the latest snapshot instead:
2024-01-01 10
2024-01-02 20
At the same time, SHOW CREATE MATERIALIZED VIEW still shows the original definition with VERSION AS OF <snapshot_A>.
Code-level finding
This appears to be caused by refresh using a different SQL representation from the one shown by SHOW CREATE:
- SHOW CREATE MATERIALIZED VIEW prefers originalViewDefineSql
- MV refresh uses MaterializedView.getTaskDefinition(), which is built from getMVQueryDefinedSql()
- getMVQueryDefinedSql() uses viewDefineSql / simpleViewDef
- simpleViewDef is generated from AstToSQLBuilder.buildSimple(queryStatement)
- AST2SQLVisitor.visitTable() does not serialize TableRelation.queryPeriod
So the refresh task SQL loses the VERSION AS OF clause before it is reparsed for execution.
Impact
This is a silent correctness/semantics bug:
- users believe they created a snapshot-pinned MV
- refresh actually tracks the latest base snapshot
- SHOW CREATE is misleading because it preserves the original clause
Suggested fix direction
At minimum, MV refresh should preserve time-travel clauses in its executable definition.
Possible options:
- Serialize TableRelation.queryPeriod in AST2SQLVisitor.visitTable() so viewDefineSql / task definition keep VERSION AS OF
- Or make refresh build its executable AST from the original define SQL / parsed define query instead of the lossy simplified SQL
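To illustrate the first option, here is a minimal Java sketch of the difference between a serializer that drops the time-travel clause and one that keeps it. The class QueryPeriodSketch, the TableRef type, and both toSql methods are hypothetical stand-ins invented for this illustration; they are not the real TableRelation or AST2SQLVisitor code, and the actual fix may look quite different.

```java
import java.util.Optional;

// Hypothetical, simplified stand-ins for illustration only.
// These are NOT the real StarRocks classes.
public class QueryPeriodSketch {

    /** Toy table reference: a table name plus an optional pinned snapshot id. */
    public static final class TableRef {
        final String name;
        final Optional<Long> snapshotId;

        public TableRef(String name, Optional<Long> snapshotId) {
            this.name = name;
            this.snapshotId = snapshotId;
        }
    }

    /** Mirrors the reported behavior: the pinned snapshot is silently dropped. */
    public static String toSqlLossy(TableRef t) {
        return "SELECT * FROM " + t.name;
    }

    /** Mirrors option 1: emit VERSION AS OF whenever a snapshot is pinned. */
    public static String toSqlPreserving(TableRef t) {
        String base = "SELECT * FROM " + t.name;
        return t.snapshotId
                .map(id -> base + " VERSION AS OF " + id)
                .orElse(base);
    }

    public static void main(String[] args) {
        // 123 is a placeholder snapshot id, not a value from the reproduction.
        TableRef pinned = new TableRef("repro_ice.tt_mv_bug.t1", Optional.of(123L));
        System.out.println(toSqlLossy(pinned));
        System.out.println(toSqlPreserving(pinned));
    }
}
```

If the lossy output is what gets reparsed for the refresh task, the parser has no way to recover the pinned snapshot, which would match the behavior described above.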
Additional note
During debugging, direct Iceberg queries with VERSION AS OF <snapshot_id> worked correctly when MV rewrite was disabled, which suggests the core Iceberg time-travel scan path is working and the bug is in MV definition / refresh SQL construction.