Addition of a partition function in datalake V1 tables

### Why this Issue is created?

In my work I've had requirements to create SQLs that run against AWS Glue Data Catalog Tables where they need to get data from the latest partition only.

Something like :-

**SELECT * FROM db.table WHERE partition_col = (SELECT max(partition_col) FROM db.table)**

But the issue with above approach is I've seen that pushdown predicate doesn't always read the metadata only and reads the data itself which shouldn't happen.

In Athena, to solve this the query can be written as :-

**SELECT * FROM db.table WHERE partition_col = (SELECT max(partition_col) FROM "db"."table$partitions")**

But there isn't a similar approach in Spark for V1 tables and the only way is using SHOW PARTITIONS but it doesn't work as a subquery.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Addition of a partition function in datalake V1 tables #56833

Why this Issue is created?

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Addition of a partition function in datalake V1 tables #56833

Description

Why this Issue is created?

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions