Addition of a partition function in datalake V1 tables #56833

Description

@0xAJX

Why is this issue being created?

In my work, I've had to write SQL queries against AWS Glue Data Catalog tables that read data from the latest partition only.

Something like:

SELECT * FROM db.table WHERE partition_col = (SELECT max(partition_col) FROM db.table)

The problem with the above approach is that, in my experience, predicate pushdown doesn't always resolve the subquery from partition metadata alone. It sometimes reads the data itself, which shouldn't happen.

In Athena, this can be solved by writing the query as:

SELECT * FROM db.table WHERE partition_col = (SELECT max(partition_col) FROM "db"."table$partitions")

But there isn't a similar approach in Spark for V1 tables. The only option is SHOW PARTITIONS, which doesn't work as a subquery.
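
For context, here is a minimal sketch of the two-step workaround that the SHOW PARTITIONS limitation currently forces in PySpark: list the partitions on the driver, pick the maximum value, then issue a second query with that value as a literal. The helper names `latest_partition_value` and `latest_partition_query` are hypothetical, not part of Spark. The sketch assumes partition strings of the form `col=value` (joined by `/` for multi-level partitions), a string-typed partition column, and values whose lexicographic order matches the intended order (for example ISO dates).

```python
from typing import Iterable


def latest_partition_value(partition_specs: Iterable[str], column: str) -> str:
    """Return the largest value of `column` found in SHOW PARTITIONS-style
    strings such as 'dt=2024-01-01/region=us'.

    Comparison is lexicographic on the raw string, so this only gives the
    "latest" partition when values sort correctly as text (assumption).
    """
    values = []
    for spec in partition_specs:
        for part in spec.split("/"):
            key, _, value = part.partition("=")
            if key == column:
                values.append(value)
    if not values:
        raise ValueError(f"no partition values found for column {column!r}")
    return max(values)


def latest_partition_query(spark, table: str, column: str):
    """Hypothetical helper: `spark` is an existing SparkSession.

    Step 1 reads partition names from the catalog only.
    Step 2 filters on a literal, so partition pruning can apply.
    """
    rows = spark.sql(f"SHOW PARTITIONS {table}").collect()
    latest = latest_partition_value((row[0] for row in rows), column)
    # Assumes a string-typed partition column; adjust quoting otherwise.
    return spark.sql(f"SELECT * FROM {table} WHERE {column} = '{latest}'")
```

This needs a round trip through the driver and cannot be expressed as a single SQL statement, which is the gap this issue describes.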
