Skip to content

ORDER BY on aggregation alias not applied to ES terms aggregation #52

Description

@fupelaqu

Description

When a SQL query uses ORDER BY on an aggregation alias (e.g., ORDER BY "Total Spend" DESC), the ordering is not applied to the Elasticsearch terms aggregation. The resulting ES query has no "order" clause in the terms aggregation, so results come back in the default doc_count order.

Failing Query (generated by Apache Superset)

SELECT customer_name AS customer_name, country AS country,
       SUM(total_price) AS "Total Spend", COUNT(*) AS "Orders"
FROM ecommerce
GROUP BY customer_name, country
ORDER BY "Total Spend" DESC
LIMIT 10

Expected ES aggregation

{
  "aggs": {
    "customer_name": {
      "terms": {
        "field": "customer_name",
        "size": 11,
        "min_doc_count": 1,
        "order": { "Total Spend": "desc" }
      },
      "aggs": {
        "country": {
          "terms": {
            "field": "country",
            "size": 11,
            "min_doc_count": 1
          },
          "aggs": {
            "Total Spend": { "sum": { "field": "total_price" } },
            "Orders": { "value_count": { "field": "_index" } }
          }
        }
      }
    }
  }
}

Actual ES aggregation (no "order")

{
  "aggs": {
    "customer_name": {
      "terms": {
        "field": "customer_name",
        "size": 11,
        "min_doc_count": 1
      },
      "aggs": {
        "country": {
          "terms": {
            "field": "country",
            "size": 11,
            "min_doc_count": 1
          },
          "aggs": {
            "Total Spend": { "sum": { "field": "total_price" } },
            "Orders": { "value_count": { "field": "_index" } }
          }
        }
      }
    }
  }
}

Root Cause

In ElasticAggregation.apply() (bridge/.../ElasticAggregation.scala), the direction for an aggregation is resolved by looking up the aggregation's Identifier properties in bucketsDirection (which comes from request.sorts):

val direction =
  bucketsDirection
    .get(identifier.identifierName)      // e.g., "SUM(total_price)" — no match
    .orElse(bucketsDirection.get(identifier.aliasOrName))  // e.g., "total_price" — no match

bucketsDirection contains "Total Spend" -> Desc (from ORDER BY "Total Spend" DESC).

However:

  • identifier.identifierName = "SUM(total_price)" (reconstructed via functions) — no match
  • identifier.aliasOrName = identifier.fieldAlias.getOrElse(identifier.name) = "total_price"no match

The Identifier.fieldAlias is still None because SingleSearch.update() has not been called at this point. The alias "Total Spend" exists only on the Field level (Field.fieldAlias = Some(Alias("Total Spend"))), not on the Identifier level.

So direction = None, and the aggregation has no ordering. Later, in buildBuckets, aggregationsDirection is empty because no aggregation reported a direction.

Fix

In ElasticAggregation.apply(), also look up the Field-level alias in bucketsDirection:

val direction =
  bucketsDirection
    .get(identifier.identifierName)
    .orElse(bucketsDirection.get(identifier.aliasOrName))
    .orElse(fieldAlias.flatMap(a => bucketsDirection.get(a.alias)))  // NEW: check Field.fieldAlias

Since import sqlAgg._ brings Field members into scope, fieldAlias is Field.fieldAlias = Some(Alias("Total Spend")). This correctly resolves bucketsDirection.get("Total Spend") = Some(Desc).

Files to modify

File Change
bridge/.../ElasticAggregation.scala Add .orElse(fieldAlias.flatMap(a => bucketsDirection.get(a.alias))) to direction resolution

Tests

  • Integration test: Query with ORDER BY "Total Spend" DESC should produce ES aggregation with "order": {"Total Spend": "desc"} on the terms bucket
  • Parser + bridge test: Verify direction is Some(Desc) for aliased aggregation with ORDER BY on alias

Additional Problem — ORDER BY aggregation not in SELECT

A related gap exists: if an aggregation appears only in ORDER BY (not in SELECT), the aggregation is never created in the Elasticsearch query.

SELECT profile.city
FROM dql_users
GROUP BY profile.city
ORDER BY COUNT(*) DESC

Here COUNT(*) is only in ORDER BY. SingleSearch.aggregates does not extract aggregations from ORDER BY (only from SELECT, HAVING, and WHERE since issue #53). The aggregation is never passed to the bridge layer, so the terms aggregation cannot reference it in its "order" clause.

Suggested approach

  1. Add extractAggregationFields to FieldSort (similar to Criteria.extractAggregationFields added in Aggregations referenced only in HAVING or WHERE are not created #53)
  2. Extend SingleSearch.aggregates to also collect ORDER BY aggregations
  3. Combine with the alias resolution fix above so the full ORDER BY + aggregation pipeline works end-to-end

Key observations

  • FieldSort in OrderBy.scala already has hasAggregation and isBucketScript but no extractAggregationFields
  • ORDER BY aggregation sorting uses a different mechanism ("order" on the terms agg) than HAVING (bucket_selector)
  • Both sub-problems (alias resolution + missing aggregation creation) should be fixed together

Impact

  • Superset dashboards: "Top Customers by Spend" chart returns unsorted data
  • Any JDBC/Arrow client using ORDER BY <agg_alias> gets wrong ordering
  • The ES aggregation size (from LIMIT) is correct, but without proper ordering, the top-N results are meaningless

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions