Skip to content

Orthogonal treatment of groups (nee Window functions) #300

@max-sixty

Description

@max-sixty

Edit — this issue evolved into a much broader discussion of how we do groups. The original message is below, and I've tried to summarize the current state towards the end of the issue so it's easier for others to engage.


More than half the people I speak with about PRQL tell me that window functions are painful in SQL and would be a great feature to add to PRQL. I put window funcitons in the examples in the Readme at first, since I had the same experience in SQL.

For clarity: a window / analytic / analytical function is a function that takes other rows as an input, but instead of aggregating over many rows, it creates a new value for every row. For example, a moving average or lag. In SQL it's represented by the OVER keyword.

I haven't thought about this enough, but wanted to post an issue so we could start thinking about how to do it. A couple of initial options:

  • A separate pipeline, with the column at the end, and the function (lag in this case) in the pipeline:
from prices
derive price_yesterday: (
  window
  by sec_id
  sort date
  lag 1
  price
)
  • A function with lots of named args, with the function lag as a named arg (but how would this work for functions with 0 or >1 parameters?):
from prices
derive price_yesterday: (window by:sec_id sort:date lag:1 price)
  • Eliminate window — because we have aggregate, we can't get confused about sum(foo) OVER and sum(foo), so possibly we don't need to have window:
from prices
derive price_yesterday: (price | lag rows:1 sort:date by:sec_id)

I don't think this is sufficiently thought through, and I'm sure these examples have inadequacies.

Here are some example queries we can use as cases: https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts#get_the_most_popular_item_in_each_category

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions