Edit — this issue evolved into a much broader discussion of how we do groups. The original message is below, and I've tried to summarize the current state towards the end of the issue so it's easier for others to engage.
More than half the people I speak with about PRQL tell me that window functions are painful in SQL and would be a great feature to add to PRQL. I put window funcitons in the examples in the Readme at first, since I had the same experience in SQL.
For clarity: a window / analytic / analytical function is a function that takes other rows as an input, but instead of aggregating over many rows, it creates a new value for every row. For example, a moving average or lag. In SQL it's represented by the OVER keyword.
I haven't thought about this enough, but wanted to post an issue so we could start thinking about how to do it. A couple of initial options:
- A separate pipeline, with the column at the end, and the function (
lag in this case) in the pipeline:
from prices
derive price_yesterday: (
window
by sec_id
sort date
lag 1
price
)
- A function with lots of named args, with the function
lag as a named arg (but how would this work for functions with 0 or >1 parameters?):
from prices
derive price_yesterday: (window by:sec_id sort:date lag:1 price)
- Eliminate
window — because we have aggregate, we can't get confused about sum(foo) OVER and sum(foo), so possibly we don't need to have window:
from prices
derive price_yesterday: (price | lag rows:1 sort:date by:sec_id)
I don't think this is sufficiently thought through, and I'm sure these examples have inadequacies.
Here are some example queries we can use as cases: https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts#get_the_most_popular_item_in_each_category
Edit — this issue evolved into a much broader discussion of how we do groups. The original message is below, and I've tried to summarize the current state towards the end of the issue so it's easier for others to engage.
More than half the people I speak with about PRQL tell me that window functions are painful in SQL and would be a great feature to add to PRQL. I put window funcitons in the examples in the Readme at first, since I had the same experience in SQL.
For clarity: a window / analytic / analytical function is a function that takes other rows as an input, but instead of aggregating over many rows, it creates a new value for every row. For example, a moving average or lag. In SQL it's represented by the
OVERkeyword.I haven't thought about this enough, but wanted to post an issue so we could start thinking about how to do it. A couple of initial options:
lagin this case) in the pipeline:lagas a named arg (but how would this work for functions with 0 or >1 parameters?):window— because we haveaggregate, we can't get confused aboutsum(foo) OVERandsum(foo), so possibly we don't need to havewindow:I don't think this is sufficiently thought through, and I'm sure these examples have inadequacies.
Here are some example queries we can use as cases: https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts#get_the_most_popular_item_in_each_category