Repro is with super commit a1d4c36.
The following query finds the top 5 GitHub users most frequently assigned as a PR reviewer. It seems to run slower than we expect. It takes just over 1 second on an AWS EC2 m6idn.xlarge, and here's the current perf on my Intel-based MacBook:
$ aws s3 cp s3://brim-sampledata/gha/ghasample.json.gz . &&
gunzip ghasample.json.gz &&
super -version &&
super -dynamic -vam -i fjson -f csup -o ghasample.csup ghasample.json &&
time super -dynamic -vam -i csup -f csv -c "
FROM 'ghasample.csup'
| UNNEST [...payload.pull_request.assignees, payload.pull_request.assignee]
| WHERE this IS NOT NULL
| AGGREGATE count() BY assignee:=login
| ORDER BY count DESC, assignee ASC
| LIMIT 5;"
Version: v0.3.0-218-ga1d4c36bd
assignee,count
poad,42
victor-eds,30
ahasanzadeh13,26
skupr-anaconda,24
streamer45,24
real 0m0.969s
user 0m1.421s
sys 0m0.105s
There's a write-up that indicates certain parts of the unnest implementation may be contributing to the slowness.
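For cross-checking results while the unnest path is being investigated, here is a rough Python sketch of what the query computes. It is not how super evaluates the query. It assumes ghasample.json is newline-delimited JSON and that every assignee object has a string login field; the function names load_events and top_assignees are made up for this illustration.

```python
import json
from collections import Counter


def load_events(path):
    """Yield one event per line (assumes newline-delimited JSON)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def top_assignees(events, n=5):
    """Count assignee logins the way the query does and return the top n."""
    counts = Counter()
    for event in events:
        pr = (event.get("payload") or {}).get("pull_request") or {}
        # Mirror UNNEST [...assignees, assignee]: every element of the
        # assignees array, followed by the single assignee field.
        candidates = list(pr.get("assignees") or []) + [pr.get("assignee")]
        for user in candidates:
            if user is not None:  # WHERE this IS NOT NULL
                counts[user["login"]] += 1  # assumes login is always present
    # ORDER BY count DESC, assignee ASC
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


if __name__ == "__main__":
    for login, count in top_assignees(load_events("ghasample.json")):
        print(f"{login},{count}")
```

Note that a user who appears in both assignees and assignee for the same PR is counted twice, which matches what the query above does.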