Describe the bug
The results for a UNION ALL that specifies an ORDER BY is sometimes sorted incorrectly.
Note
I bisected to the first bad commit being 5fc91cc via #5661
To Reproduce
Using datafusion-cli (version 22.0) execute;
WITH
m0(time, tag0, f64) AS (VALUES (1667181600000000000, 'val00', 10.1), (1667181610000000000, 'val00', 21.2), (1667181620000000000, 'val00', 11.2), (1667181630000000000, 'val00', 19.2), (1667181600000000000, 'val01', 11.3), (1667181600000000000, 'val02', 10.4), (1667181610000000000, 'val00', 18.9)),
m1(time, tag0, f64) AS (VALUES (1667181600000000000, 'val00', 100.5), (1667181610000000000, 'val00', 200.6), (1667181600000000000, 'val01', 101.7)),
t AS (
SELECT 'm0' as "iox::measurement", tag0, 0::timestamp as time, COUNT(f64), SUM(f64), stddev(f64) FROM m0 GROUP BY 1, 2, 3),
u AS (
SELECT 'm1' as "iox::measurement", tag0, 0::timestamp as time, COUNT(f64), SUM(f64), stddev(f64) FROM m1 GROUP BY 1, 2, 3)
SELECT * FROM t
UNION ALL
SELECT * FROM u
ORDER BY 1, 2, 3;
Expected behavior
Output should be sorted on the first 3 columns:
+------------------+-------+---------------------+------------+----------+-------------------+
| iox::measurement | tag0 | time | COUNT(f64) | SUM(f64) | STDDEV(f64) |
+------------------+-------+---------------------+------------+----------+-------------------+
| m0 | val00 | 1970-01-01T00:00:00 | 5 | 80.6 | 5.085961069453836 |
| m0 | val01 | 1970-01-01T00:00:00 | 1 | 11.3 | |
| m0 | val02 | 1970-01-01T00:00:00 | 1 | 10.4 | |
| m1 | val00 | 1970-01-01T00:00:00 | 2 | 301.1 | 70.7813887967734 |
| m1 | val01 | 1970-01-01T00:00:00 | 1 | 101.7 | |
+------------------+-------+---------------------+------------+----------+-------------------+
Actual behavior
Output is sometimes inconsistent, noting that the sort order is incorrect:
+------------------+-------+---------------------+------------+----------+-------------------+
| iox::measurement | tag0 | time | COUNT(f64) | SUM(f64) | STDDEV(f64) |
+------------------+-------+---------------------+------------+----------+-------------------+
| m0 | val00 | 1970-01-01T00:00:00 | 5 | 80.6 | 5.085961069453836 |
| m0 | val01 | 1970-01-01T00:00:00 | 1 | 11.3 | |
| m1 | val00 | 1970-01-01T00:00:00 | 2 | 301.1 | 70.7813887967734 |
| m0 | val02 | 1970-01-01T00:00:00 | 1 | 10.4 | |
| m1 | val01 | 1970-01-01T00:00:00 | 1 | 101.7 | |
+------------------+-------+---------------------+------------+----------+-------------------+
Additional context
DataFusion versions 18, 19, 20 and 21 work consistently after numerous executions of the same query
The following also works on 22
WITH
m0(time, tag0, f64) AS (VALUES (1667181600000000000, 'val00', 10.1), (1667181610000000000, 'val00', 21.2), (1667181620000000000, 'val00', 11.2), (1667181630000000000, 'val00', 19.2), (1667181600000000000, 'val01', 11.3), (1667181600000000000, 'val02', 10.4), (1667181610000000000, 'val00', 18.9)),
m1(time, tag0, f64) AS (VALUES (1667181600000000000, 'val00', 100.5), (1667181610000000000, 'val00', 200.6), (1667181600000000000, 'val01', 101.7)),
t AS (SELECT 'm0' as "iox::measurement", tag0, 0::timestamp as time, COUNT(f64), SUM(f64), stddev(f64) FROM m0 GROUP BY 1, 2, 3),
u AS (SELECT 'm1' as "iox::measurement", tag0, 0::timestamp as time, COUNT(f64), SUM(f64), stddev(f64) FROM m1 GROUP BY 1, 2, 3),
-- add another CTE
v AS (
SELECT * FROM t
UNION ALL
SELECT * FROM u)
SELECT * FROM v
ORDER BY 1, 2, 3;
Describe the bug
The results for a
UNION ALLthat specifies anORDER BYis sometimes sorted incorrectly.To Reproduce
Using
datafusion-cli(version 22.0) execute;Expected behavior
Output should be sorted on the first 3 columns:
Actual behavior
Output is sometimes inconsistent, noting that the sort order is incorrect:
Additional context
DataFusion versions 18, 19, 20 and 21 work consistently after numerous executions of the same query
The following also works on 22