
Commit e41eff2

Blizzara and jacques-n authored
feat: add precision to IntervalDay and new IntervalCompound type (#665)
* Update IntervalDay to support multiple subsecond precisions (0-9). This is done in a way that should be effectively compatible with old systems: old plans should continue to be consumable by treating the old values as microsecond precision.
* Add a new IntervalCompound type, which is a combination of IntervalMonth and IntervalDay.

BREAKING CHANGE: The encoding of IntervalDay literals has changed in a strictly backwards-incompatible way. However, the logical meaning across encodings is maintained using a oneof. Moving a field into a oneof makes unset versus set-to-zero ambiguous for older messages, but the fields are defined such that the two cases carry the same logical meaning. If neither microseconds nor precision is set, the value can be considered a precision 6 value. If you aren't using the IntervalDay type, you will not need to make any changes.

BREAKING CHANGE: The TypeExpression and ParameterizedType protobufs (used to serialize output derivation) are updated to match the now-compound nature of IntervalDay. If you use protobuf to serialize output derivations that refer to the IntervalDay type, you will need to rework that logic.

fixes #664

---------

Co-authored-by: Jacques Nadeau <jacques@apache.org>
1 parent bed84ec commit e41eff2

6 files changed: 62 additions, 4 deletions

proto/substrait/algebra.proto

Lines changed: 16 additions & 1 deletion
@@ -817,6 +817,7 @@ message Expression {
 int64 time = 17;
 IntervalYearToMonth interval_year_to_month = 19;
 IntervalDayToSecond interval_day_to_second = 20;
+IntervalCompound interval_compound = 36;
 string fixed_char = 21;
 VarChar var_char = 22;
 bytes fixed_binary = 23;
@@ -888,7 +889,21 @@ message Expression {
 message IntervalDayToSecond {
   int32 days = 1;
   int32 seconds = 2;
-  int32 microseconds = 3;
+
+  // Consumers should expect either (microseconds) to be set or (precision and subseconds) to be set
+  oneof precision_mode {
+    int32 microseconds = 3 [deprecated = true]; // use precision and subseconds below, they cover and replace microseconds.
+    // Sub-second precision, 0 means the value given is in seconds, 3 is milliseconds, 6 microseconds, 9 is nanoseconds. Should be used with subseconds below.
+    int32 precision = 4;
+  }
+
+  // the number of fractional seconds using 1e(-precision) units. Should only be used with precision field, not microseconds.
+  int64 subseconds = 5;
+}
+
+message IntervalCompound {
+  IntervalYearToMonth interval_year_to_month = 1;
+  IntervalDayToSecond interval_day_to_second = 2;
 }

 message Struct {
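
The oneof above means a consumer has to accept two encodings of the same literal. The following is a minimal sketch of that handling, assuming a plain-Python stand-in for the generated message: the `IntervalDayToSecondLiteral` dataclass and the `normalize` and `total_nanoseconds` helpers are hypothetical names, and `None` models an unset oneof member. With real generated protobuf classes, `WhichOneof("precision_mode")` would be the usual way to find out which member is set.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SECONDS_PER_DAY = 86_400


@dataclass
class IntervalDayToSecondLiteral:
    """Hypothetical stand-in for the IntervalDayToSecond protobuf message."""
    days: int = 0
    seconds: int = 0
    microseconds: Optional[int] = None  # deprecated oneof member
    precision: Optional[int] = None     # new oneof member
    subseconds: int = 0


def normalize(lit: IntervalDayToSecondLiteral) -> Tuple[int, int]:
    """Return (precision, subseconds) regardless of which encoding was used."""
    if lit.microseconds is not None and lit.precision is not None:
        raise ValueError("microseconds and precision share a oneof; set only one")
    if lit.microseconds is not None:
        # Old encoding: microseconds are subseconds at precision 6.
        return 6, lit.microseconds
    if lit.precision is not None:
        if not 0 <= lit.precision <= 9:
            raise ValueError("precision must be between 0 and 9")
        return lit.precision, lit.subseconds
    # Neither member set: the commit message says to treat this as precision 6.
    return 6, lit.subseconds


def total_nanoseconds(lit: IntervalDayToSecondLiteral) -> int:
    """Collapse the literal to one integer so encodings can be compared."""
    precision, subseconds = normalize(lit)
    whole_seconds = lit.days * SECONDS_PER_DAY + lit.seconds
    return whole_seconds * 10**9 + subseconds * 10 ** (9 - precision)
```

Under these assumptions, a literal written by an old producer with `microseconds=5` and one written by a new producer with `precision=6, subseconds=5` normalize to the same value, which is the compatibility property the commit message describes.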

proto/substrait/parameterized_types.proto

Lines changed: 14 additions & 1 deletion
@@ -26,7 +26,8 @@ message ParameterizedType {
 Type.Date date = 16;
 Type.Time time = 17;
 Type.IntervalYear interval_year = 19;
-Type.IntervalDay interval_day = 20;
+ParameterizedIntervalDay interval_day = 20;
+ParameterizedIntervalCompound interval_compound = 36;
 // Deprecated in favor of `ParameterizedPrecisionTimestampTZ precision_timestamp_tz`
 Type.TimestampTZ timestamp_tz = 29 [deprecated = true];
 Type.UUID uuid = 32;
@@ -92,6 +93,18 @@ message ParameterizedType {
   Type.Nullability nullability = 4;
 }

+message ParameterizedIntervalDay {
+  IntegerOption precision = 1;
+  uint32 variation_pointer = 2;
+  Type.Nullability nullability = 3;
+}
+
+message ParameterizedIntervalCompound {
+  IntegerOption precision = 1;
+  uint32 variation_pointer = 2;
+  Type.Nullability nullability = 3;
+}
+
 message ParameterizedPrecisionTimestamp {
   IntegerOption precision = 1;
   uint32 variation_pointer = 2;

proto/substrait/type.proto

Lines changed: 15 additions & 0 deletions
@@ -27,6 +27,7 @@ message Type {
 Time time = 17;
 IntervalYear interval_year = 19;
 IntervalDay interval_day = 20;
+IntervalCompound interval_compound = 35;
 // Deprecated in favor of `PrecisionTimestampTZ precision_timestamp_tz`
 TimestampTZ timestamp_tz = 29 [deprecated = true];
 UUID uuid = 32;
@@ -122,14 +123,28 @@ message Type {
   Nullability nullability = 2;
 }

+// An interval consisting of years and months
 message IntervalYear {
   uint32 type_variation_reference = 1;
   Nullability nullability = 2;
 }

+// An interval consisting of days, seconds, and microseconds
 message IntervalDay {
   uint32 type_variation_reference = 1;
   Nullability nullability = 2;
+
+  // Sub-second precision, 0 means the value given is in seconds, 3 is milliseconds, 6 microseconds, 9 is nanoseconds, etc.
+  // if unset, treat as 6.
+  optional int32 precision = 3;
+}
+
+// An interval consisting of the components of both IntervalMonth and IntervalDay
+message IntervalCompound {
+  uint32 type_variation_reference = 1;
+  Nullability nullability = 2;
+  // Sub-second precision, 0 means the value given is in seconds, 3 is milliseconds, 6 microseconds, 9 is nanoseconds, etc.
+  int32 precision = 3;
 }

 message UUID {
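
The `optional int32 precision` field on `IntervalDay` carries a default rule in its comment: an unset value is read as 6. A small sketch of that rule follows, with `None` modelling an unset field. The helper names `effective_precision` and `subsecond_unit` are hypothetical, and the 0 to 9 bound is taken from the commit message and the type-class documentation rather than from the proto comment itself.

```python
from fractions import Fraction
from typing import Optional

DEFAULT_INTERVAL_DAY_PRECISION = 6  # "if unset, treat as 6"


def effective_precision(precision: Optional[int]) -> int:
    """Resolve the precision of an IntervalDay type, applying the default."""
    if precision is None:
        return DEFAULT_INTERVAL_DAY_PRECISION
    if not 0 <= precision <= 9:
        raise ValueError("precision must be between 0 and 9")
    return precision


def subsecond_unit(precision: Optional[int]) -> Fraction:
    """Size of one subsecond tick, in seconds, as an exact fraction."""
    return Fraction(1, 10 ** effective_precision(precision))
```

Note that `IntervalCompound.precision` is declared without `optional`, so in that message a value of 0 cannot be told apart from an unset field and reads as whole seconds.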

proto/substrait/type_expressions.proto

Lines changed: 14 additions & 1 deletion
@@ -26,11 +26,12 @@ message DerivationExpression {
 Type.Date date = 16;
 Type.Time time = 17;
 Type.IntervalYear interval_year = 19;
-Type.IntervalDay interval_day = 20;
 // Deprecated in favor of `ExpressionPrecisionTimestampTZ precision_timestamp_tz`
 Type.TimestampTZ timestamp_tz = 29 [deprecated = true];
 Type.UUID uuid = 32;

+ExpressionIntervalDay interval_day = 20;
+ExpressionIntervalCompound interval_compound = 42;
 ExpressionFixedChar fixed_char = 21;
 ExpressionVarChar varchar = 22;
 ExpressionFixedBinary fixed_binary = 23;
@@ -90,6 +91,18 @@ message DerivationExpression {
   Type.Nullability nullability = 3;
 }

+message ExpressionIntervalDay {
+  DerivationExpression precision = 1;
+  uint32 variation_pointer = 2;
+  Type.Nullability nullability = 3;
+}
+
+message ExpressionIntervalCompound {
+  DerivationExpression precision = 1;
+  uint32 variation_pointer = 2;
+  Type.Nullability nullability = 3;
+}
+
 message ExpressionPrecisionTimestampTZ {
   DerivationExpression precision = 1;
   uint32 variation_pointer = 2;

site/docs/extensions/index.md

Lines changed: 1 addition & 0 deletions
@@ -76,6 +76,7 @@ Rather than using a full data type representation, the input argument types (`sh
 | time | time |
 | interval_year | iyear |
 | interval_day | iday |
+| interval_compound | icompound |
 | uuid | uuid |
 | fixedchar&lt;N&gt; | fchar |
 | varchar&lt;N&gt; | vchar |
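
As an illustration of how the short names in this table might be used, here is a sketch that builds a signature key from argument types. It covers only the rows visible in this hunk. The `name:arg_arg` key format is an assumption based on Substrait's compound-name convention, and `short_name`, `signature_key`, and the function name `add` are hypothetical.

```python
import re
from typing import Sequence

# Short names copied from the table rows shown in this hunk only.
SHORT_NAMES = {
    "time": "time",
    "interval_year": "iyear",
    "interval_day": "iday",
    "interval_compound": "icompound",
    "uuid": "uuid",
    "fixedchar": "fchar",
    "varchar": "vchar",
}


def short_name(type_name: str) -> str:
    """Map a type name to its short form, ignoring any <...> parameters."""
    base = re.sub(r"<.*>$", "", type_name.strip().lower())
    return SHORT_NAMES[base]


def signature_key(function_name: str, arg_types: Sequence[str]) -> str:
    """Join a function name and argument short names (assumed key format)."""
    return f"{function_name}:{'_'.join(short_name(t) for t in arg_types)}"
```

With this mapping, a function taking a compound interval and a day interval would be keyed as `add:icompound_iday`.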

site/docs/types/type_classes.md

Lines changed: 2 additions & 1 deletion
@@ -24,7 +24,6 @@ Simple type classes are those that don't support any form of configuration. For
 | date | A date within [1000-01-01..9999-12-31]. | `int32` days since `1970-01-01`
 | time | A time since the beginning of any day. Range of [0..86,399,999,999] microseconds; leap seconds need not be supported. | `int64` microseconds past midnight
 | interval_year | Interval year to month. Supports a range of [-10,000..10,000] years with month precision (= [-120,000..120,000] months). Usually stored as separate integers for years and months, but only the total number of months is significant, i.e. `1y 0m` is considered equal to `0y 12m` or `1001y -12000m`. | `int32` years and `int32` months, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. `-10000y 200000m` is **not** allowed)
-| interval_day | Interval day to second. Supports a range of [-3,650,000..3,650,000] days with microsecond precision (= [-315,360,000,000,000,000..315,360,000,000,000,000] microseconds). Usually stored as separate integers for various components, but only the total number of microseconds is significant, i.e. `1d 0s` is considered equal to `0d 86400s`. | `int32` days, `int32` seconds, and `int32` microseconds, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. `3650001d -86400s 0us` is **not** allowed)
 | uuid | A universally-unique identifier composed of 128 bits. Typically presented to users in the following hexadecimal format: `c48ffa9e-64f4-44cb-ae47-152b4e60e77b`. Any 128-bit value is allowed, without specific adherence to RFC4122. | 16-byte `binary`

 ## Compound Types
@@ -43,6 +42,8 @@ Compound type classes are type classes that need to be configured by means of a
 | MAP&lt;K, V&gt; | An unordered list of type K keys with type V values. Keys may be repeated. While the key type could be nullable, keys may not be null. | `repeated KeyValue` (in turn two `Literal`s), all key types matching K and all value types matching V
 | PRECISIONTIMESTAMP&lt;P&gt; | A timestamp with fractional second precision (P, number of digits) 0 <= P <= 9. Does not include timezone information and can thus not be unambiguously mapped to a moment on the timeline without context. Similar to naive datetime in Python. | `int64` seconds, milliseconds, microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 (in an unspecified timezone)
 | PRECISIONTIMESTAMPTZ&lt;P&gt; | A timezone-aware timestamp, with fractional second precision (P, number of digits) 0 <= P <= 9. Similar to aware datetime in Python. | `int64` seconds, milliseconds, microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 UTC
+| INTERVAL_DAY&lt;P&gt; | Interval day to second. Supports a range of [-3,650,000..3,650,000] days with fractional second precision (P, number of digits) 0 <= P <= 9. Usually stored as separate integers for various components, but only the total number of fractional seconds is significant, i.e. `1d 0s` is considered equal to `0d 86400s`. | `int32` days, `int32` seconds, and `int64` fractional seconds, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. `3650001d -86400s 0us` is **not** allowed)
+| INTERVAL_COMPOUND&lt;P&gt; | A compound interval type that is composed of elements of the underlying elements and rules of both interval_month and interval_day to express arbitrary durations across multiple grains. Substrait gives no definition for the conversion of values between independent grains (e.g. months to days).

 ## User-Defined Types
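
The INTERVAL_DAY&lt;P&gt; row states two rules: only the total number of fractional seconds is significant, and no single component may exceed the 3,650,000-day range on its own. The sketch below is one reading of those rules, not a normative check; `total_subseconds` and `is_valid` are hypothetical helper names, and applying the range to the total as well as to each component is an assumption.

```python
MAX_DAYS = 3_650_000
SECONDS_PER_DAY = 86_400


def total_subseconds(days: int, seconds: int, subseconds: int, precision: int) -> int:
    """Collapse an interval to a single count of 10**-precision second ticks."""
    return (days * SECONDS_PER_DAY + seconds) * 10**precision + subseconds


def is_valid(days: int, seconds: int, subseconds: int, precision: int) -> bool:
    """Check each component, and the total, against the 3,650,000-day range."""
    limit = MAX_DAYS * SECONDS_PER_DAY * 10**precision
    components = (
        days * SECONDS_PER_DAY * 10**precision,  # days expressed in ticks
        seconds * 10**precision,                 # seconds expressed in ticks
        subseconds,                              # already in ticks
    )
    total = total_subseconds(days, seconds, subseconds, precision)
    return all(abs(c) <= limit for c in components) and abs(total) <= limit
```

Under this reading, `1d 0s` and `0d 86400s` collapse to the same total, while `3650001d -86400s 0us` is rejected because the days component alone is out of range even though the total is not.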
