feat: implement Phase 2 Snowflake UDF support (14 functions) #18
Merged
Implement 14 Snowflake-compatible user-defined functions (UDFs) as part of Phase 2 of the emulator implementation plan.

## Implemented Functions

### Conditional Functions

- IFF(condition, true_value, false_value) - Inline IF expression
- NVL(expr1, expr2) - Return expr2 if expr1 is NULL
- NVL2(expr1, expr2, expr3) - Return expr2 if expr1 is NOT NULL, else expr3

### JSON Functions

- PARSE_JSON(string) - Parse string as JSON
- TO_JSON(variant) - Convert value to JSON string

### Date/Time Functions

- DATEADD(part, value, date) - Add interval to date/time
- DATEDIFF(part, date1, date2) - Calculate difference between dates

### TRY_* Functions (error-safe variants)

- TRY_PARSE_JSON(string) - Parse JSON, return NULL on error
- TRY_TO_NUMBER(string) - Convert to number, return NULL on error
- TRY_TO_DATE(string) - Convert to date, return NULL on error
- TRY_TO_BOOLEAN(string) - Convert to boolean, return NULL on error

### Array/Object Functions (FLATTEN helpers)

- FLATTEN_ARRAY(array, index) - Get element at index from JSON array
- ARRAY_SIZE(array) - Get size of JSON array
- GET_PATH(json, path) - Extract value using dot notation path
- OBJECT_KEYS(object) - Get all keys from JSON object

## Implementation Details

- All functions implemented using DataFusion's ScalarUDF interface
- Functions registered in Executor during initialization
- Comprehensive unit tests for each function (41 Rust tests)
- Integration tests added for gosnowflake driver (17 Go tests)

## Testing

Rust tests:

```
cargo test --package engine
# 41 passed; 0 failed
```

Go integration tests (requires running server):

```
cd go && go test
# Tests for all 14 UDFs
```

## Files Changed

- engine/Cargo.toml: Add chrono dependency
- engine/src/lib.rs: Export functions module
- engine/src/executor.rs: Register all 14 UDFs
- engine/src/functions/mod.rs: Module organization and exports
- engine/src/functions/conditional.rs: IFF, NVL, NVL2
- engine/src/functions/json.rs: PARSE_JSON, TO_JSON
- engine/src/functions/datetime.rs: DATEADD, DATEDIFF
- engine/src/functions/try_functions.rs: TRY_* functions
- engine/src/functions/flatten.rs: Array/Object helper functions
- go/integration_test.go: Add 17 integration tests

Relates to Phase 2 implementation plan.
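The NULL handling of the three conditional functions listed above can be sketched on plain `Option` values, with `None` standing in for SQL NULL. This is a minimal illustration using only the standard library, not the PR's DataFusion `ScalarUDF` code; the function names and signatures below are hypothetical, and treating a NULL condition in IFF as false is an assumption based on Snowflake's documented behavior.

```rust
/// IFF(condition, true_value, false_value).
/// Assumption: a NULL condition selects the false branch, as in Snowflake.
fn iff<T>(condition: Option<bool>, true_value: Option<T>, false_value: Option<T>) -> Option<T> {
    if condition == Some(true) {
        true_value
    } else {
        false_value
    }
}

/// NVL(expr1, expr2): expr1 when it is not NULL, otherwise expr2.
fn nvl<T>(expr1: Option<T>, expr2: Option<T>) -> Option<T> {
    expr1.or(expr2)
}

/// NVL2(expr1, expr2, expr3): expr2 when expr1 is not NULL, otherwise expr3.
/// Only the NULL-ness of expr1 matters, so its type may differ from the result type.
fn nvl2<T, U>(expr1: Option<T>, expr2: Option<U>, expr3: Option<U>) -> Option<U> {
    if expr1.is_some() {
        expr2
    } else {
        expr3
    }
}
```

For example, `nvl(None, Some(7))` yields `Some(7)`, and `nvl2(Some(0), Some("a"), Some("b"))` yields `Some("a")` because `0` is a non-NULL value.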
Add shared utility functions to reduce code duplication and improve safety across UDF implementations:

- nanos_to_components(): Safe nanosecond conversion using Euclidean division
- clamp_day_to_month()/last_day_of_month(): Accurate end-of-month handling
- safe_index()/safe_index_i32(): Safe index conversion with negative check
- process_string_input(): Common pattern for string array processing
- truncate_for_error(): Error message truncation utility

These helpers address security and correctness issues identified in PR review.
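To illustrate why Euclidean division matters for the nanosecond conversion, here is a minimal sketch using only the standard library. The helper name comes from the list above, but its signature and return type are assumptions made for this example, since the PR text does not show them.

```rust
const NANOS_PER_SEC: i64 = 1_000_000_000;

/// Split nanoseconds since the epoch into (whole seconds, sub-second nanoseconds).
/// Hypothetical signature: the real helper may return a different shape.
fn nanos_to_components(nanos: i64) -> (i64, u32) {
    // div_euclid rounds toward negative infinity for a positive divisor,
    // so pre-epoch timestamps get the earlier whole second.
    let secs = nanos.div_euclid(NANOS_PER_SEC);
    // rem_euclid is always in 0..NANOS_PER_SEC here, so the cast to u32 is lossless.
    let subsec = nanos.rem_euclid(NANOS_PER_SEC) as u32;
    (secs, subsec)
}
```

With the truncating `/` and `%` operators, an input of `-1` would give a remainder of `-1`, which turns into a huge value when cast to `u32`. The Euclidean variant returns `(-1, 999_999_999)` instead.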
Address critical issues identified in PR review:

High severity:

- Fix nanosecond overflow when converting i64 to u32 by using Euclidean division/remainder (nanos_to_components helper)
- Add explicit error handling for timestamp_nanos_opt() instead of silent fallback to 0

Medium severity:

- Replace .min(28) with clamp_day_to_month() for accurate month-end handling in DATEADD/DATEDIFF (handles leap years correctly)

Affected functions:

- dateadd_scalar(): TimestampNanosecond handling (lines 135-145)
- scalar_to_datetime(): TimestampNanosecond conversion (lines 426-431)
- add_to_date(): Month/Quarter arithmetic (lines 237, 248)
- add_to_datetime(): Month/Quarter arithmetic (lines 271, 292)
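The month-end fix can be sketched as follows. This is a standard-library illustration of the clamping idea, not the PR's chrono-based code. The names `clamp_day_to_month` and `last_day_of_month` come from the commit messages, but their signatures here are assumptions, and `add_months` and `is_leap_year` are hypothetical helpers introduced for the example.

```rust
/// Gregorian leap-year rule.
fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Number of days in the given month (1..=12) of the given year.
fn last_day_of_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap_year(year) => 29,
        2 => 28,
        _ => panic!("month must be in 1..=12"),
    }
}

/// Clamp a day of month to the length of the target month.
fn clamp_day_to_month(year: i32, month: u32, day: u32) -> u32 {
    day.min(last_day_of_month(year, month))
}

/// Add `delta` months (possibly negative) to a (year, month, day) date.
fn add_months(year: i32, month: u32, day: u32, delta: i32) -> (i32, u32, u32) {
    // Work with a zero-based month count so year rollover is plain arithmetic.
    let total = year * 12 + (month as i32 - 1) + delta;
    let new_year = total.div_euclid(12);
    let new_month = total.rem_euclid(12) as u32 + 1;
    (new_year, new_month, clamp_day_to_month(new_year, new_month, day))
}
```

Under these assumptions, adding one month to 2024-01-31 gives 2024-02-29 and adding one month to 2024-03-31 gives 2024-04-30, whereas a fixed `.min(28)` clamp would have produced day 28 in both cases.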
Address medium severity issue identified in PR review:

- Replace unsafe casting (*i as usize) with safe_index() helper
- Return NULL for negative indices instead of causing undefined behavior
- Add explicit validation in FLATTEN_ARRAY function (lines 76-89)

This prevents potential out-of-bounds access when negative indices are passed to array operations.
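A checked index conversion along these lines might look like the sketch below. It uses only the standard library; the `safe_index` signature is an assumption, and `element_at` is a hypothetical stand-in for the array lookup, with `None` playing the role of SQL NULL. Note that in Rust a cast such as `-1_i64 as usize` is well defined but wraps to a very large value, which is why a checked conversion is the safer choice.

```rust
use std::convert::TryFrom;

/// Convert a signed SQL index to a usize, rejecting negative values.
/// Hypothetical signature: the PR's helper may differ.
fn safe_index(index: i64) -> Option<usize> {
    usize::try_from(index).ok()
}

/// Look up an element by signed index, returning None (NULL) when the index
/// is negative or past the end of the slice.
fn element_at<T>(items: &[T], index: i64) -> Option<&T> {
    safe_index(index).and_then(|i| items.get(i))
}
```

Because `slice::get` already returns `None` for an out-of-range position, both the negative case and the too-large case collapse into the same NULL result without any panicking index expression.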
Remove unused Int64Array import from test module to clean up compiler warnings.
- Include engine/** in the CI trigger paths to run tests when UDF implementations and core engine code are modified
- Add workflow_dispatch for manual triggering
4a3b41f to 7e54c92
## Overview

As the Phase 2 implementation, this PR implements 14 Snowflake-compatible UDFs (user-defined functions).
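## Implemented functions

### Conditional functions (3)

- IFF(condition, true_value, false_value) - Inline IF expression
- NVL(expr1, expr2) - Returns expr2 if expr1 is NULL
- NVL2(expr1, expr2, expr3) - Returns expr2 if expr1 is NOT NULL, otherwise expr3

### JSON functions (2)

- PARSE_JSON(string) - Parses a string as JSON
- TO_JSON(variant) - Converts a value to a JSON string

### Date/time functions (2)

- DATEADD(part, value, date) - Adds an interval to a date
- DATEDIFF(part, date1, date2) - Calculates the difference between dates

### TRY_* functions (4)

- TRY_PARSE_JSON(string) - Parses JSON (NULL on error)
- TRY_TO_NUMBER(string) - Converts to a number (NULL on error)
- TRY_TO_DATE(string) - Converts to a date (NULL on error)
- TRY_TO_BOOLEAN(string) - Converts to a boolean (NULL on error)

### Array/object functions (4)

- FLATTEN_ARRAY(array, index) - Gets an element from an array
- ARRAY_SIZE(array) - Gets the size of an array
- GET_PATH(json, path) - Gets a value by JSON path
- OBJECT_KEYS(object) - Lists the keys of an object

## Technical details

### Architecture

- Uses the ScalarUDF interface
- Registers all UDFs when the Executor is initialized

### File layout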
## Testing

### Rust unit tests

`cargo test --package engine`

Result: all 41 tests pass ✅

### Go integration tests

Added: 17 integration tests (covering all 14 UDFs)
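## Changed files

- engine/Cargo.toml: add the chrono dependency
- engine/src/lib.rs: export the functions module
- engine/src/executor.rs: register the 14 UDFs
- engine/src/functions/*.rs: UDF implementations for each category (5 files in total)
- go/integration_test.go: add integration tests (330 lines added)

Stats: 10 files changed, 3205 lines added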
## Examples

## Checklist

## Related

This implementation follows the Phase 2 implementation plan.

Next step: Phase 3 (semi-structured data, LATERAL FLATTEN)