feat: implement Phase 2 Snowflake UDF support (14 functions) #18

Merged: sivchari merged 6 commits into main from feat-phase2-snowflake-udfs on Feb 4, 2026

@sivchari (Owner) commented on Feb 3, 2026

Overview

As the Phase 2 implementation, this PR adds 14 Snowflake-compatible UDFs (user-defined functions).

Implemented functions

Conditional functions (3)

  • IFF(condition, true_value, false_value) - inline IF expression
  • NVL(expr1, expr2) - returns expr2 when expr1 is NULL
  • NVL2(expr1, expr2, expr3) - returns expr2 when expr1 is NOT NULL, expr3 when it is NULL

JSON functions (2)

  • PARSE_JSON(string) - parses a string as JSON
  • TO_JSON(variant) - converts a value to a JSON string

Date/time functions (2)

  • DATEADD(part, value, date) - adds an interval to a date
  • DATEDIFF(part, date1, date2) - computes the difference between two dates

TRY_* functions (4)

  • TRY_PARSE_JSON(string) - parses JSON (NULL on error)
  • TRY_TO_NUMBER(string) - converts to a number (NULL on error)
  • TRY_TO_DATE(string) - converts to a date (NULL on error)
  • TRY_TO_BOOLEAN(string) - converts to a boolean (NULL on error)

Array/object functions (4)

  • FLATTEN_ARRAY(array, index) - gets an element from an array
  • ARRAY_SIZE(array) - gets the size of an array
  • GET_PATH(json, path) - gets a value by JSON path
  • OBJECT_KEYS(object) - lists the keys of an object

Technical details

Architecture

  • Uses DataFusion's ScalarUDF interface
  • Registers all UDFs when the Executor is initialized
  • Leverages Arrow's columnar data processing

File layout

engine/src/functions/
├── mod.rs               # module organization and exports
├── conditional.rs       # IFF, NVL, NVL2
├── json.rs              # PARSE_JSON, TO_JSON
├── datetime.rs          # DATEADD, DATEDIFF
├── try_functions.rs     # TRY_* functions
└── flatten.rs           # array/object functions

Testing

Rust unit tests

cargo test --package engine

Result: all 41 tests pass ✅

Go integration tests

cd go && go test

Added: 17 integration tests (covering all 14 UDFs)

Changed files

  • engine/Cargo.toml: add the chrono dependency
  • engine/src/lib.rs: export the functions module
  • engine/src/executor.rs: register the 14 UDFs
  • engine/src/functions/*.rs: UDF implementations by category (5 files in total)
  • go/integration_test.go: add integration tests (330 lines added)

Stats: 10 files changed, 3205 lines added

Usage examples

-- Conditional expression
SELECT IFF(amount > 100, 'high', 'low') FROM sales;

-- JSON processing
SELECT PARSE_JSON('{"user": {"name": "Alice"}}');
SELECT GET_PATH(data, 'user.name') FROM json_table;

-- Date arithmetic
SELECT DATEADD('day', 7, '2024-01-01');
SELECT DATEDIFF('month', start_date, end_date) FROM events;

-- Error-safe conversion
SELECT TRY_TO_NUMBER(price_str) FROM products;

Checklist

  • All UDFs implemented
  • Rust unit tests added (41)
  • Go integration tests added (17)
  • Code quality checked (warnings only, no errors)
  • Documentation added (doc comments in mod.rs)
  • CI/CD passing
  • Ready for review

Related

This implementation follows the Phase 2 implementation plan.
Next step: Phase 3 (semi-structured data, LATERAL FLATTEN)

Implement 14 Snowflake-compatible user-defined functions (UDFs)
as part of Phase 2 of the emulator implementation plan.

## Implemented Functions

### Conditional Functions
- IFF(condition, true_value, false_value) - Inline IF expression
- NVL(expr1, expr2) - Return expr2 if expr1 is NULL
- NVL2(expr1, expr2, expr3) - Return expr2 if expr1 is NOT NULL, else expr3
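
The NULL handling of the three conditional functions can be sketched on plain `Option` values, with `None` standing in for SQL NULL. This is an illustration only and not the PR's DataFusion code: the function signatures are hypothetical, and treating a NULL condition in IFF as "not true" is an assumption based on Snowflake's documented behavior.

```rust
// Sketch only: `None` models SQL NULL. These signatures are hypothetical
// and are not taken from engine/src/functions/conditional.rs.

/// IFF: picks `true_value` only when the condition is exactly TRUE.
/// Assumption: a NULL condition selects `false_value`.
fn iff<T>(condition: Option<bool>, true_value: T, false_value: T) -> T {
    if condition == Some(true) {
        true_value
    } else {
        false_value
    }
}

/// NVL: returns `expr2` when `expr1` is NULL, otherwise `expr1`.
fn nvl<T>(expr1: Option<T>, expr2: Option<T>) -> Option<T> {
    expr1.or(expr2)
}

/// NVL2: returns `expr2` when `expr1` is NOT NULL, otherwise `expr3`.
fn nvl2<T, U>(expr1: Option<T>, expr2: Option<U>, expr3: Option<U>) -> Option<U> {
    if expr1.is_some() {
        expr2
    } else {
        expr3
    }
}
```

For example, `nvl(None, Some(5))` yields `Some(5)`, and `nvl2(Some("x"), Some(1), Some(2))` yields `Some(1)`.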

### JSON Functions
- PARSE_JSON(string) - Parse string as JSON
- TO_JSON(variant) - Convert value to JSON string

### Date/Time Functions
- DATEADD(part, value, date) - Add interval to date/time
- DATEDIFF(part, date1, date2) - Calculate difference between dates
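
One detail worth illustrating for DATEDIFF: Snowflake documents the result as the number of part boundaries crossed, not the elapsed time, so January 31 to February 1 counts as one month. The sketch below assumes the emulator follows that rule, which this PR does not state; the helper names are hypothetical.

```rust
// Sketch only: boundary-counting DATEDIFF for the 'month' and 'year' parts.
// Assumption: the emulator mirrors Snowflake's documented semantics.
// `datediff_months` and `datediff_years` are hypothetical names.

/// Month boundaries crossed going from (y1, m1) to (y2, m2); the day is ignored.
fn datediff_months(y1: i32, m1: u32, y2: i32, m2: u32) -> i64 {
    (i64::from(y2) - i64::from(y1)) * 12 + (i64::from(m2) - i64::from(m1))
}

/// Year boundaries crossed going from y1 to y2; month and day are ignored.
fn datediff_years(y1: i32, y2: i32) -> i64 {
    i64::from(y2) - i64::from(y1)
}
```

Under these assumptions, `datediff_months(2023, 12, 2024, 1)` is 1 even though the two dates may be only a day apart.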

### TRY_* Functions (error-safe variants)
- TRY_PARSE_JSON(string) - Parse JSON, return NULL on error
- TRY_TO_NUMBER(string) - Convert to number, return NULL on error
- TRY_TO_DATE(string) - Convert to date, return NULL on error
- TRY_TO_BOOLEAN(string) - Convert to boolean, return NULL on error
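
The shared pattern behind the TRY_* variants is that a failed conversion becomes NULL instead of an error. A minimal sketch on plain strings follows, with `None` standing in for NULL. The signatures are hypothetical, `f64` is a simplification of Snowflake's NUMBER type, and the accepted boolean spellings follow Snowflake's documented TO_BOOLEAN strings rather than anything shown in this PR.

```rust
// Sketch only: error-safe conversions returning `None` (NULL) on failure.
// Hypothetical signatures; not the code in try_functions.rs.

/// TRY_TO_NUMBER: parse a decimal number, NULL on any failure.
/// Assumption: f64 stands in for NUMBER; "NaN" and "inf" are rejected.
fn try_to_number(input: &str) -> Option<f64> {
    input
        .trim()
        .parse::<f64>()
        .ok()
        .filter(|value| value.is_finite())
}

/// TRY_TO_BOOLEAN: accept common true/false spellings case-insensitively.
fn try_to_boolean(input: &str) -> Option<bool> {
    match input.trim().to_ascii_lowercase().as_str() {
        "true" | "t" | "yes" | "y" | "on" | "1" => Some(true),
        "false" | "f" | "no" | "n" | "off" | "0" => Some(false),
        _ => None,
    }
}
```

Returning `Option` keeps the failure path explicit, so a caller mapping these over a column can emit a NULL slot for every `None` without any error handling.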

### Array/Object Functions (FLATTEN helpers)
- FLATTEN_ARRAY(array, index) - Get element at index from JSON array
- ARRAY_SIZE(array) - Get size of JSON array
- GET_PATH(json, path) - Extract value using dot notation path
- OBJECT_KEYS(object) - Get all keys from JSON object

## Implementation Details

- All functions implemented using DataFusion's ScalarUDF interface
- Functions registered in Executor during initialization
- Comprehensive unit tests for each function (41 Rust tests)
- Integration tests added for gosnowflake driver (17 Go tests)

## Testing

Rust tests:
```
cargo test --package engine
# 41 passed; 0 failed
```

Go integration tests (requires running server):
```
cd go && go test
# Tests for all 14 UDFs
```

## Files Changed

- engine/Cargo.toml: Add chrono dependency
- engine/src/lib.rs: Export functions module
- engine/src/executor.rs: Register all 14 UDFs
- engine/src/functions/mod.rs: Module organization and exports
- engine/src/functions/conditional.rs: IFF, NVL, NVL2
- engine/src/functions/json.rs: PARSE_JSON, TO_JSON
- engine/src/functions/datetime.rs: DATEADD, DATEDIFF
- engine/src/functions/try_functions.rs: TRY_* functions
- engine/src/functions/flatten.rs: Array/Object helper functions
- go/integration_test.go: Add 17 integration tests

Relates to Phase 2 implementation plan.

Add shared utility functions to reduce code duplication and improve
safety across UDF implementations:

- nanos_to_components(): Safe nanosecond conversion using Euclidean division
- clamp_day_to_month()/last_day_of_month(): Accurate end-of-month handling
- safe_index()/safe_index_i32(): Safe index conversion with negative check
- process_string_input(): Common pattern for string array processing
- truncate_for_error(): Error message truncation utility

These helpers address security and correctness issues identified in PR review.

Address critical issues identified in PR review:

High severity:
- Fix nanosecond overflow when converting i64 to u32 by using Euclidean
  division/remainder (nanos_to_components helper)
- Add explicit error handling for timestamp_nanos_opt() instead of
  silent fallback to 0
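
To make the Euclidean-division fix concrete, here is a minimal sketch of what a helper like nanos_to_components could look like. The helper name comes from the commit message, but the signature and return type are assumptions. Plain `/` and `%` truncate toward zero, so a pre-epoch timestamp would give a negative remainder that cannot be cast to `u32` safely; `div_euclid` and `rem_euclid` always leave the remainder in `0..1_000_000_000`.

```rust
// Sketch only: the signature is an assumption, not the PR's actual helper.

/// Split nanoseconds since the Unix epoch into whole seconds and the
/// sub-second nanosecond part.
fn nanos_to_components(nanos: i64) -> (i64, u32) {
    const NANOS_PER_SEC: i64 = 1_000_000_000;
    let secs = nanos.div_euclid(NANOS_PER_SEC);
    // rem_euclid is always in 0..NANOS_PER_SEC, so the cast to u32 is lossless.
    let subsec_nanos = nanos.rem_euclid(NANOS_PER_SEC) as u32;
    (secs, subsec_nanos)
}
```

For a timestamp one nanosecond before the epoch, `nanos_to_components(-1)` gives `(-1, 999_999_999)`, whereas truncating division would give a remainder of -1.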

Medium severity:
- Replace .min(28) with clamp_day_to_month() for accurate month-end
  handling in DATEADD/DATEDIFF (handles leap years correctly)
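
The difference from `.min(28)` is easiest to see in code. The sketch below uses the helper names from the commit message, but their signatures, the `is_leap_year` helper, and the `add_months` wrapper are assumptions made for illustration; the PR itself builds on chrono types.

```rust
// Sketch only: plain integers instead of chrono types; signatures are assumed.

fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Last valid day number of the given month (month is 1..=12).
fn last_day_of_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => {
            if is_leap_year(year) {
                29
            } else {
                28
            }
        }
        _ => panic!("month must be in 1..=12"),
    }
}

/// Clamp a day number to the end of the target month.
fn clamp_day_to_month(year: i32, month: u32, day: u32) -> u32 {
    day.min(last_day_of_month(year, month))
}

/// Hypothetical DATEADD('month', delta, date) on (year, month, day) triples.
fn add_months(year: i32, month: u32, day: u32, delta: i32) -> (i32, u32, u32) {
    // Work in zero-based months since year 0 so negative deltas wrap correctly.
    let total = year * 12 + (month as i32 - 1) + delta;
    let new_year = total.div_euclid(12);
    let new_month = total.rem_euclid(12) as u32 + 1;
    (new_year, new_month, clamp_day_to_month(new_year, new_month, day))
}
```

With this clamping, `add_months(2024, 1, 31, 1)` lands on February 29 and `add_months(2024, 1, 31, 2)` stays on March 31, while a blanket `.min(28)` would return day 28 in both cases.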

Affected functions:
- dateadd_scalar(): TimestampNanosecond handling (lines 135-145)
- scalar_to_datetime(): TimestampNanosecond conversion (lines 426-431)
- add_to_date(): Month/Quarter arithmetic (lines 237, 248)
- add_to_datetime(): Month/Quarter arithmetic (lines 271, 292)

Address medium severity issue identified in PR review:

- Replace unsafe casting (*i as usize) with safe_index() helper
- Return NULL for negative indices instead of causing undefined behavior
- Add explicit validation in FLATTEN_ARRAY function (lines 76-89)
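
A minimal sketch of the index check, assuming a safe_index helper that returns `Option<usize>` (the name is from the commit message; the signature and the `element_at` wrapper are hypothetical). In Rust, `index as usize` on a negative `i64` is well defined but wraps to a very large value, so the sketch converts with `try_from` and lets `slice::get` handle the upper bound.

```rust
use std::convert::TryFrom;

// Sketch only: signatures are assumptions, not the PR's actual helpers.

/// Convert a SQL integer index to usize, rejecting negative values.
fn safe_index(index: i64) -> Option<usize> {
    usize::try_from(index).ok()
}

/// Hypothetical FLATTEN_ARRAY-style lookup: `None` (NULL) for a negative
/// or out-of-range index instead of a panic.
fn element_at<T>(items: &[T], index: i64) -> Option<&T> {
    safe_index(index).and_then(|i| items.get(i))
}
```

Here `element_at(&[10, 20, 30], -1)` and `element_at(&[10, 20, 30], 3)` both return `None`, while index 1 returns `Some(&20)`.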

This prevents potential out-of-bounds access when negative indices
are passed to array operations.

Remove unused Int64Array import from test module to clean up
compiler warnings.

- Include engine/** in the CI trigger paths to run tests when
  UDF implementations and core engine code are modified
- Add workflow_dispatch for manual triggering
@sivchari force-pushed the feat-phase2-snowflake-udfs branch from 4a3b41f to 7e54c92 on February 3, 2026 12:29
@sivchari marked this pull request as ready for review on February 4, 2026 01:26
@sivchari merged commit 083a290 into main on Feb 4, 2026
1 check passed
@sivchari deleted the feat-phase2-snowflake-udfs branch on February 4, 2026 01:26
@github-actions bot mentioned this pull request on Feb 17, 2026