feat: Phase 5 - Extended Features (String/Window/QUALIFY/DDL/GET) #21

Merged: sivchari merged 7 commits into main from feature/phase5-extended-features on Feb 5, 2026

@sivchari (Owner) commented Feb 5, 2026

Summary

Phase 5 implementation adding extended features for improved Snowflake compatibility:

Step 1: Additional String Functions

  • CHARINDEX(substr, str, [start_pos]) - Find position of substring (1-based)
  • POSITION(substr IN str) - Equivalent to CHARINDEX(substr, str); converted by the SQL rewriter
  • REVERSE(str) - Reverse string
  • LPAD(str, len, [pad]) - Left pad string to specified length
  • RPAD(str, len, [pad]) - Right pad string to specified length
  • TRANSLATE(str, source, target) - Character-by-character translation
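
For reviewers who want a quick reference for the intended behavior, here is a minimal Python sketch of these functions. It is not the engine implementation from this PR. It assumes Snowflake-style semantics: CHARINDEX returns 0 when the substring is absent, LPAD/RPAD truncate when the input is longer than `len`, and TRANSLATE deletes source characters that have no counterpart in `target`. The Python function names are illustrative only.

```python
def charindex(substr: str, s: str, start_pos: int = 1) -> int:
    """1-based position of substr in s at or after start_pos; 0 if absent."""
    start_pos = max(start_pos, 1)  # assumption: positions below 1 are treated as 1
    return s.find(substr, start_pos - 1) + 1  # find() returns -1 when absent


def reverse(s: str) -> str:
    return s[::-1]


def lpad(s: str, length: int, pad: str = " ") -> str:
    """Left-pad s to length; truncate on the right if s is already longer."""
    if length <= len(s) or not pad:  # assumption: an empty pad only truncates
        return s[:max(length, 0)]
    fill = length - len(s)
    return (pad * fill)[:fill] + s


def rpad(s: str, length: int, pad: str = " ") -> str:
    """Right-pad s to length; truncate on the right if s is already longer."""
    if length <= len(s) or not pad:
        return s[:max(length, 0)]
    fill = length - len(s)
    return s + (pad * fill)[:fill]


def translate(s: str, source: str, target: str) -> str:
    """Replace each char found in source with the char at the same index in target."""
    mapping = {}
    for i, ch in enumerate(source):
        if ch not in mapping:  # first occurrence in source wins
            mapping[ch] = target[i] if i < len(target) else ""  # "" deletes the char
    return "".join(mapping.get(ch, ch) for ch in s)


print(charindex("an", "banana"))        # 2
print(lpad("7", 3, "0"))                # 007
print(translate("abcde", "bd", "XY"))   # aXcYe
```

Under these assumptions POSITION('an' IN 'banana') would map to `charindex("an", "banana")`.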

Step 2: Additional Window Functions

Verified DataFusion native support for:

  • FIRST_VALUE(col) - First value in window partition
  • LAST_VALUE(col) - Last value in window partition
  • NTH_VALUE(col, n) - Nth value in window partition
  • NTILE(n) - Divide rows into n buckets
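
As an illustration of what the NTILE check looks for, the sketch below computes the bucket number for each row of an ordered partition using the standard SQL rule: buckets differ in size by at most one row, and the larger buckets come first. It is a reference model only, not DataFusion code, and the function name is hypothetical.

```python
def ntile(row_count: int, n: int) -> list[int]:
    """Bucket number (1-based) for each of row_count ordered rows."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    base, extra = divmod(row_count, n)
    buckets: list[int] = []
    for bucket in range(1, n + 1):
        # The first `extra` buckets receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


print(ntile(7, 3))  # [1, 1, 1, 2, 2, 3, 3]
```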

Step 3: QUALIFY Clause

SQL rewriter converts QUALIFY to CTE + WHERE:

-- Input
SELECT id, ROW_NUMBER() OVER (...) as rn FROM t QUALIFY rn = 1
-- Output
WITH _qualify AS (...) SELECT * FROM _qualify WHERE rn = 1
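
The shape of this rewrite can be sketched in a few lines of Python. This is a deliberately naive, string-based illustration and not the rewriter added in this PR: it assumes a single top-level QUALIFY keyword, no QUALIFY inside string literals or subqueries, and no trailing ORDER BY or LIMIT. The function name `rewrite_qualify` is hypothetical.

```python
import re


def rewrite_qualify(sql: str) -> str:
    """Wrap the query before QUALIFY in a CTE and turn the condition into WHERE."""
    parts = re.split(r"\s+QUALIFY\s+", sql.strip().rstrip(";"),
                     maxsplit=1, flags=re.IGNORECASE)
    if len(parts) != 2:
        return sql  # no QUALIFY clause: leave the statement untouched
    inner, condition = parts
    return f"WITH _qualify AS ({inner}) SELECT * FROM _qualify WHERE {condition}"


query = ("SELECT id, ROW_NUMBER() OVER (PARTITION BY g ORDER BY id) AS rn "
         "FROM t QUALIFY rn = 1")
print(rewrite_qualify(query))
```

The window function result must be aliased in the SELECT list (here `rn`) for the outer WHERE to reference it.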

Step 4: DDL Extensions

Verified DataFusion native support for:

  • DROP TABLE - Remove table from session
  • CREATE VIEW - Create view with query definition
  • DROP VIEW - Remove view from session

Note: ALTER TABLE ADD/DROP COLUMN is not supported by DataFusion.

Step 5: GET Function Extensions

  • GET(variant, index_or_key) - Extract element from JSON array/object
  • Bracket notation: col['key'] -> get(col, 'key'), col[0] -> get(col, 0)
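
A rough Python sketch of the bracket-notation rewrite follows. It is a regex-based illustration under simplifying assumptions, not the rewriter from this PR: it handles a plain identifier followed by one bracket containing a single-quoted key or a non-negative integer, and it does not handle chained access such as col['a'][0] or brackets inside string literals. The name `rewrite_brackets` is hypothetical.

```python
import re

# identifier, then [ 'key' ] or [ 123 ]
_BRACKET = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\[\s*('(?:[^']|'')*'|\d+)\s*\]")


def rewrite_brackets(sql: str) -> str:
    """Rewrite col['key'] / col[0] into get(col, 'key') / get(col, 0)."""
    return _BRACKET.sub(r"get(\1, \2)", sql)


print(rewrite_brackets("SELECT col['key'], col[0] FROM t"))
# SELECT get(col, 'key'), get(col, 0) FROM t
```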

Step 6: Integration Tests

Added comprehensive Go integration tests for all Phase 5 features.

Test Plan

  • All 143 engine unit tests pass
  • Integration test cases added for all new features
  • CI workflow passes

Phase 5 Step 1: Additional string functions for Snowflake compatibility

- CHARINDEX(substr, str, [start_pos]) - returns 1-based position of substring
- REVERSE(str) - reverses string
- LPAD(str, len, [pad]) - left pads string to specified length
- RPAD(str, len, [pad]) - right pads string to specified length
- TRANSLATE(str, source, target) - character-by-character translation
- POSITION(substr IN str) - SQL rewriter converts to CHARINDEX

All functions include comprehensive unit tests.

Phase 5 Step 2: Verify additional Window functions

These functions are natively supported by DataFusion:
- FIRST_VALUE(col) - returns first value in window partition
- LAST_VALUE(col) - returns last value in window partition
- NTH_VALUE(col, n) - returns nth value in window partition
- NTILE(n) - divides rows into n buckets

Added comprehensive tests to verify correct behavior.

Phase 5 Step 3: Add QUALIFY clause for window function filtering

QUALIFY is converted to CTE + WHERE via SQL rewriter:
- Input:  SELECT id, ROW_NUMBER() OVER (...) as rn FROM t QUALIFY rn = 1
- Output: WITH _qualify AS (...) SELECT * FROM _qualify WHERE rn = 1

Supports:
- Basic QUALIFY condition
- QUALIFY with ORDER BY clause
- QUALIFY with LIMIT clause (trailing clauses preserved)

Added tests for:
- SQL rewriter transformation
- End-to-end executor test with partitioned ROW_NUMBER

Phase 5 Step 4: Verify DDL extension support via DataFusion native

These DDL operations are natively supported by DataFusion:
- DROP TABLE - removes table from session
- CREATE VIEW - creates view with query definition
- DROP VIEW - removes view from session

Note: ALTER TABLE ADD/DROP COLUMN is not supported by DataFusion
and would require custom implementation for schema modification.

Added comprehensive tests to verify correct behavior.

Phase 5 Step 5: GET function extensions for JSON element access

New features:
- GET(variant, index_or_key) - extract element from JSON array/object
  - GET('[1,2,3]', 0) -> '1'
  - GET('{"a":1}', 'a') -> '1'
  - Supports both integer indices and string keys
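
The two examples above can be modeled in Python as follows. This is a sketch of the described behavior rather than the engine code. The return of None for invalid JSON, out-of-range indices, and missing keys is an assumption, as is returning the element re-serialized as JSON text (so string elements keep their quotes).

```python
import json
from typing import Optional, Union


def get(variant: str, index_or_key: Union[int, str]) -> Optional[str]:
    """Extract one element from a JSON array or object given as text."""
    try:
        value = json.loads(variant)
    except json.JSONDecodeError:
        return None  # assumption: invalid JSON yields NULL
    if isinstance(value, list):
        if isinstance(index_or_key, str):
            if not index_or_key.isdigit():
                return None
            index_or_key = int(index_or_key)  # numeric string used as an index
        if 0 <= index_or_key < len(value):
            return json.dumps(value[index_or_key])
        return None  # assumption: out-of-range index yields NULL
    if isinstance(value, dict) and isinstance(index_or_key, str):
        if index_or_key in value:
            return json.dumps(value[index_or_key])
    return None


print(get("[1,2,3]", 0))      # 1
print(get('{"a":1}', "a"))    # 1
print(get("[1,2,3]", "1"))    # 2
```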

SQL rewriter for bracket notation:
- col['key'] -> get(col, 'key')
- col[0] -> get(col, 0)

Added comprehensive unit tests for:
- Array index access
- Object key access
- String index (numeric string as index)
- SQL rewriter bracket notation

Phase 5 Step 6: Integration tests for new features

Added Go integration tests for:

String functions:
- CHARINDEX - find substring position
- POSITION - alias for CHARINDEX
- REVERSE - reverse string
- LPAD - left pad
- RPAD - right pad
- TRANSLATE - character translation

Window functions:
- FIRST_VALUE - verified via partition query
- NTILE - bucket distribution test

QUALIFY clause:
- Tested with ROW_NUMBER partitioned query

GET function:
- Array index access
- Object key access

DDL operations:
- DROP TABLE
- CREATE/DROP VIEW

@sivchari merged commit 749ddc7 into main on Feb 5, 2026 (1 check passed)
@sivchari deleted the feature/phase5-extended-features branch on February 5, 2026 05:55
@github-actions bot mentioned this pull request on Feb 17, 2026