feat: Phase 5 - Extended Features (String/Window/QUALIFY/DDL/GET) #21
Merged
Phase 5 Step 1: Additional string functions for Snowflake compatibility
- CHARINDEX(substr, str, [start_pos]) - returns 1-based position of substring
- REVERSE(str) - reverses string
- LPAD(str, len, [pad]) - left pads string to specified length
- RPAD(str, len, [pad]) - right pads string to specified length
- TRANSLATE(str, source, target) - character-by-character translation
- POSITION(substr IN str) - SQL rewriter converts to CHARINDEX
All functions include comprehensive unit tests.
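The semantics listed above can be sketched as a small Python reference model. This is only an illustration, not the implementation in this PR: the function names mirror the SQL functions, and the edge-case behavior (0 when the substring is absent, truncation when the target length is shorter than the input, deletion of source characters that have no target counterpart) is assumed from Snowflake's documented behavior rather than taken from this change.

```python
def charindex(substr: str, s: str, start_pos: int = 1) -> int:
    """1-based position of substr in s, searching from start_pos.

    Returns 0 when substr is not found (assumed, following Snowflake).
    """
    return s.find(substr, max(start_pos, 1) - 1) + 1


def reverse(s: str) -> str:
    """Reverse the string."""
    return s[::-1]


def lpad(s: str, length: int, pad: str = " ") -> str:
    """Left-pad s to length; truncate when s is already longer (assumed)."""
    if length <= len(s) or not pad:
        return s[:max(length, 0)]
    fill = length - len(s)
    return (pad * fill)[:fill] + s


def rpad(s: str, length: int, pad: str = " ") -> str:
    """Right-pad s to length; truncate when s is already longer (assumed)."""
    if length <= len(s) or not pad:
        return s[:max(length, 0)]
    fill = length - len(s)
    return s + (pad * fill)[:fill]


def translate(s: str, source: str, target: str) -> str:
    """Map source[i] to target[i]; drop source chars with no target (assumed)."""
    table = {}
    for i, ch in enumerate(source):
        if ord(ch) not in table:  # first occurrence of a source char wins
            table[ord(ch)] = target[i] if i < len(target) else None
    return s.translate(table)
```

Under these assumptions, `POSITION(substr IN str)` would simply map to `charindex(substr, s)`.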
Phase 5 Step 2: Verify additional Window functions
These functions are natively supported by DataFusion:
- FIRST_VALUE(col) - returns first value in window partition
- LAST_VALUE(col) - returns last value in window partition
- NTH_VALUE(col, n) - returns nth value in window partition
- NTILE(n) - divides rows into n buckets
Added comprehensive tests to verify correct behavior.
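To make the NTILE bucket distribution concrete, here is a minimal Python sketch of the standard SQL rule: bucket sizes differ by at most one, and the larger buckets come first. The helper name `ntile` is hypothetical, and the sketch does not call DataFusion; it only models the bucket numbers one would expect for an ordered partition.

```python
def ntile(num_rows: int, n: int) -> list[int]:
    """Bucket number (1-based) for each row of an ordered partition."""
    if n <= 0:
        raise ValueError("n must be positive")
    base, extra = divmod(num_rows, n)
    buckets = []
    for bucket in range(1, n + 1):
        # The first `extra` buckets receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets
```

For example, seven rows split into three buckets yield sizes 3, 2, and 2.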
Phase 5 Step 3: Add QUALIFY clause for window function filtering
QUALIFY is converted to CTE + WHERE via SQL rewriter:
- Input: SELECT id, ROW_NUMBER() OVER (...) as rn FROM t QUALIFY rn = 1
- Output: WITH _qualify AS (...) SELECT * FROM _qualify WHERE rn = 1
Supports:
- Basic QUALIFY condition
- QUALIFY with ORDER BY clause
- QUALIFY with LIMIT clause (trailing clauses preserved)
Added tests for:
- SQL rewriter transformation
- End-to-end executor test with partitioned ROW_NUMBER
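The transformation described above can be approximated with a short, regex-based Python sketch. It is deliberately naive and is not the rewriter from this PR: `rewrite_qualify` is a hypothetical name, and the sketch assumes a single top-level QUALIFY, no QUALIFY keyword inside string literals, and a QUALIFY condition that itself contains no ORDER BY or LIMIT.

```python
import re

_QUALIFY = re.compile(r"\bQUALIFY\b", re.IGNORECASE)
_TRAILING = re.compile(r"\b(ORDER\s+BY|LIMIT)\b", re.IGNORECASE)


def rewrite_qualify(sql: str) -> str:
    """Rewrite `... QUALIFY cond [ORDER BY ...] [LIMIT ...]` into CTE + WHERE."""
    match = _QUALIFY.search(sql)
    if match is None:
        return sql  # nothing to rewrite

    # Everything before QUALIFY becomes the body of the CTE.
    inner = sql[:match.start()].strip()
    rest = sql[match.end():].strip().rstrip(";").strip()

    # Split the condition from trailing clauses so they are preserved.
    trailing = _TRAILING.search(rest)
    if trailing:
        cond = rest[:trailing.start()].strip()
        tail = " " + rest[trailing.start():].strip()
    else:
        cond, tail = rest, ""

    return f"WITH _qualify AS ({inner}) SELECT * FROM _qualify WHERE {cond}{tail}"
```

A production rewriter would more likely operate on a parsed AST than on raw text, since a regex cannot reliably tell keywords from literals or nested subqueries.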
Phase 5 Step 4: Verify DDL extension support via DataFusion native
These DDL operations are natively supported by DataFusion:
- DROP TABLE - removes table from session
- CREATE VIEW - creates view with query definition
- DROP VIEW - removes view from session
Note: ALTER TABLE ADD/DROP COLUMN is not supported by DataFusion and would require custom implementation for schema modification.
Added comprehensive tests to verify correct behavior.
Phase 5 Step 5: GET function extensions for JSON element access
New features:
- GET(variant, index_or_key) - extract element from JSON array/object
- GET('[1,2,3]', 0) -> '1'
- GET('{"a":1}', 'a') -> '1'
- Supports both integer indices and string keys
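The GET behavior listed above can be modeled with a few lines of Python. This is a sketch under assumptions, not the PR's implementation: returning `None` (standing in for SQL NULL) on a missing key, an out-of-range index, or invalid JSON is assumed, as is re-serializing the extracted element as compact JSON text.

```python
import json


def get(variant, index_or_key):
    """Extract an element from a JSON array (by index) or object (by key)."""
    try:
        value = json.loads(variant)
    except (TypeError, ValueError):
        return None  # assumed: invalid JSON yields NULL

    if isinstance(value, list):
        try:
            idx = int(index_or_key)  # accepts 0 as well as the string "0"
        except (TypeError, ValueError):
            return None
        if not 0 <= idx < len(value):
            return None
        element = value[idx]
    elif isinstance(value, dict):
        if not isinstance(index_or_key, str) or index_or_key not in value:
            return None
        element = value[index_or_key]
    else:
        return None  # scalars have no elements

    # Assumed: the result is returned as JSON text, e.g. 1 -> '1'.
    return json.dumps(element, separators=(",", ":"))
```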
SQL rewriter for bracket notation:
- col['key'] -> get(col, 'key')
- col[0] -> get(col, 0)
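A minimal sketch of the bracket-notation rewrite in Python, assuming a plain regex substitution. The name `rewrite_brackets` is hypothetical, and the sketch handles only a single bracket level on a simple identifier; chained access such as col['a'][0] and brackets inside string literals are out of scope here.

```python
import re

# identifier, then [ 'string key' ] or [ integer index ]
_BRACKET = re.compile(
    r"([A-Za-z_][A-Za-z0-9_.]*)\[\s*('(?:[^']|'')*'|\d+)\s*\]"
)


def rewrite_brackets(sql: str) -> str:
    """Rewrite col['key'] and col[0] into get(col, 'key') and get(col, 0)."""
    return _BRACKET.sub(r"get(\1, \2)", sql)
```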
Added comprehensive unit tests for:
- Array index access
- Object key access
- String index (numeric string as index)
- SQL rewriter bracket notation
Phase 5 Step 6: Integration tests for new features
Added Go integration tests for:
String functions:
- CHARINDEX - find substring position
- POSITION - alias for CHARINDEX
- REVERSE - reverse string
- LPAD - left pad
- RPAD - right pad
- TRANSLATE - character translation
Window functions:
- FIRST_VALUE - verified via partition query
- NTILE - bucket distribution test
QUALIFY clause:
- Tested with ROW_NUMBER partitioned query
GET function:
- Array index access
- Object key access
DDL operations:
- DROP TABLE
- CREATE/DROP VIEW
Summary
Phase 5 implementation adding extended features for improved Snowflake compatibility:
Step 1: Additional String Functions
- CHARINDEX(substr, str, [start_pos]) - Find position of substring (1-based)
- POSITION(substr IN str) - Alias for CHARINDEX (SQL rewriter)
- REVERSE(str) - Reverse string
- LPAD(str, len, [pad]) - Left pad string to specified length
- RPAD(str, len, [pad]) - Right pad string to specified length
- TRANSLATE(str, source, target) - Character-by-character translation
Step 2: Additional Window Functions
Verified DataFusion native support for:
- FIRST_VALUE(col) - First value in window partition
- LAST_VALUE(col) - Last value in window partition
- NTH_VALUE(col, n) - Nth value in window partition
- NTILE(n) - Divide rows into n buckets
Step 3: QUALIFY Clause
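SQL rewriter converts QUALIFY to CTE + WHERE: SELECT id, ROW_NUMBER() OVER (...) as rn FROM t QUALIFY rn = 1 becomes WITH _qualify AS (...) SELECT * FROM _qualify WHERE rn = 1. Trailing ORDER BY and LIMIT clauses are preserved.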
Step 4: DDL Extensions
Verified DataFusion native support for:
- DROP TABLE - Remove table from session
- CREATE VIEW - Create view with query definition
- DROP VIEW - Remove view from session
Note: ALTER TABLE ADD/DROP COLUMN is not supported by DataFusion.
Step 5: GET Function Extensions
- GET(variant, index_or_key) - Extract element from JSON array/object
- col['key'] → get(col, 'key'), col[0] → get(col, 0)
Step 6: Integration Tests
Added comprehensive Go integration tests for all Phase 5 features.
Test Plan