fix: (CDK) (CsvParser) - Fix the \\ escaping when passing the delimiter from Builder's UI #358
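📝 Walkthrough
The pull request introduces a new private method, `_get_delimiter`, which `CsvParser.parse` calls to obtain the processed delimiter before initializing `csv.DictReader`.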
Sequence Diagram(s)
sequenceDiagram
participant C as CsvParser.parse
participant D as _get_delimiter
participant R as csv.DictReader
C->>D: Call _get_delimiter()
D-->>C: Return processed delimiter
C->>R: Initialize csv.DictReader with delimiter
R-->>C: Return parsed records
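The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the CDK implementation: `SketchCsvParser` is a hypothetical stand-in class, and the fallback to `","` when no delimiter is configured is an assumption.

```python
import csv
import io
from dataclasses import dataclass
from typing import Any, Dict, Iterator, Optional


@dataclass
class SketchCsvParser:
    """Hypothetical stand-in for CsvParser; only delimiter handling is modeled."""

    encoding: str = "utf-8"
    delimiter: Optional[str] = ","

    def _get_delimiter(self) -> Optional[str]:
        # Decode an escaped delimiter (backslash followed by "t", for example)
        # into the single character it stands for, without mutating self.
        if self.delimiter is not None and self.delimiter.startswith("\\"):
            return self.delimiter.encode("utf-8").decode("unicode_escape")
        return self.delimiter

    def parse(self, data: io.BufferedIOBase) -> Iterator[Dict[str, Any]]:
        text = io.TextIOWrapper(data, encoding=self.encoding, newline="")
        # Assumption: fall back to "," when no delimiter is configured.
        reader = csv.DictReader(text, delimiter=self._get_delimiter() or ",")
        yield from reader


# The delimiter arrives as two characters: a backslash and the letter "t".
raw = io.BytesIO(b"id\tname\n1\talice\n2\tbob\n")
records = list(SketchCsvParser(delimiter="\\t").parse(raw))
```

Here `parse` asks `_get_delimiter` for the decoded value first and only then builds the `csv.DictReader`, which matches the call order in the diagram.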
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (1)
110-118: Consider making the delimiter processing immutable, wdyt?
The method modifies `self.delimiter`, which could lead to unexpected behavior if the method is called multiple times. How about returning a new value instead?

 def _get_delimiter(self) -> Optional[str]:
     """
     Get delimiter from the configuration. Check for the escape character and decode it.
     """
     if self.delimiter is not None:
         if self.delimiter.startswith("\\"):
-            self.delimiter = self.delimiter.encode("utf-8").decode("unicode_escape")
+            return self.delimiter.encode("utf-8").decode("unicode_escape")
+        return self.delimiter

-    return self.delimiter
+    return None

unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)
56-72: Consider adding more test cases for delimiter handling, wdyt?
The current test covers the basic case well. Would you like to add tests for:
- Multiple escaped characters (e.g., "\\t")
- Other common escape sequences (e.g., "\n", "\r")
- Empty delimiter
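As a rough sketch of what such cases could look like, the snippet below checks a hypothetical pure helper, `decode_delimiter`, that mirrors the non-mutating logic suggested above. The helper name and the case table are assumptions for illustration; in the actual test suite these would more naturally be `pytest` parameters.

```python
from typing import Optional


def decode_delimiter(delimiter: Optional[str]) -> Optional[str]:
    """Hypothetical pure helper mirroring the suggested non-mutating logic."""
    if delimiter is not None and delimiter.startswith("\\"):
        return delimiter.encode("utf-8").decode("unicode_escape")
    return delimiter


# (configured value, expected decoded value)
CASES = [
    ("\\t", "\t"),     # backslash + "t" becomes a real tab
    ("\\n", "\n"),     # backslash + "n" becomes a newline
    ("\\r", "\r"),     # backslash + "r" becomes a carriage return
    ("\\\\t", "\\t"),  # double backslash + "t" decodes to two characters
    (",", ","),        # unescaped delimiters pass through unchanged
    ("", ""),          # empty delimiter stays empty
    (None, None),      # missing delimiter stays None
]

for configured, expected in CASES:
    assert decode_delimiter(configured) == expected, (configured, expected)
```

Note that the double-backslash case still decodes to a two-character string, which `csv` would reject as a delimiter, so it may deserve its own explicit handling or test.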
📜 Review details
📒 Files selected for processing (2)
- airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (2 hunks)
- unit_tests/sources/declarative/decoders/test_composite_decoder.py (1 hunk)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (1)
127-128: LGTM! Nice use of the new helper method.
The parse method now correctly handles escaped delimiters through the `_get_delimiter` helper.
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)
65-66: LGTM! Great test coverage for the escape handling.
The test now properly verifies that the parser can handle escaped delimiters, and the comment clearly explains the intention.
Artem Inzhyyants (artem1205)
left a comment
Approving as a temporary solution until we fix the encoding of escape characters in the UI.
Thanks for the fix, Baz (@bazarnov)!
What
Sometimes a `csv` file is delimited with `\t` or another supported delimiter (basically any character), and the delimiter has to be passed together with the `escape` character. This breaks the `CsvParser` implementation when the input comes from the Builder's UI.
More context here: https://airbytehq-team.slack.com/archives/C02U9R3AF37/p1740003282758399
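A small standalone reproduction of the failure mode, assuming the delimiter reaches the parser as the two characters backslash and `t` rather than a real tab. It uses only the standard library `csv` module, not the CDK classes.

```python
import csv
import io

data = "id\tname\n1\talice\n"

# The two-character string backslash + "t", as an escaped value might arrive.
escaped = "\\t"

try:
    # csv only accepts a one-character delimiter, so this raises TypeError.
    list(csv.DictReader(io.StringIO(data), delimiter=escaped))
    error = None
except TypeError as exc:
    error = str(exc)

# Decoding the escape yields a real one-character tab, which csv accepts.
decoded = escaped.encode("utf-8").decode("unicode_escape")
rows = list(csv.DictReader(io.StringIO(data), delimiter=decoded))
```

After decoding, `rows` holds the parsed record, while the undecoded value never gets past the reader's constructor.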
How
Process the `delimiter` to decode the `escaping_character` and normalize the input before decoding records.
User Impact
No impact is expected; this is not a breaking change.