🐛 fix decode dot in keys#25
Conversation
…ndle degenerate cases
… decoder results in DecodeOptions
…to_bracket_top_level
…inated brackets in top-level dot splitting
…oded dot handling in decode logic
|
Warning Rate limit exceeded@techouse has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 2 minutes and 32 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (2)
WalkthroughPreprocesses query-string tokens, normalizes percent-encoded brackets/dots, introduces kind-aware decoding via DecodeOptions (with legacy_decoder compatibility), replaces regex dot-splitting with a top-level dot scanner, refines comma-split and negative Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client as Raw query string
participant Decode as src/qs_codec/decode.py
participant Utils as DecodeUtils
participant Opts as DecodeOptions
participant Parser as _parse_object/_parse_array_value
Client->>Decode: raw "k=v" tokens
Decode->>Utils: normalize %5B/%5D (case-insensitive)
Decode->>Utils: dot_to_bracket_top_level(key_token) / split_key_into_segments
Utils-->>Decode: key segments (top-level dots -> brackets)
Decode->>Opts: decode(key_token, kind=KEY)
Opts->>Opts: select/adapt decoder (decoder > legacy_decoder > default)
Opts-->>Decode: decoded key (or None)
Decode->>Opts: decode(value_token, kind=VALUE)
Opts-->>Decode: decoded value (or None)
alt token decode returned None
Decode-->>Client: skip pair
else
Decode->>Parser: apply bracket vs numeric-index heuristics, list_limit & comma-split rules
Parser-->>Decode: intermediate entries (flat dict)
Decode-->>Client: accumulate results
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #25 +/- ##
==========================================
- Coverage 94.01% 93.82% -0.20%
==========================================
Files 16 16
Lines 1070 1134 +64
==========================================
+ Hits 1006 1064 +58
- Misses 64 70 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…acy_decoder signatures
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/decode_options_test.py (1)
232-244: Coercion test: replace PEP 604 unionsApply this diff:
- def dec( - v: str | None, - charset: Charset | None = None, + def dec( + v: t.Optional[str], + charset: t.Optional[Charset] = None, *, kind: DecodeKind = DecodeKind.VALUE, - ) -> object | None: + ) -> t.Optional[object]: return 42 if v is not None else None
🧹 Nitpick comments (6)
src/qs_codec/utils/decode_utils.py (2)
7-8: Doc: great clarification of top‑level dot rules; consider cross‑file consistency noteThe new docstring clearly enumerates edge cases (leading/trailing/double dots, skip before “[”) and the “don’t split on %2E” rule. One follow‑up:
DecodeKind’s docstring (enums/decode_kind.py) still mentions preserving encoded dots for KEY decoding, which no longer reflects the unified key/value decoding here. Recommend aligning that doc to avoid confusion.
32-108: Top‑level dot scanner logic looks solid and O(n); minor comment cleanup suggestedThe depth tracking and handling of “.[”, leading “.”, double dots, and trailing “.” are correct and match the stated rules. Small nit: the “normal split (also handles leading '.')” comment duplicates the explicit leading‑dot branch above and may mislead future readers.
Proposed comment tweak only:
- # normal split (also handles leading '.'): convert a.b → a[b] - # and '.a' → '[a]' at top level + # normal split at top level: convert a.b → a[b]src/qs_codec/models/decode_options.py (1)
202-215: Kind passing heuristic: robust; add Literal/typing annotation caveat (nit)The
Enumvsstrdecision is thoughtful. Minor future‑proofing: if someone annotateskindastyping.Literal["key","value"], it will be treated as “pass string” (fine). Consider a short inline comment to document this for maintainers. No code change required.src/qs_codec/decode.py (3)
126-145: list_limit wording vs implementation: behavior is consistent; nit on doc phrasingThe doc explains negative
list_limitsemantics. The implementation enforces immediate errors for comma‑splits whenraise_on_limit_exceeded=True(since any split length > negative limit), and otherwise proceeds. That matches the note. Consider clarifying that index‑based growth is handled in_parse_objectwhile comma‑splits are handled here.
186-188: Case‑insensitive bracket normalization: OK, but note that values are affected tooReplacing
%5B/%5Dglobally is pragmatic and keeps splitting simple. It also affects value tokens, which is acceptable given values are then decoded through the decoder. Add a brief comment to acknowledge this intentional trade‑off.
324-336: Current list length pre‑compute for trailing []: logic is conservative; consider simplificationThe attempt to infer
current_list_lengthby parsingchain[:-1]into an int usually yieldsNone(e.g., "a[0]" → ValueError), so the branch rarely triggers. Since_parse_array_valuealready enforces list limits at the leaf and you pass explicit lengths for existing containers in_parse_query_string_values, this pre‑compute could be simplified or documented as best‑effort.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
src/qs_codec/decode.py(4 hunks)src/qs_codec/models/decode_options.py(4 hunks)src/qs_codec/utils/decode_utils.py(7 hunks)tests/unit/decode_options_test.py(1 hunks)tests/unit/decode_test.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tests/unit/decode_options_test.py (3)
src/qs_codec/models/decode_options.py (3)
DecodeOptions(20-286)decode_key(279-282)decode_value(284-286)src/qs_codec/enums/charset.py (1)
Charset(25-36)src/qs_codec/enums/decode_kind.py (1)
DecodeKind(13-27)
tests/unit/decode_test.py (5)
src/qs_codec/models/decode_options.py (2)
DecodeOptions(20-286)decode(263-277)src/qs_codec/decode.py (1)
decode(31-101)src/qs_codec/utils/decode_utils.py (1)
decode(143-176)src/qs_codec/enums/charset.py (1)
Charset(25-36)src/qs_codec/enums/decode_kind.py (1)
DecodeKind(13-27)
src/qs_codec/models/decode_options.py (3)
src/qs_codec/utils/decode_utils.py (1)
decode(143-176)src/qs_codec/enums/charset.py (1)
Charset(25-36)src/qs_codec/enums/decode_kind.py (1)
DecodeKind(13-27)
src/qs_codec/utils/decode_utils.py (1)
src/qs_codec/enums/decode_kind.py (1)
DecodeKind(13-27)
🪛 GitHub Actions: Test
tests/unit/decode_options_test.py
[error] 186-186: pytest -v --cov=src/qs_codec --cov-report=xml failed with exit code 1. Test 'test_decoder_is_used_for_key_and_value' failed due to TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' in annotation 'str | None'. Python 3.9 does not support PEP 604 union types; upgrade to Python 3.10+ or replace with Optional[str].
[error] 199-199: pytest -v --cov=src/qs_codec --cov-report=xml failed with exit code 1. Test 'test_decoder_null_return_is_honored' failed due to TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' in annotation 'str | None'. Python 3.9 does not support PEP 604 union types; upgrade to Python 3.10+ or replace with Optional[str].
[error] 207-207: pytest -v --cov=src/qs_codec --cov-report=xml failed with exit code 1. Test 'test_single_decoder_acts_like_legacy_when_ignoring_kind' failed due to TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' in annotation 'str | None'. Python 3.9 does not support PEP 604 union types; upgrade to Python 3.10+ or replace with Optional[str].
[error] 217-217: pytest -v --cov=src/qs_codec --cov-report=xml failed with exit code 1. Test 'test_decoder_wins_over_legacy_decoder_when_both_provided' failed due to TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' in annotation 'str | None'. Python 3.9 does not support PEP 604 union types; upgrade to Python 3.10+ or replace with Optional[str].
[error] 235-235: pytest -v --cov=src/qs_codec --cov-report=xml failed with exit code 1. Test 'test_decode_key_coerces_non_string_decoder_result' failed due to TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' in annotation 'str | None'. Python 3.9 does not support PEP 604 union types; upgrade to Python 3.10+ or replace with Optional[str].
🪛 Ruff (0.12.2)
tests/unit/decode_test.py
1556-1556: Dictionary key literal "a" repeated
(F601)
🪛 Flake8 (7.2.0)
tests/unit/decode_test.py
[error] 1556-1556: dictionary key 'a' repeated with different values
(F601)
[error] 1556-1556: dictionary key 'a' repeated with different values
(F601)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (12)
src/qs_codec/utils/decode_utils.py (3)
147-156: Unified decoding path: OK; confirm intentional no‑opkindUsing the same decoding logic for keys and values is consistent with the new tests. The
kindparameter is intentionally unused here (pylint‑disabled). Just confirming this is by design now that dot handling moved entirely to the splitter/object builder.
188-201: Behavioral change in split: depth<=0 returns original key — matches testsReturning
[original_key]whenmax_depth <= 0preserves keys unsplit (parity with tests like “uses original key when depth is 0”). Good defensive default.
240-246: Remainder wrapping: behavior matches expected “[c][d]” literal segmentWrapping the remainder as
[[...]]to yield a single literal key like"[c][d]"matches test expectations (e.g., depth limit with no strict depth). Looks good.src/qs_codec/models/decode_options.py (5)
31-35: Option semantics are clear: good note on top‑level vs in‑bracket behaviorThe doc clarifies that
decode_dot_in_keysaffects top‑level splitting semantics and not percent‑decoding inside brackets. This aligns with the new unified decoding pipeline.
138-142: Newlegacy_decoder: good back‑compat surface and precedence noteThe explicit field and doc about precedence (decoder > legacy_decoder > default) will save integrators time.
155-162: Decoder precedence and fallback: sensible defaultsThe precedence and defaulting to
DecodeUtils.decodeavoidNonepaths later. Good.
250-261: Adapter wrapper: nice consolidation pointWrapping user decoders into a uniform
(s, charset, *, kind)signature is clean and keeps call sites simple.
262-287: New convenience methods: API ergonomics improved
decode,decode_key,decode_valueprovide a clear, single entrypoint. Coercingdecode_keytostrsafeguards callers from custom decoders returning non‑strings.tests/unit/decode_test.py (1)
1530-1533: Guardrail expectations match splitter behavior — LGTMTests around trailing dots, leading dots, and double dots align with
dot_to_bracket_top_levelrules (preserve trailing, skip “.[”, convert “a..b” → “a.[b]”).Also applies to: 1619-1625
src/qs_codec/decode.py (2)
238-270: Decoder usage: correct KEY/VALUE signaling and None‑skippingThe key/value decode with
kindand the “skip pair when key decodes to None” behavior is clean and matches tests.
352-365: In‑bracket %2E normalization underdecode_dot_in_keys: good parity with custom decodersNormalizing
%2E/%2einside bracket segments here covers the case where a custom decoder returns raw tokens. Clear comments — nice.tests/unit/decode_options_test.py (1)
215-231: Ensure decoder wins over legacy_decoder: replace PEP 604 unionsApply this diff:
- def legacy(v: str | None, charset: Charset | None = None) -> str | None: + def legacy(v: t.Optional[str], charset: t.Optional[Charset] = None) -> t.Optional[str]: return f"L:{'null' if v is None else v}" - def dec( - v: str | None, - charset: Charset | None = None, + def dec( + v: t.Optional[str], + charset: t.Optional[Charset] = None, *, kind: DecodeKind = DecodeKind.VALUE, - ) -> str | None: + ) -> t.Optional[str]: return f"K:{kind.name}:{'null' if v is None else v}"Likely an incorrect or invalid review comment.
… decoder invocation for dot-encoded and bracketed keys
…ind parameter in decode logic
…t in decode function
… query string prior to splitting
…cement in decode logic
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/qs_codec/utils/decode_utils.py (1)
213-244: Strict-depth raising also triggers on unterminated bracket groups; differentiate overflow vs. untermination
Today, any leftover '[' after the main loop triggers an IndexError when strict_depth=True, even if the loop exited due to an unterminated group (close < 0) rather than exceeding max_depth. The docstring promises a synthetic segment for unterminated groups (qs/Kotlin parity). Tighten the condition so we only raise on “more groups beyond max_depth” and not on unterminated input.Apply this diff to track untermination and only raise on true depth overflow:
@@ - while open_idx >= 0 and depth < max_depth: + unterminated = False + while open_idx >= 0 and depth < max_depth: @@ - if close < 0: - break # unterminated group; stop collecting; remainder handled below + if close < 0: + unterminated = True # unterminated group; stop collecting; remainder handled below + break @@ - if open_idx >= 0: - if strict_depth: + if open_idx >= 0: + # We only want to raise for true depth overflow under strict_depth, + # not for unterminated bracket groups. + depth_overflow = (depth >= max_depth) and not unterminated + if strict_depth and depth_overflow: raise IndexError(f"Input depth exceeded depth option of {max_depth} and strict_depth is True") # Stash the remainder as a single segment (qs/Kotlin parity) segments.append("[" + key[open_idx:] + "]")
🧹 Nitpick comments (5)
src/qs_codec/utils/decode_utils.py (1)
51-57: Edge cases: leading '.' without following token and empty bracket name
- For the lone key "." (no following token), the code treats it as a trailing dot and preserves ".". That matches “trailing '.' is preserved,” but differs from “leading '.' starts a bracket segment.” If this is intentional, consider clarifying the docstring that ".a → [a]" but "." remains ".".
- For inputs like ".[ ]" (space inside brackets after a skipped dot), behavior is consistent, but a short test would lock this down.
Would you like me to add targeted unit tests for "." and for whitespace after a skipped dot (e.g., "a.[ b ]=x")?
Also applies to: 68-99, 103-107
tests/unit/decode_test.py (3)
1485-1500: Mixed-case encoded brackets + encoded dots: ensure option consistency path is covered
Good coverage of mixed-case %5B/%5D and %2E. One nit: the “inconsistent options raises” case ties the error to options construction, not parsing. Consider asserting the ValueError at options init time to decouple from decode() and make the intent explicit.- with pytest.raises(ValueError): - decode("a%5Bb%5D%5Bc%5D%2Ed=x", DecodeOptions(allow_dots=False, decode_dot_in_keys=True)) + with pytest.raises(ValueError): + DecodeOptions(allow_dots=False, decode_dot_in_keys=True)
1513-1521: Case-insensitive %2E handling assertion is good; add control with allow_dots=False
You demonstrate both uppercase and lowercase %2E variants. Add a control asserting that with allow_dots=False, both remain literal and do not split to increase confidence.opt = DecodeOptions(allow_dots=True, decode_dot_in_keys=True) assert decode("a[b]%2Ec=x", opt) == {"a": {"b": {"c": "x"}}} assert decode("a[b]%2ec=x", opt) == {"a": {"b": {"c": "x"}}} + # Control: no splitting when allow_dots=False + no_dots = DecodeOptions(allow_dots=False, decode_dot_in_keys=False) + assert decode("a[b]%2Ec=x", no_dots) == {"a[b].c": "x"}
1564-1622: Additional parity tests mix two different models for encoded-dot behavior
Parts of this block assume “encoded dots decode to literal '.' and then split under allow_dots” (model A), others assume “encoded dots never split unless decode_dot_in_keys=True” (model B). Please unify to one model and update expectations accordingly (see earlier comment).I can submit a cohesive test matrix once you confirm whether keys are pre-decoded before splitting or not.
src/qs_codec/models/decode_options.py (1)
283-291: decode_key coerces any scalar to str; callers relying on non-str keys should be aware
Minor note: decode_key forces str() on any non-None result. If a custom decoder returns non-strings for keys (e.g., bytes), they’ll be coerced. This is reasonable, but worth a one-liner in docs to avoid surprises.def decode_key(self, value: t.Optional[str], charset: t.Optional[Charset] = None) -> t.Optional[str]: - """Decode a key (or key segment). Always returns a string or ``None``.""" + """Decode a key (or key segment). Always returns a string or ``None``. + Note: custom decoders returning non-strings for keys are coerced via ``str()``.""" out = self.decode(value, charset, kind=DecodeKind.KEY) return None if out is None else str(out)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
src/qs_codec/decode.py(4 hunks)src/qs_codec/models/decode_options.py(4 hunks)src/qs_codec/utils/decode_utils.py(7 hunks)tests/unit/decode_test.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/qs_codec/decode.py
🧰 Additional context used
🧬 Code graph analysis (3)
src/qs_codec/utils/decode_utils.py (1)
src/qs_codec/enums/decode_kind.py (1)
DecodeKind(13-27)
src/qs_codec/models/decode_options.py (4)
src/qs_codec/utils/decode_utils.py (2)
DecodeUtils(18-245)decode(142-175)src/qs_codec/decode.py (1)
decode(31-101)src/qs_codec/enums/charset.py (1)
Charset(25-36)src/qs_codec/enums/decode_kind.py (1)
DecodeKind(13-27)
tests/unit/decode_test.py (5)
src/qs_codec/models/decode_options.py (2)
DecodeOptions(20-290)decode(267-281)src/qs_codec/decode.py (1)
decode(31-101)src/qs_codec/utils/decode_utils.py (1)
decode(142-175)src/qs_codec/enums/charset.py (1)
Charset(25-36)src/qs_codec/enums/decode_kind.py (1)
DecodeKind(13-27)
🔇 Additional comments (4)
src/qs_codec/utils/decode_utils.py (1)
32-107: Top-level dot scanner: solid O(n) pass with well-scoped splitting; preserves bracket depth and encoded dots
The character-scanner correctly avoids splitting inside brackets, handles degenerate cases, and never treats percent-encoded dots as separators. Clear improvement over regex.tests/unit/decode_test.py (1)
1554-1561: Nice fix for F601 duplicate dict key in no-op assertion
The bool(decode(...)) approach avoids constructing a dict with duplicate keys while still exercising the code path.src/qs_codec/models/decode_options.py (2)
27-35: Docstring implies decode_dot_in_keys governs top-level %2E splitting; verify against actual pipeline
Given DecodeUtils.decode treats keys and values identically, whether a top-level %2E participates in splitting depends on where decoding occurs relative to dot-splitting. Please make the contract explicit here:
- If keys are decoded before splitting, then allow_dots alone causes decoded %2E→"." to split; decode_dot_in_keys may be redundant.
- If decode_dot_in_keys is the only switch that enables top-level %2E splitting, ensure keys are not pre-decoded into literal '.' before calling the top-level scanner.
I can propose precise wording after you confirm the intended order of operations (decode vs. split).
155-166: Signature-adaptive decoder adapter: robust and pragmatic
The adapter correctly handles (s), (s, charset), and (s, charset, kind) forms, including varargs/kwargs and Enum-vs-str kind representations. Nice touch using Enum base to decide string coercion.Also applies to: 220-263
… encoded dots in dot_to_bracket_top_level
…et groups in decode logic
…der outputs via str()
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (7)
src/qs_codec/utils/decode_utils.py (7)
80-111: Leading “..” currently produces an empty “[]” segment; either document or preserve the first dotFor inputs like “..a”, the branch for a leading dot builds an empty bracket segment “[]” before proceeding. That may be surprising given the documented degenerate rule focuses on “a..b → a.[b]”. If the intended rule is “preserve the first dot” even at the start, consider this small guard; otherwise, please add a test and doc note for “..a” explicitly.
elif i == 0: + # If input starts with '..', preserve the first dot like the 'a..b' case. + if has_next and next_ch == ".": + sb.append(".") + i += 1 + continue # leading '.' starts a bracket segment: ".a" -> "[a]" start = i + 1 j = start while j < n and s[j] != "." and s[j] != "[": j += 1Examples to cover in tests: ".a", "..a", "...", ".[b]".
112-119: Comment mentions preserving percent sequences; code already does so implicitlyThe implementation doesn’t special‑case “%2E” (it just appends chars), which is fine. The comment is correct but could be tightened to “We only split on literal '.'; '%' sequences are never treated as separators.” Optional.
184-185: Latin‑1 fast path is correct; minor micro‑opt optionalCurrent regex+lambda is clear and fine. If you ever need to squeeze a bit more performance, replacing the lambda with a small local fast converter (avoids repeated int/chr in a closure) can help, but not necessary now.
199-205: Docs: clarify remainder-wrapping example for max_depth and unterminated groupsThe behavior is sound. Suggest adding an explicit example so users know the remainder is bracket‑wrapped as a single synthetic segment, e.g., 'a[b][c][d]' with max_depth=2 → ['a','[b]','[c]','[[d]]']; 'a[b' (unterminated) → ['a','[[b]'].
- If there are more groups beyond ``max_depth`` and ``strict_depth`` is True, an ``IndexError`` is raised. Otherwise, the remainder is added as one final segment (again mirroring qs). - - Unterminated '[': the remainder after the first unmatched '[' is captured as a single synthetic bracket segment (qs/Kotlin parity). + - Unterminated '[': the remainder after the first unmatched '[' is captured as a single synthetic bracket segment (qs/Kotlin parity). + + Examples + -------- + max_depth=2: "a[b][c][d]" -> ["a", "[b]", "[c]", "[[d]]"] + unterminated: "a[b" -> ["a", "[[b]"]
225-247: Unterminated bracket handling is correct and avoids strict_depth false positivesThe separate ‘unterminated’ flag prevents raising for malformed input under strict_depth—good parity detail. Consider adding unit tests that assert no exception is raised for unterminated input when strict_depth=True.
253-261: Depth overflow vs unterminated remainder: confirm intended double-bracket remainderAppending "[" + key[open_idx:] + "]" intentionally creates a synthetic “[ … ]” segment whose content contains the original bracket tokens, e.g., remainder “[d][e]” → “[[d][e]]”. That matches the doc comment; just ensure downstream consumers treat “[[…]]” as an opaque segment. If not, you may need a sentinel wrapper.
I can help add tests to lock this in:
- allow_dots=True, max_depth=1: "a.b.c" → ["a", "[b][c]"] vs ["a","[[b][c]]"] depending on consumer expectations.
- strict_depth=True overflow raises for well-formed keys, but not for unterminated.
154-166: Documentkindas a no-op indecode()and refreshDecodeKinddocsTo keep API compatibility clear and align cross-file documentation:
- In src/qs_codec/utils/decode_utils.py (lines 154–166), extend the
decodedocstring to note that thekindparameter is accepted but ignored, and may be removed in a future major release.- In src/qs_codec/enums/decode_kind.py (lines 18–22), update the
KEYattribute doc to reflect that the default scalar decoder ignoreskindand fully decodes percent-encoded dots; dot-splitting semantics run later via parser options.Suggested diffs:
--- a/src/qs_codec/utils/decode_utils.py +++ b/src/qs_codec/utils/decode_utils.py @@ def decode( - ) -> t.Optional[str]: - """Decode a URL-encoded scalar. + ) -> t.Optional[str]: + """Decode a URL-encoded scalar. + + Notes + ----- + The `kind` parameter is accepted for API compatibility but is currently + ignored; keys and values are decoded identically. It may be removed in + a future major release.--- a/src/qs_codec/enums/decode_kind.py +++ b/src/qs_codec/enums/decode_kind.py @@ class DecodeKind(str, Enum): - KEY - Decode a *key* (or key segment). Implementations typically preserve - percent-encoded dots (``%2E``/``%2e``) so that dot-splitting semantics can - be applied later according to parser options. + KEY + Decode a *key* (or key segment). Note that the default scalar decoder + (``qs_codec.utils.decode_utils.decode``) ignores `kind` and fully + decodes percent-encoded dots (``%2E``/``%2e``). Dot-splitting behavior is + applied later by higher-level parser options.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
src/qs_codec/models/decode_options.py(4 hunks)src/qs_codec/utils/decode_utils.py(7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/qs_codec/models/decode_options.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/qs_codec/utils/decode_utils.py (1)
src/qs_codec/enums/decode_kind.py (1)
DecodeKind(13-27)
🔇 Additional comments (4)
src/qs_codec/utils/decode_utils.py (4)
7-8: Clarify percent-encoded dot semantics in module docstringThe bullet claims “never splits on percent‑encoded dots,” but the default decode path later normalizes “%2E” → “.”, which will be split at top level when allow_dots=True. Recommend updating the wording to reflect: only literal “.” at depth 0 are split; encoded “%2E” will be split iff the upstream decoder normalized them.
Apply this doc tweak:
- - Top‑level dot splitting uses a character‑scanner that handles degenerate cases (leading '.' starts a bracket segment; '.[' is skipped; double dots preserve the first; trailing '.' is preserved) and never splits on percent‑encoded dots. + - Top‑level dot splitting uses a character‑scanner that handles degenerate cases (leading '.' starts a bracket segment; '.[' is skipped; double dots preserve the first; trailing '.' is preserved). It splits only on literal '.' at depth 0. If the upstream decoder normalized '%2E' to '.', those will be split; if a custom decoder leaves '%2E' encoded, they will not split.
32-47: dot_to_bracket_top_level: tighten language about when decoding happensGood API addition. Minor doc refinement: the opening line suggests this function itself runs “after percent‑decoding.” To avoid implying it decodes, prefer “after upstream decoding.”
[ suggest_nitpick ]- """Convert top-level dot segments into bracket groups *after* percent-decoding. + """Convert top-level dot segments into bracket groups (assuming upstream decoding has already run).Also, the Notes already clarify the default path vs custom decoder path—nice. No functional concerns here.
208-212: Returning the original key when max_depth <= 0 matches qs semanticsGood change. This preserves the original token when depth-based splitting is disabled.
36-45: Behavior vs PR summary: encoded dot splittingPR summary states “ensures percent‑encoded dots are never split at the top level,” while the doc here (correctly) notes default decoding turns %2E into '.' which then will split. Please confirm the intended behavior. If the invariant “never split encoded dots” must hold for the default decoder, decode() would need to preserve %2E in keys again (or pass DecodeKind.KEY and branch), which contradicts the unification goal. Otherwise, update PR text/docs accordingly.
…d unterminated bracket handling
…be removed in future
… depth enforcement
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/unit/decode_test.py (1)
1554-1561: Resolved: duplicate dict key (F601) workaround looks good
This replaces the problematic dead-branch assertion with a no-op truthiness assertion to force invocation without constructing duplicate keys. Matches prior guidance.
🧹 Nitpick comments (7)
src/qs_codec/utils/decode_utils.py (4)
80-106: Edge-case coverage: add tests for leading double dot and dot-before-bracket at startYou already cover a..b, a.[b], and trailing dots. Consider adding:
- "..a=x" with allow_dots=True → {"..a": "x"} or {".": {"a": "x"}} depending on intended policy; your current logic preserves the first dot and turns the second into a bracket, yielding ".[a]" which ultimately becomes {".": {"a": "x"}}.
- ".[a]=x" with allow_dots=True → {"a": "x"} (you skip the dot before '['; worth an explicit test).
I can add parametrized tests mirroring these if you want.
118-124: Comment nit: “preserve percent sequences verbatim” is accurate but implementation is byte-wiseThe comment says percent sequences are preserved “verbatim,” which is true; however, the code appends characters byte-by-byte without special handling for '%' sequences. Consider clarifying the comment to “no special handling for percent sequences here; characters are appended as-is.”
- # also preserve percent sequences verbatim at top level; - # we don't split on '%2E' here + # No special handling for percent sequences here; characters are appended as-is. + # We never split on '%2E' at this stage.
190-196: Latin-1 fast-path is fine; consider a tiny micro-optimizationThe
HEX2_PATTERN.sub(lambda ...)is straightforward. If you care about micro-opts, prebindintandchrlocally to avoid global lookups in the hot loop, but this is purely optional.- return cls.HEX2_PATTERN.sub(lambda m: chr(int(m.group(1), 16)), s) + _int, _chr = int, chr + return cls.HEX2_PATTERN.sub(lambda m: _chr(_int(m.group(1), 16)), s)
241-276: Remainder handling and strict-depth logic look correct; add one extra malformed-key test
unterminated=Trueavoids false positives for depth overflow.- The synthetic remainder via
"[" + key[open_idx:] + "]"yields the intended[[...]].Consider adding a test for a malformed sequence that includes ‘]’ inside the token captured by the scanner-generated group (e.g., “a.b]c” with allow_dots=True) to ensure downstream consumers tolerate odd literals within the synthetic segment.
tests/unit/decode_test.py (3)
1530-1533: Trailing dot behavior: confirm intentional drop vs literal retentionThe expectation that “user.email.” with allow_dots=True yields {"user": {"email": "x"}} implies the trailing '.' is intentionally ignored. This matches the scanner’s behavior; consider adding one counterpart asserting that with allow_dots=False, the trailing '.' is retained as a literal key (“user.email.”), to document both modes.
1563-1622: Guardrail and parity tests are strong; consider minor parameterization to reduce duplicationThe “additional parity” suite is thorough. You could reduce repetition by parameterizing case variants (
%2Evs%2e) and option tuples, but this is optional.
1456-1461: Clarify intent with bracket-internal %2E decoding commentsTests already assert that %2E inside brackets decodes to '.' independent of decode_dot_in_keys. Consider a brief comment on the option being top-level only, to avoid readers assuming it affects bracket segments.
- # Note: key-decoder percent-decodes inside brackets, so %2E → "." + # Note: percent-decoding inside bracket segments yields '.' regardless of decode_dot_in_keys, + # which only influences top-level splitting semantics.Also applies to: 1470-1474, 1475-1484
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
src/qs_codec/enums/decode_kind.py(1 hunks)src/qs_codec/utils/decode_utils.py(7 hunks)tests/unit/decode_test.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/qs_codec/enums/decode_kind.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/qs_codec/utils/decode_utils.py (1)
src/qs_codec/enums/decode_kind.py (1)
DecodeKind(13-28)
tests/unit/decode_test.py (5)
src/qs_codec/models/decode_options.py (2)
DecodeOptions(20-293)decode(267-281)src/qs_codec/utils/decode_utils.py (3)
decode(159-198)DecodeUtils(18-278)split_key_into_segments(201-278)src/qs_codec/decode.py (1)
decode(31-101)src/qs_codec/enums/charset.py (1)
Charset(25-36)src/qs_codec/enums/decode_kind.py (1)
DecodeKind(13-28)
🔇 Additional comments (8)
src/qs_codec/utils/decode_utils.py (5)
32-124: Top-level dot scanner is correct, linear, and handles degenerate cases wellThe stateful scanner correctly:
- respects depth (splits only at depth 0),
- preserves dots inside brackets,
- skips ".[" (a.[b] → a[b]),
- preserves the first of a double dot (a..b → a.[b]),
- preserves trailing dots.
The implementation is linear-time and avoids regex backtracking. Nice work.
163-178: Unified key/value decoding: docstring and signature match behaviorAccepting
kindbut ignoring it (with pylint suppression) is consistent with the updated design and theDecodeKinddocs. Clear and future-proof.
210-216: split_key_into_segments doc clarifies dot-first policy and Kotlin/qs parityThe updated doc precisely communicates the top-level dot conversion and the synthetic segment behavior. Good alignment with tests.
224-228: depth=0 early return correctly bypasses dot conversionReturning
[original_key]before callingdot_to_bracket_top_levelensures “depth=0 means no splitting whatsoever,” which your tests assert. Good.
1-8: Sanity check complete: no lingering references toDOT_TO_BRACKET
I ran a repository-wide search forDOT_TO_BRACKET(including import statements) and found no matches. All references have been removed.tests/unit/decode_test.py (3)
1435-1512: Encoded-dot behavior tests align with pre-decoding + top-level splittingThese cases establish:
- %2E splits at top level when allow_dots=True regardless of decode_dot_in_keys.
- Inconsistent options raise (decode_dot_in_keys=True while allow_dots=False).
- Inside brackets, %2E decodes to '.' but does not trigger additional splits.
This matches the new contract. Well done.
1513-1545: Compositional cases (bracket + encoded dot, encoded dot before bracket) are accurate and valuableGreat coverage for “a[b]%2Ec=x”, “a%2E[b]=x”, and case variants. This guards against regressions in the scanner’s lookahead logic.
1624-1649: Remainder segmentation tests validate the synthetic ‘[[...]]’ contractThese assert the nuanced behaviors (depth slicing for dot and bracket inputs, strict vs non-strict, unterminated group). Excellent.
…rcent sequence handling
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (4)
src/qs_codec/utils/decode_utils.py (4)
7-7: Clarify wording around percent-encoded dots to avoid ambiguity.“never splits on percent-encoded dots” can be misread given the default decoder fully percent-decodes first. Suggest clarifying that only literal '.' characters at depth 0 are split, while literal percent sequences like '%2E' are never treated as split points.
Apply this doc tweak:
- - Top‑level dot splitting uses a character‑scanner that handles degenerate cases (leading '.' starts a bracket segment; '.[' is skipped; double dots preserve the first; trailing '.' is preserved) and never splits on percent‑encoded dots. + - Top‑level dot splitting uses a character‑scanner that handles degenerate cases (leading '.' starts a bracket segment; '.[' is skipped; double dots preserve the first; trailing '.' is preserved) and never treats literal percent‑encoded sequences (e.g., '%2E') as split points; only actual '.' characters at depth 0 are split.
94-116: Reduce duplication in token scanning logic.The “scan until next '.' or '[' (and now ']')” block appears twice (leading '.' case and normal split). Extract a small local helper to improve readability and reduce maintenance overhead.
Example refactor:
+ def _scan_token(start_idx: int) -> int: + j = start_idx + while j < n and s[j] not in ".[]": + j += 1 + return j @@ - j = start - while j < n and s[j] != "." and s[j] != "[": - j += 1 + j = _scan_token(start) @@ - j = start - while j < n and s[j] != "." and s[j] != "[": - j += 1 + j = _scan_token(start)
195-197: Optional: Considerunquote_to_bytesfor the Latin‑1 path.For long strings this regex+lambda is fine, but
urllib.parse.unquote_to_bytesfollowed by.decode('latin-1')is a readable, reasonably fast alternative that naturally ignores%uXXXX.Illustrative change:
- _int, _chr = int, chr - return cls.HEX2_PATTERN.sub(lambda m: _chr(_int(m.group(1), 16)), s) + from urllib.parse import unquote_to_bytes + return unquote_to_bytes(s).decode('latin-1')Note: Both behaviors leave
%uXXXXuntouched. Keep the current approach if benchmarks favor it in your workloads.
242-279: Good treatment of unterminated groups and strict-depth overflow.
- Using
unterminatedto suppress strict overflow exceptions is subtle but correct.- Remainder wrapping into a single synthetic segment ensures downstream folding can proceed deterministically.
Nit: consider renaming variables to reduce cognitive load:
level->group_balancedepth->segments_emitted
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/qs_codec/utils/decode_utils.py(7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/qs_codec/utils/decode_utils.py (1)
src/qs_codec/enums/decode_kind.py (1)
DecodeKind(13-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (3)
src/qs_codec/utils/decode_utils.py (3)
32-62: API addition looks solid and the docstring is thorough.The new top-level dot scanner API and its behavior notes are clear and align with qs semantics. Good call documenting degenerate cases and custom-decoder behavior.
211-217: Key-splitting spec is clear and matches qs/Kotlin parity.The top-level dot conversion plus balanced-bracket segmentation and the unterminated-bracket behavior are well-specified in the docstring.
225-229: Correct handling formax_depth <= 0.Returning the original key as a single segment is the right qs-aligned behavior. Good defensive early-exit.
…verrun on closing brackets
… percent-decoded dots in keys
…' at depth 0, not percent-encoded sequences
This pull request refines query string decoding logic, focusing on how percent-encoded dots (
%2E/%2e) and list limits are handled, and improves compatibility with custom and legacy decoders. The changes clarify documentation, enhance top-level dot splitting, and add convenience methods for decoding keys and values. The most significant changes are grouped below:Dot Decoding and Key Splitting Improvements:
dot_to_bracket_top_levelmethod inDecodeUtilsto convert top-level dots in keys to bracket groups, preserving dots inside brackets and handling edge cases. This replaces the previous regex-based approach and ensures percent-encoded dots are never split at the top level.split_key_into_segments) to use the new character-scanner for top-level dots, ensuring correct handling of degenerate cases and parity with other language ports. Unterminated brackets now produce a synthetic segment matching Kotlin/qs behavior. [1] [2] [3]Percent-Encoding and Decoder Behavior:
.acts as a key separator is determined by parsing options, not by decoding. The previous logic that preserved%2Ein keys has been removed for consistency. [1] [2]DecodeOptionsand decode-related methods has been clarified to reflect these behaviors and the role ofdecode_dot_in_keysfor top-level dot splitting only. [1] [2]Decoder Configuration and Compatibility:
legacy_decoderoption inDecodeOptions, enabling back-compatibility with older two-argument decoders. The decoder precedence is now: customdecoder>legacy_decoder> library default. [1] [2]decode,decode_key, anddecode_valuetoDecodeOptionsfor unified scalar decoding, mirroring Kotlin API parity.List Limit and Parsing Semantics:
list_limitvalues, clarifying that numeric-index parsing is disabled but comma-splitting still returns lists, and that list-growth operations raise immediately whenraise_on_limit_exceeded=True. [1] [2]General Documentation and Clarity:
These changes together improve the correctness, configurability, and maintainability of the query string decoding logic.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests