Skip to content

Code refactoring and technical improvements #37

@EdmondDantes

Description

@EdmondDantes

Code audit & refactoring

A systematic, step-by-step audit of the entire codebase to find and fix technical debt. This is not a single PR — work proceeds incrementally, one area at a time, with each step verified independently (full phpt run, no behavior change, no regression).

Scope note: HTTP/3 is out of scope for now — src/http3/ excluded.

Baseline: 195/197 phpt pass (034 cosmetic curl-deprecation, 044 pre-existing SEGV in php-async closure transfer — both unrelated). Bar for every step: no new failures.

Audit dimensions

  1. Unnecessary memory allocations · 2. Suboptimal syscall usage · 3. Structural cleanliness · 4. Code duplication · 5. SOLID violations · 6. Data structure optimality · 7. Algorithmic optimality · 8. Comment quality (final pass).

Findings & implementation plan

Ordered: correctness → dead code → low-risk perf → duplication → big-function splits → data structures → comments. Dedup precedes splits (big functions are often big because of duplication).

Phase 1 — Correctness ✅ (commit 6d470a6)

  • C1 http_parser.cpaused not reset in http_parser_reset_for_reuse. Verified: not an active bug — the sole consumer (http_parser_execute) clears it and llhttp_resume no-ops on a freshly-init'd parser. Latent inconsistency only → folded into X2 (teardown dedup makes divergence impossible).
  • C2 send_file.c — engine open lacked O_NOFOLLOW; on an open-file-cache hit (lstat pre-flight skipped) a symlink swapped in within the TTL was followed, leaking files outside the docroot. Fixed in 6d470a6 + regression test static/021.

Phase 2 — Dead code / trivial removals ✅ (commit cb57c23)

  • D1 http2_static_response.c:296 — dead n counter (computed, never read).
  • D2 http_static.c:300/310http_static_cache_acquire called twice; hoist to one.
  • D3 http_static.c:347-353 — dead cache != NULL re-checks (implied by prefer_inline).
  • D4 http_compression_response.c:48-49 — dead struct fields underlying_ops/underlying_ctx.
  • D5 http_compression_negotiate.h:43 — stale doc comment (preference order mismatch).
  • D6 http_server.c:284 — dead can_reuse_string_buffer (static inline, never called).
  • D7 tls_layer.c:352 — dead pre-branch mac_params[1] write.
  • D8 http_response_internal.h:41 — duplicate accessor declarations. Done in 4.6 (commit 1e851f3): dropped the get_status_code/get_headers_table aliases, callers renamed to the public get_status/get_headers, removed internal-header re-declarations.

Phase 3 — Low-risk performance ✅ (commit 7b7de1e) — P1/P2/P6 done; P3/P7/P8 deferred, P4 dropped, P5 done in Phase 4

  • P1 http2_session.c:1403h2_stream_pending_bytes rescans the chunk ring O(N) per data-provider tick; use the already-maintained chunk_queue_bytes.
  • P2 http_compression_response.c:407 — gzip buffered path undersizes output (+64); use +body_len/1000+....
  • P3split out as a separate issue — feature work (negative-cache integration for directory-index stats), needs its own test; out of scope for this refactor branch.
    http_static.c:227 — directory-index resolution does N blocking stats bypassing the negative cache.
  • P4dropped — code already uses zend_hash_str_find (no alloc); ~15ns gain not worth a new parser field
    http_parser.c:552 — redundant connection hash lookup in on_headers_complete.
  • P5 → done in 4.5 (commit 223ee83): folded into Phase 4 — snprintf→shared-digit-writer dedup.
    http2_session.c:1654/1751snprintf for :status; reuse the digit writer.
  • P6 http_request.c:380/396getContentType/getContentLength alloc a zend_string for a literal header name.
  • P7split out as a separate issue — real anti-pattern but a 5-file signature change, structural not perf; out of scope for this refactor branch.
    http_request.c:704 / uploaded_file.c:402emalloc(sizeof(zval)) wrapper alloc/free per request/upload.
  • P8won't-do — start()-only (one-time, not a hot path); the ~25 VM re-entries cost nothing measurable. A critical-function refactor for zero gain.
    http_server.c start() — ~25 PHP-VM re-entries to read scalar config already present in the C struct.

Phase 4 — Duplication ✅ (4.1 4b35fc8 X2/X9 · 4.2 c90fd38 X1 · 4.3 3ba412b X3/X4 · 4.4 d075b51 X14 · 4.5 223ee83/bf1351a/a67e5d6/289a583 P5/X8/X6/X7 · 4.6 1e851f3 X10/X11/D8 · 4.7 e859186 X13)

  • X1 compression — encoder drain-loop copy-pasted ×4 → encoder_drain_write/finish.
  • X2 http_parser.creset/destroy/reset_for_reuse teardown ×3 → parser_teardown_common.
  • X3 core — 3 coalesce-into-pending blocks → out_pending_append. (4.3 3ba412b)
  • X4 core — 3 send-completion epilogues → http_send_batched_finish. (4.3 3ba412b)
  • X5recommend skip — both fns differ in log-level AND tag, must log before zend_clear_exception; dedup yields an awkward 4-param helper
    core — tls_/http_absorb_io_submission_exception identical → one helper.
  • X6 http2 — header-flatten → h2_flatten_response_headers (two two-pass blocks; the audit's third was a miscount). (4.5 a67e5d6)
  • X7 http2 — h2_emit_streaming_body vs h2_dp_streaming_copy ring-walk → h2_chunk_queue_walk + slice sink. (4.5 289a583)
  • X8 http2 — h2_stream_dispose_tail repeated ×3. (4.5 bf1351a)
  • X9 http_parser.cextract_boundary two alloc tails.
  • X10 http_response.cemit_headers_only gained a skip_framing flag; the chunked formatter reuses it. (4.6 1e851f3)
  • X11 http_response.c — 3 reset-body paths → response_set_body_bytes. (4.6 1e851f3)
  • X12 static — dedup of the 200/304 header emit shared by the cache-hit fast path and the sync fallback inside http_static_try_serve, via in-TU static_emit_not_modified / static_emit_ok_headers / static_emit_validators helpers. The original deferral feared a 9-param cross-TU helper; in practice both paths live in the same TU and share a clean data model once the content-type is resolved locally. (commit d9d6580)
  • X13 static — cache-disabled guard (audit said ×3, actually ×5) → cache_disabled predicate. (4.7 e859186)
  • X14 tls_layer.cticket_key_cb MAC-params construction ×2 → tls_ticket_mac_set_key. (4.4 d075b51)

Phase 5 — Big-function splits (SOLID) — ❌ won't-do (S1–S6)

Reassessed. Blanket "split a function for SRP" in C systems code is cargo-cult here. S1–S6 are linear, single-caller pipelines already organised with section comments; cutting them into one-call helpers threaded through a god-struct reads worse, not better, and adds churn with no correctness or perf gain. S1–S6 dropped. Genuine duplication inside them was already removed in Phase 4 (X12 closed the last item). S7 is a real build-hygiene win but a separately-scoped change → split out as its own issue.

  • S1won't-do — see Phase 5 note. engine_handle_stat (~275 lines).
  • S2won't-do — see Phase 5 note. http_static_try_serve (~500 lines).
  • S3won't-do — see Phase 5 note. http2_feed (~185 lines).
  • S4won't-do — see Phase 5 note. h2_stream_send_static_response (~215 lines).
  • S5won't-do — see Phase 5 note. http_connection_dispatch_request (~160 lines).
  • S6won't-do — see Phase 5 note. http_connection_destroy (~175 lines).
  • S7split out as a separate issuehttp_response.c (2194 lines): split static-handler C setters + formatters into separate TUs. Genuine build-hygiene win, separately scoped.

Phase 6 — Data structures

  • DS1won't-do — verified in code. Of the 5 audit-flagged http_request_t fields, body_h2_session + body_h2_stream_id are now live (H2 backpressure nghttp2_session_consume wired in http_body_stream.c). The remainder (body_h2_consume_pending, body_h3_stream, body_paused, macro HTTP_BODY_QUEUE_WATERMARK) are documented placeholders for the in-flight backpressure follow-up PR — removing them is churn the follow-up re-adds. The "~28 bytes" estimate was already stale.
  • DS2 http2_emit_record_t.body.len is uint32_tZEND_ASSERT(len <= UINT32_MAX) added in h2_emit_state_append_body. H2 DATA slices are frame-size bounded so the cast never truncates today; the assert guards a future caller. (commit 972eb4c)
  • DS3won't-do — a union trades a couple of bytes for fragility: the mutual exclusion is not enforced by the type, and a future edit could silently alias a live field.
    engine_state_t — mutually-exclusive error-defer/range fields could share a union.
  • DS4 chunk_queue — compacting array; 3 comments that called it a "ring" corrected. (commit 972eb4c)

Phase 7 — Comment quality

  • CM1won't-do as a dedicated pass — a blanket comment sweep is low-value churn. Comment quality is handled opportunistically inside every touched commit instead, which is what Phases 1–6 already did (CODING_STANDARDS short 1-2 line comments enforced per-edit).

Other (low priority / measure first) — separate-issue candidates

These four are out of scope for this refactor branch — candidates for their own issues if later benchmarking justifies them.

  • O1 http1_stream.c — 3 sends per streaming chunk (medium risk; vectored write).
  • O2 http_body_stream.c — per-chunk node emalloc.
  • O3 http_connection_tls.cread_buffer never shrinks after a large burst.
  • O4 http_parser.cstrncasecmp prefix-match for TE/Connection tokens (token-aware compare).

Acceptance

  • No behavior change; phpt suite green (current baseline 196/198; 034/044 pre-existing, unrelated — no new failures).
  • No benchmark regression against the release baseline (perf-affecting phases benchmarked).
  • Comment quality enforced per-commit (CM1 dedicated pass dropped — see Phase 7).

Status — branch ready to merge

Phases 1–4 plus the remaining actionable items (X12, DS2, DS4) are done and pushed to 37-code-refactoring-and-technical-improvements. Everything else is resolved as won't-do (S1–S6, P8, DS1, DS3, CM1 — each with rationale above) or deferred to a separate issue (S7, P3, P7, O1–O4). No open work remains on this branch.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request
No fields configured for Feature.

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions