Skip to content

Misc Security fixes#2820

Merged
dgageot merged 6 commits into
docker:mainfrom
dgageot:board/a06d029e834182e5
May 20, 2026
Merged

Misc Security fixes#2820
dgageot merged 6 commits into
docker:mainfrom
dgageot:board/a06d029e834182e5

Conversation

@dgageot
Copy link
Copy Markdown
Member

@dgageot dgageot commented May 19, 2026

No description provided.

@dgageot dgageot requested a review from a team as a code owner May 19, 2026 11:05
docker-agent

This comment was marked as outdated.

dev and others added 6 commits May 20, 2026 11:01
The skills cache fetches from a base URL that may be operator-supplied,
and the contents are fed to the model as instructions. Using the plain
remote transport allowed any IP — including 127.0.0.1, RFC1918, or
169.254.169.254 (cloud metadata) — to be reached, turning a hostile
registry into a metadata-exfiltration vector via prompt injection.

Switch to httpclient.NewSafeClient so non-public addresses are refused
at dial time (after DNS resolution, defeating DNS rebinding). Tests
override the client via TestMain since httptest.NewServer binds to
127.0.0.1.
Replace direct == comparisons with errors.Is so wrapped errors are
matched correctly, and replace deprecated os.IsNotExist with
errors.Is(err, fs.ErrNotExist). For sql.ErrNoRows the existing == form
worked because database/sql.Scan is documented to return it unwrapped
(and golangci-lint's errorlint allowlists this pattern), but using
errors.Is is consistent with the rest of these files (e.g.
bm25_database.go already used errors.Is at line 100) and makes the
code resilient if a future caller wraps the error.

Note: the project already enables errorlint via .golangci.yml, but
errorlint intentionally does not flag these specific patterns
(sql.Scan -> sql.ErrNoRows is allowlisted; os.IsNotExist is not in
errorlint's scope). No linter change is therefore needed.
…ndler

The callback server published the OAuth result via three size-1 channels
with blocking sends. Any additional hit on /callback — a stale browser
tab, the user clicking back then re-submitting, or any local process on
the loopback interface poking the port — would wedge the HTTP handler
goroutine forever (until Shutdown), leaking goroutines and TCP
connections per stray request, and giving a local attacker a trivial
way to pin those resources.

Collapse codeCh/stateCh/errCh into a single buffered resultCh and use
non-blocking sends via a deliver helper. The first callback wins;
later ones are dropped on the floor. WaitForCallback no longer has the
two-channel race where a context cancellation between codeCh and
stateCh reads would lose the code.

Add TestCallbackServer_DuplicateCallbacksDoNotBlock which fires five
back-to-back callbacks and verifies (a) every HTTP response completes
within the timeout and (b) the first one is still delivered to the
waiter.
The skills cache previously persisted every response to disk, even when
the upstream marked the body as Cache-Control: no-store. Since the
cached content is fed to the model as instructions, this was both a
privacy hazard (sensitive content lingering under ~/.cagent) and a
violation of RFC 9111 \u00a75.2.2.5 ("the cache MUST NOT store any part
of either the immediate request or response").

no-cache was also conflated with max-age=0 \u2014 it actually means
"may be stored, but must be revalidated before reuse" (RFC 9111
\u00a75.2.2.4). For the skills cache, which has no conditional-GET
support yet, the safe approximation is to store but report a miss on
the next Get, forcing a fresh fetch.

- Replace parseCacheExpiry with parseCacheControl returning a structured
  cacheDirective.
- FetchAndStore now skips the disk write entirely on no-store and
  returns the body in-memory.
- Get treats fs.ErrNotExist as a clean miss, but logs other errors
  (corrupt JSON, EACCES, \u2026) at debug instead of silently masking them
  as a benign refetch trigger.
- Add tests for no-store, no-cache, and the directive precedence.
Self-review caught a regression in the previous commit. Skipping the
disk write entirely on Cache-Control: no-store made remote skills with
that header unreadable, because the consumer at pkg/tools/builtin/skills
reads skill.FilePath directly rather than going through diskCache.Get.
After prefetchFiles returned (silently happy), readFileContent(...)
hit ENOENT and the skill was effectively broken.

Restore the disk write but keep no-store distinct from a normal
response: the entry's metadata expires immediately, so the next Load()
cycle refetches rather than reusing the stored copy. This is no longer
strict RFC 9111 \u00a75.2.2.5 ("the cache MUST NOT store any part"),
which is documented in the function comment along with the future
direction (in-memory cache shared with the reader).

Update TestDiskCache_NoStoreStoresButExpiresImmediately accordingly:
the file must now exist on disk, but Get must still report a miss so
prefetch refetches on the next Load.
…isleading 200

Addresses review feedback on the previous commit (drop stray OAuth
callbacks instead of blocking the handler). Even though duplicate
callbacks no longer wedge the handler goroutine, they were still
receiving an 'Authorization Successful!' page even when their result
was silently discarded by deliver(). This was misleading for users
who refresh a stale browser tab, and gave a local attacker a way to
probe whether a flow is in progress (200 vs 400 on a known nonce).

deliver() now returns whether it actually delivered the result. The
callback handler responds with HTTP 409 Conflict and an 'Authorization
Already Processed' page when deliver() returns false, except for the
state-mismatch path where the response stays unchanged so it doesn't
leak whether a flow is currently active.

TestCallbackServer_DuplicateCallbacksDoNotBlock now also verifies
that the second through fifth callbacks return HTTP 409 rather than
HTTP 200.
@dgageot dgageot force-pushed the board/a06d029e834182e5 branch from c68ce71 to 4f57a88 Compare May 20, 2026 09:07
@dgageot dgageot merged commit c3bb03f into docker:main May 20, 2026
5 checks passed
@aheritier aheritier added area/security Authentication, authorization, secrets, vulnerabilities kind/security Security fix or hardening PR labels May 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/security Authentication, authorization, secrets, vulnerabilities kind/security Security fix or hardening PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants