28 changes: 28 additions & 0 deletions .dockerignore
@@ -1,9 +1,37 @@
# Build artifacts & VCS
.git
target
node_modules

# Documentation, tests, CI and deploy config (not needed in image)
*.md
docs/
tests/
.github/
helm/

# Codeiq workspace under src/codeiq/ (development scratchpad)
src/codeiq/

# Secrets — explicit defense-in-depth; .dockerignore does NOT inherit
# .gitignore (Docker resolves COPY against the build context, which
# includes uncommitted/working-tree files). Audit RAN-46 §3.
.env
.env.*
*.pem
*.key
*.jks
*.p12
*.pfx
*.keystore
id_rsa
id_ecdsa
id_ed25519
id_dsa
credentials.json
credentials.yaml
secrets.json
secrets.yaml
*.serviceaccount.json
.aws/
.codeiq/
9 changes: 7 additions & 2 deletions .github/workflows/security.yml
@@ -90,8 +90,13 @@ jobs:
       - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
         with:
           python-version: '3.12'
-      - name: Install semgrep
-        run: python -m pip install --quiet --upgrade pip semgrep
+      - name: Install semgrep (pinned for reproducibility)
+        # Pinned per OpenSSF Scorecard `Pinned-Dependencies` (RAN-46 §5).
+        # Bump via Dependabot pip ecosystem on a documented cadence; floating
+        # `semgrep` was previously flagged by Scorecard. pip is left unpinned
+        # because setup-python@v6 ships a current vendored pip, and the Scorecard
+        # rule fires only on user-installed packages.
+        run: python -m pip install --quiet 'semgrep==1.161.0'
       - name: Run semgrep (security-audit + owasp-top-ten + java)
         run: |
           semgrep scan \
26 changes: 25 additions & 1 deletion .gitignore
@@ -28,10 +28,34 @@ Thumbs.db
*.mv.db

# Environment & secrets
# `.env` plus the `.env.*` glob cover .env, .env.local, .env.prod, .env.test, and
# every other .env.<env> variant. Pre-PR-3 only the first two were excluded, so
# other .env.<env> variants could have been committed silently.
.env
.env.local
.env.*
# Java keystores & PKCS#12 archives — high-value secrets that have shown up in
# audits; never commit, even encrypted.
*.jks
*.p12
*.pfx
*.keystore
# Generic credential / private-key patterns
*.pem
*.key
# SSH private keys (public *.pub keys are fine).
id_rsa
id_ecdsa
id_ed25519
id_dsa
# AWS / cloud credentials
.aws/credentials
credentials.json
credentials.yaml
secrets.json
secrets.yaml
# Service-account JSON (GCP / Firebase) — typically named *.serviceaccount.json.
*-serviceaccount.json
*.serviceaccount.json

# Logs
*.log
48 changes: 48 additions & 0 deletions CHANGELOG.md
@@ -365,6 +365,54 @@ for that specific tag for the per-commit details.
topology tool as a targeted Cypher query so the snapshot isn't needed.
The cache is the bridge; the rewrite reduces peak memory.

- **Production-readiness PR 3 of 5 — supply chain & bundle integrity.**
Closes three audit findings: air-gap drift, missing bundle integrity
checks, and unpinned scanner versions.
- **`codeiq bundle` SHA-256 manifest.** Every entry in `bundle.zip`
(manifest, scripts, graph DB files, H2 cache, source tree, flow.html,
optional CLI JAR) is now hashed as it streams through the
`ZipOutputStream`, and a `checksums.sha256` entry is written last in
standard GNU coreutils format. Receivers verify with
`sha256sum -c checksums.sha256`. The hash is computed by feeding each
chunk to both the SHA-256 digest and the ZIP stream — no double-read
even for multi-hundred-MB graph databases. Order is deterministic
(sorted dir walks + sorted git ls-files), so the resulting
`checksums.sha256` is byte-stable.
- **No public-internet calls in launcher scripts.** `serve.sh` and
`serve.bat` previously fell back to `curl -fL https://repo1.maven.org/...`
when the CLI JAR wasn't bundled — incompatible with the air-gapped
deploy model documented in `~/.claude/rules/build.md`. The Maven
Central download is removed; if the JAR is missing, the launcher
fails fast and tells the operator either to pass `--include-jar` when
bundling or to stage the JAR from an internal artifact mirror. `serve.sh` also
runs `sha256sum -c --quiet checksums.sha256` automatically before
launching (skip with `CODEIQ_SKIP_VERIFY=1`).
- **Pinned Semgrep version.** `.github/workflows/security.yml` previously
ran `pip install --upgrade pip semgrep` (floating), which Scorecard's
`Pinned-Dependencies` check flagged. Now pinned to `semgrep==1.161.0`
(latest stable as of 2026-04-28). Bumps go through Dependabot's pip
ecosystem on a documented cadence.
- **Tightened secret-pattern exclusions.** `.gitignore` previously
matched only `.env` and `.env.local`, leaving gaps for `.env.prod`,
`.env.test`, JKS / P12 keystores, SSH private keys, and
cloud-credential JSON. Broadened to `.env.*` plus explicit globs
for `*.jks`, `*.p12`, `*.pfx`, `*.keystore`, `id_{rsa,ecdsa,ed25519,dsa}`,
`credentials.{json,yaml}`, `secrets.{json,yaml}`,
`*.serviceaccount.json`. `.dockerignore` mirrors the same rules
(Docker resolves COPY against the build context, which includes
untracked working-tree files; .dockerignore does not inherit
.gitignore).
- **Bundle verification runbook.** `shared/runbooks/release.md` §4a
documents the consumer-side `sha256sum -c` workflow, including the
deliberate exclusion of `checksums.sha256` from itself (would be
circular) and the out-of-band Sigstore/GPG signing that protects
`checksums.sha256` against tampering.
- **Tests:** `BundleCommandTest#bundleCreatesZipWithCorrectStructure`
extended with 4 new asserts: `serve.sh` contains no `curl`/`maven.org`
references (defense against re-introduction), `checksums.sha256`
exists, every line matches `<64-hex> <path>`, and the manifest does not
list itself. Full suite: 3672 tests / 0 failures / 0 errors.

## [0.1.0] - 2026-03-28

First general-availability cut. See the
5 changes: 5 additions & 0 deletions CLAUDE.md
@@ -447,6 +447,11 @@ bean for code paths that haven't been ported yet.
- **`Files.probeContentType` is best-effort** — JDK 25 on Linux uses `/etc/mime.types` + magic-byte fallback. It returns `null` if the type can't be determined; treat that as "let it through" (the byte cap in `SafeFileReader` still bounds size). The allowlist for `/api/file` is `text/*` + `application/{json,xml,x-yaml,javascript}` — extending requires adding to the explicit list in `GraphController.readFile`.
- **Sanitize user-controlled values before logging.** `BearerAuthFilter.sanitizeForLog(String)` strips `\p{Cntrl}` and truncates at 256 chars. Use it on anything tainted by `request.getRequestURI()`, `request.getMethod()`, headers, etc. before passing to a logger. CodeQL `java/log-injection` will flag direct `log.warn("... {} ...", request.getRequestURI())` as a vuln.
- **`mcp.limits.max_depth` is a NEW field on `McpLimitsConfig`** (default 10). Audit #10 / C3 — the original audit assumed it existed but it didn't. When adding new MCP traversal tools, cap depth via `Math.min(callerSupplied, maxDepth)` before passing to Cypher. The REST endpoint already had this guard via `config.getMaxDepth()` from `CodeIqConfig`; the MCP path now mirrors it via `McpLimitsConfig.maxDepth()`.
- **`codeiq bundle` writes `checksums.sha256` LAST and excludes itself.** `BundleCommand#writeChecksumsManifest` runs after every other entry has been written, then emits the digests collected in `LinkedHashMap<String,String> checksums` as one `<sha256> <path>\n` line per entry. That is GNU coreutils `sha256sum` format, so receivers verify with `sha256sum -c checksums.sha256`. The manifest itself is intentionally NOT in the digest list (would be circular); to protect `checksums.sha256` against tampering, sign the bundle.zip out-of-band (Sigstore, GPG, or compare to the GitHub Release SHA-256). Don't try to "fix" the omission by hashing `checksums.sha256` into the manifest: a file cannot contain its own digest, because writing the digest changes the file.
- **`writeFileHashed` reads each file once, feeding both the SHA-256 digest and the ZIP stream.** Graph DBs and CLI JARs can run to hundreds of MB, so a separate hash pass would read them twice. The 8 KB chunk buffer in `BundleCommand` keeps memory flat regardless of file size; do NOT collect a whole file into a `byte[]` and then split it for "convenience".
- **`serve.sh` and `serve.bat` MUST NOT contain network calls.** Audit RAN-46 §3 (air-gapped deploy model). Pre-PR-3 these scripts ran `curl -fL https://repo1.maven.org/...` to download the CLI JAR on first run; that's gone. Whoever produces the bundle must pass `--include-jar`, or the receiver must stage the JAR from an internal mirror. There's a regression test in `BundleCommandTest#bundleCreatesZipWithCorrectStructure` that asserts `serve.sh` contains neither `curl` nor `maven.org`; keep that test green.
- **`.dockerignore` does NOT inherit `.gitignore`.** Docker resolves COPY against the build context, which includes uncommitted/untracked working-tree files. `.gitignore` only controls what Git tracks; it has no effect on what `docker build` sees. Mirror the secret-pattern globs explicitly in `.dockerignore` (`.env`, `.env.*`, `*.jks`, `id_rsa`, `credentials.{json,yaml}`, etc.). Pre-PR-3 the `.dockerignore` was 9 lines and could have shipped a `.env.prod` straight into a published image.
- **Semgrep is pinned to `semgrep==1.161.0`** in `.github/workflows/security.yml`. Bumps go through Dependabot's pip ecosystem on a documented cadence; the floating `pip install --upgrade pip semgrep` was previously flagged by Scorecard `Pinned-Dependencies`. Don't unpin to "always get latest": an unreviewed scanner upgrade at CI time can start failing the build without any change in the repository when a new release adds rules.
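
The single-pass rule for bundle hashing can be sketched as follows. This is an illustration under stated assumptions, not the code in `BundleCommand`: `HashedZipSketch`, `copyAndHash`, and `writeManifest` are hypothetical names, and the two-space separator simply follows what `sha256sum` itself prints in text mode.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch of hash-while-zipping; names are illustrative only. */
final class HashedZipSketch {
    // Insertion-ordered, so the manifest follows the order entries were written.
    private final Map<String, String> checksums = new LinkedHashMap<>();

    /** Streams one entry into the ZIP, feeding each chunk to the digest as well. */
    void copyAndHash(ZipOutputStream zip, String path, InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        zip.putNextEntry(new ZipEntry(path));
        byte[] buf = new byte[8192]; // fixed buffer: memory stays flat for any file size
        int n;
        while ((n = in.read(buf)) != -1) {
            sha256.update(buf, 0, n); // hash the chunk...
            zip.write(buf, 0, n);     // ...and write the same chunk, with no second read
        }
        zip.closeEntry();
        checksums.put(path, HexFormat.of().formatHex(sha256.digest()));
    }

    /** One "<hex>  <path>" line per recorded entry. */
    String manifestText() {
        StringBuilder sb = new StringBuilder();
        checksums.forEach((path, hex) -> sb.append(hex).append("  ").append(path).append('\n'));
        return sb.toString();
    }

    /** Written last; the manifest is never added to its own digest list. */
    void writeManifest(ZipOutputStream zip) throws IOException {
        zip.putNextEntry(new ZipEntry("checksums.sha256"));
        zip.write(manifestText().getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    /** Builds a one-file bundle in memory and returns its manifest text. */
    static String demo() {
        HashedZipSketch sketch = new HashedZipSketch();
        try (ZipOutputStream zip = new ZipOutputStream(new ByteArrayOutputStream())) {
            byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
            sketch.copyAndHash(zip, "a.txt", new ByteArrayInputStream(content));
            sketch.writeManifest(zip);
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        return sketch.manifestText();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Because the digest is finalized only after the last chunk has been written, the recorded hash always describes exactly the bytes that went into the ZIP entry.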

## Supply-chain observability (OpenSSF)

36 changes: 36 additions & 0 deletions shared/runbooks/release.md
@@ -80,6 +80,42 @@ Within 30 minutes of the release workflow finishing:

If any of (1)–(4) fails, [`rollback.md`](rollback.md) applies.

### 4a. Consumer-side bundle integrity (`codeiq bundle` artifacts)

When operators receive a `*-bundle.zip` produced by `codeiq bundle`, they
**must** verify integrity before launching the bundled `serve.sh` /
`serve.bat`. The bundle ships a `checksums.sha256` entry in standard GNU
coreutils format, generated as the last step of bundling
(`BundleCommand#writeChecksumsManifest`).

```bash
# 1. Unzip into a clean directory.
unzip myrepo-v1.0-bundle.zip -d myrepo-bundle/
cd myrepo-bundle

# 2. Verify every file. Exits non-zero if any entry is missing or modified;
# `checksums.sha256` itself is intentionally not listed (would be circular).
sha256sum -c --quiet checksums.sha256

# 3. Launch. `serve.sh` repeats the check itself; skip it only for a bundle
#    from a trusted internal source: CODEIQ_SKIP_VERIFY=1 ./serve.sh
./serve.sh
```

`serve.sh` runs the same `sha256sum -c` check automatically when
`sha256sum` is on `PATH`. **Do not set `CODEIQ_SKIP_VERIFY=1` in
production**: it disables the only consumer-side integrity gate when the
bundle was delivered out-of-band (USB, internal mirror, AKS sidecar
artifact). To protect `checksums.sha256` itself against tampering, sign
the bundle.zip out-of-band (Sigstore, GPG, or compare to the GitHub
Release SHA-256 if the bundle was published to a release).

If the consumer environment does not provide `sha256sum` (Windows without
WSL, locked-down build agents), distribute the bundle via a Sigstore-signed
release and rely on the Sigstore client for integrity. `serve.bat`
intentionally does **not** include a Windows-native verification step
yet; that is tracked as follow-up work.
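
Where `sha256sum` is unavailable but a JVM is present, the same manifest check can be approximated in Java. The following is a minimal sketch under stated assumptions, not something shipped with `codeiq`: the class `ManifestVerifierSketch` and its methods are hypothetical, and it assumes every manifest line has the shape coreutils prints (64 lowercase hex digits, a space, a mode marker that is a space or `*`, then a path relative to the bundle root). It does not handle coreutils' backslash-escaped file names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for `sha256sum -c checksums.sha256`. */
final class ManifestVerifierSketch {
    // 64 hex digits, a space, a mode marker (space or '*'), then the relative path.
    private static final Pattern LINE = Pattern.compile("([0-9a-f]{64}) [ *](.+)");

    /** Hashes a file in 8 KB chunks so large graph DBs are never held in memory. */
    static String sha256Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                digest.update(buf, 0, n);
            }
        }
        return HexFormat.of().formatHex(digest.digest());
    }

    /** Returns one message per problem; an empty list means every listed file matched. */
    static List<String> verify(Path bundleRoot) throws IOException {
        Path base = bundleRoot.toAbsolutePath().normalize();
        List<String> problems = new ArrayList<>();
        List<String> lines = Files.readAllLines(base.resolve("checksums.sha256"), StandardCharsets.UTF_8);
        for (String line : lines) {
            if (line.isBlank()) continue;
            Matcher m = LINE.matcher(line);
            if (!m.matches()) { problems.add("malformed line: " + line); continue; }
            String expected = m.group(1);
            String name = m.group(2);
            Path target = base.resolve(name).normalize();
            // Refuse manifest paths that point outside the unpacked bundle.
            if (!target.startsWith(base)) { problems.add("path escapes bundle: " + name); continue; }
            if (!Files.isRegularFile(target)) { problems.add("missing: " + name); continue; }
            if (!sha256Hex(target).equals(expected)) problems.add("modified: " + name);
        }
        return problems;
    }

    /** Self-contained demo: a temp bundle whose manifest expects a.txt to contain "abc". */
    static List<String> demo(String actualContent) {
        try {
            Path dir = Files.createTempDirectory("bundle-sketch");
            Files.writeString(dir.resolve("a.txt"), actualContent);
            Files.writeString(dir.resolve("checksums.sha256"),
                    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad  a.txt\n");
            return verify(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ManifestVerifierSketch <bundle-dir>");
            return;
        }
        List<String> problems = verify(Path.of(args[0]));
        problems.forEach(System.err::println);
        if (!problems.isEmpty()) System.exit(1);
    }
}
```

Run against an unpacked bundle directory, the sketch exits non-zero when a listed file is missing or modified. Like `sha256sum -c`, it only checks files named in the manifest, so extra files dropped into the directory go unnoticed, and it does nothing to authenticate `checksums.sha256` itself; the out-of-band signature remains the protection for that.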

---

## 5. Hot-fix patch release (`X.Y.Z+1`)