For developers: You joined a new team. The README says "it's pretty simple." There are 47 open TODOs, a file that's 2,800 lines long, and everyone goes quiet when you ask about the payments module. You could spend three days reading files, or run this and know in an hour what to avoid, who owns what, and what to touch first.
For non-technical users: Someone told you the codebase is "in good shape." Before you say that in a board meeting, sign off on a launch, or make a vendor decision, run this. It maps the system in plain language, surfaces the real risk areas, and generates questions you can actually ask in your next engineering meeting without sounding like you're guessing.
Stop guessing. Build the right mental model before you break something.
New repo. Inherited codebase. Your own code after six months away. The instinct is to start reading files. That's slow, incomplete, and leaves you blind to the things that will actually burn you: the undocumented env var, the file everyone avoids, the test suite that breaks when run in parallel.
This skill does the archaeology. Claude runs the investigation, maps the architecture, hunts for gotchas, generates a working local dev guide, and produces a living CODEBASE.md. Then it stays useful: check a file before touching it, catch problems before pushing, map a ticket to the codebase before writing a line.
| Mode | Use when |
|---|---|
| join | First day on a team, inherited repo, colleague's codebase |
| return | Your own code you haven't touched in 3+ months |
| audit | Evaluating an OSS project before contributing |
| quick | Need "what do I avoid" in 15 minutes; no time for a full investigation |
| touch | About to modify a specific file; get a risk assessment first |
| preflight | About to push a PR; catch what reviewers will catch, before review |
| task | Assigned a ticket or feature; map it to the codebase before starting |
quick is a triage tool: Danger Zones and Gotchas only, no CODEBASE.md written.

touch, preflight, and task are ongoing modes: they require an existing CODEBASE.md.
| Phase | Name | What it covers |
|---|---|---|
| 0 | Bootstrap | README, CI config, open issues, package manifests |
| | └─ AI detection | Signals that the codebase is largely AI-generated; adjusts the assessment lens |
| 1 | Critical Paths | Entry points, data stores, Mermaid architecture diagram |
| 2 | Conventions | What git history reveals vs. what the README claims |
| 3 | Danger Zones | High-churn files, debt clusters, frequently reverted code |
| 4 | Gotcha Detector | Security pre-check first, then undocumented env vars, pre-commit/CI gaps, test traps |
| 5 | Local Dev Guide | Step-by-step to get it running: real commands, common failures [technical] |
| 6 | Team Questions | 1:1 format with priority tiers [technical] |
| | Meeting Questions | Sprint planning / roadmap / board framing [non-technical] |
| | └─ Answers loop | When answers arrive: which sections to update, how to close Open Questions |
| 7 | Executive Brief | One-page health summary, framed for your stated goal [non-technical] |
| 8 | First Contribution | Specific file + line + fix, not just a category [technical] |
| 8b | Ramp-up Timeline | Week-by-week gates derived from findings, not a template [technical] |
| 9 | Archaeology | return only: why decisions were made, not just what they are |
| 10 | Contributor Signal | audit only: merge rate, PR velocity, go/no-go |
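
As an illustration of what "high-churn files" in Phase 3 can mean in practice, here is a minimal sketch that counts how many commits touched each path. It is not the skill's actual implementation; the function name `churn_by_file` is hypothetical, and it assumes the input is the text printed by `git log --name-only --pretty=format:`.

```python
from collections import Counter


def churn_by_file(git_log_output: str, top: int = 5) -> list[tuple[str, int]]:
    """Count how many commits touched each path.

    Expects the text printed by:
        git log --name-only --pretty=format:
    which lists the files changed by each commit, one path per line,
    with blank lines between commits.
    """
    counts = Counter(
        line.strip() for line in git_log_output.splitlines() if line.strip()
    )
    return counts.most_common(top)


# Hypothetical log output covering three commits.
sample = """\
src/core/engine.go
api/routes.go

src/core/engine.go
payments/sync.go

src/core/engine.go
"""

for path, commits in churn_by_file(sample):
    print(f"{commits:3d}  {path}")
```

A raw count like this is only a starting signal; the Danger Zones table combines it with TODO density and revert history.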
Before any phase runs, Claude asks two questions:
1. Technical or non-technical?
   - Technical: file paths, code snippets, git commands, local dev guide, PR preflight
   - Non-technical: plain language throughout, shareable architecture diagram, executive brief, questions framed for meetings, not for debugging sessions
2. What's your goal?
   - Technical examples: make a contribution, take ownership, security review, evaluate OSS
   - Non-technical examples: understand what the system does, assess risk before a launch, prepare for a roadmap or board conversation
The same investigation runs either way. The output is completely different.
<details>
<summary>See a complete example CODEBASE.md</summary>

# CODEBASE.md: payments-api
Generated: 2024-03-15 | Mode: join | Investigator: Claude
Last verified: 2024-03-15 | Staleness threshold: 4 weeks

---
## Project at a Glance ✅ Verified
**What it does:** Stripe payment processing API for the SaaS billing layer.
Handles subscriptions, webhooks, and invoice generation.
**Stack:** Go 1.21, PostgreSQL 15, Redis 7, deployed on Railway.
**Test runner:** `pytest -x` (CI), `make test` (README); inconsistent, see Q2
**Commit convention:** Conventional commits (28 of last 30 PRs)
**AI-generated signal:** None detected. 1 author email, organic commit messages.
---
## Architecture ✅ Verified
```mermaid
graph LR
    Client -->|HTTP| API[api/routes.go]
    API --> Auth[auth/middleware.go]
    Auth --> Stripe[stripe/client.go]
    Auth --> Handler[handlers/]
    Handler --> DB[(postgres)]
    Handler --> Cache[(redis)]
    Stripe -->|webhooks| Webhook[handlers/webhook.go]
```
**Entry points:** `cmd/server/main.go` (API), `cmd/worker/main.go` (background jobs)

**Data stores:** PostgreSQL (primary), Redis (session cache + job queue)

## Danger Zones ✅ Verified
| File / Area | Why dangerous | When to touch |
|---|---|---|
| `src/core/engine.go` | 2,847 lines, 47 TODOs, in 89% of PRs | After 4+ weeks |
| `migrations/` | Irreversible schema changes | Never solo |
| `auth/middleware.go` | No tests, last touched 18 months ago | With alice@ review |
| `payments/sync.go` | Reverted 3× in 6 months | Ask bob@ first |

## Gotchas ✅ Verified
- `STRIPE_WEBHOOK_SECRET` required but absent from `.env.example`: payments fail silently without it
- Pre-commit runs `eslint --fix`; CI runs `eslint`: passes locally, fails CI if you don't re-stage after the hook fires
- `auth/` tests share a singleton: `pytest -n 4` causes random failures; always run `pytest -p no:xdist auth/`
- `scripts/seed.sh` required for tests; not in README

## Conventions
- Commits: Conventional commits enforced by pre-commit hook
- PR size: Median 280 lines (last 30 PRs); over 400 gets flagged in review
- Ownership: auth/ → alice@example.com, payments/ + API → bob@example.com
- Tests: Every source commit touches a test file (22 of last 25 PRs)
- Branches: `feat/`, `fix/`, `chore/` prefixes, squash-merged

## Local Dev Guide ✅ Verified
1. `cp .env.example .env`
2. Set missing variables:
   - `STRIPE_WEBHOOK_SECRET`: ask alice@example.com for the dev key
   - `JWT_SECRET`: any 32-char string works locally
3. `npm install`
4. `docker-compose up -d postgres redis`
5. `npm run db:migrate`
6. `bash scripts/seed.sh` (not in README; required for tests)
7. `npm run dev` → http://localhost:3000

Verify: `curl http://localhost:3000/health` → `{"status":"ok"}`

## Team Questions
1. `STRIPE_WEBHOOK_SECRET` is in code but not `.env.example`. Shared dev key?
2. CI runs `pytest -x`, README says `make test`. Which for local dev?
3. `payments/sync.go` reverted 3× in 6 months: active fix, or avoided?
4. `auth/middleware.go` has no tests: intentional or technical debt?
5. `core/engine.go` is 2,847 lines: plan to split it, or intentional?
6. No staging environment documented. Does one exist? How to access?
7. `scripts/seed.sh` undocumented: what does it seed, and is it safe to re-run?

## First Contribution
Target: `tests/auth/test_middleware.py`, line 47: missing edge case for
expired tokens. Bob added a TODO comment 3 weeks ago. Low risk, high value.

Pattern to follow: `tests/api/test_logging.py` (same structure, merged cleanly).

## Ramp-up Timeline ⚠️ Inferred
### Week 1: get oriented and unblocked
- [ ] Local dev running: `curl http://localhost:3000/health` → `{"status":"ok"}`
- [ ] Blocking questions answered (Q1 + Q2 above)
- [ ] Can explain Client → API → Auth → Handler → DB without this file
### Week 2: know how the team works
- [ ] First PR merged without commit message feedback
- [ ] PR size within team norm (under 400 lines)
- [ ] Know who to ping for auth and payments changes
### Week 4: own the codebase
- [ ] Can name all 4 Danger Zones without reading this file
- [ ] Touch mode no longer needed outside Danger Zones
- [ ] CODEBASE.md updated with anything that was wrong or missing

</details>
---
### `CODEBASE.md`: honest by design
Every section carries a confidence tag:

- ✅ Verified: based on CI config, git history, or explicit documentation

Gap sections automatically become Team Questions. If something is tagged as a Gap, there's a corresponding question to ask.
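
To make the tag-to-question link concrete, here is a minimal sketch of how section headings could be grouped by confidence tag so that Gap sections can be turned into questions. The function `sections_by_tag` is hypothetical and is not taken from the skill itself; it relies only on the tag words used in this README (Verified, Inferred, Gap) and ignores whatever emoji precedes them.

```python
TAGS = ("Verified", "Inferred", "Gap")


def sections_by_tag(markdown: str) -> dict[str, list[str]]:
    """Group level-2 headings by the confidence tag at the end of the line."""
    result: dict[str, list[str]] = {tag: [] for tag in TAGS}
    for line in markdown.splitlines():
        if not line.startswith("## "):
            continue
        words = line[3:].split()
        if not words or words[-1] not in TAGS:
            continue
        # Keep only words containing a letter or digit, which drops the emoji.
        title = " ".join(w for w in words[:-1] if any(c.isalnum() for c in w))
        result[words[-1]].append(title)
    return result


# Hypothetical headings; the Gap heading is written without an emoji
# because the emoji does not matter to the parser.
sample_md = """\
## Danger Zones ✅ Verified
## Ramp-up Timeline ⚠️ Inferred
## Staging Environment Gap
"""

tags = sections_by_tag(sample_md)
for title in tags["Gap"]:
    print(f"Team question needed: {title}")
```

One limitation of this sketch: a title word made only of punctuation (such as `&`) is dropped along with the emoji.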
**Example sections:**
```markdown
## Danger Zones ✅ Verified
| File / Area         | Why dangerous                         | When to touch  |
|---------------------|---------------------------------------|----------------|
| src/core/engine.go  | 2,847 lines, 47 TODOs, in 89% of PRs  | After 4+ weeks |
| migrations/         | Schema changes need team coordination | Never solo     |
| auth/               | No tests, last touched 18 months ago  | With review    |

## Gotchas ✅ Verified
- `STRIPE_WEBHOOK_SECRET` required but absent from `.env.example`:
  payments fail silently without it
- Pre-commit runs `eslint --fix`; CI runs `eslint`: passes locally,
  fails CI if you don't re-stage after the hook fires
- `auth/` tests share a singleton: `pytest -n 4` causes random failures;
  always run `pytest -p no:xdist auth/`

## Local Dev Guide ✅ Verified
1. `cp .env.example .env`
2. Set missing variables:
   - `STRIPE_WEBHOOK_SECRET`: ask alice@example.com for the dev key
   - `JWT_SECRET`: any 32-char string works locally
3. `npm install`
4. `docker-compose up -d postgres redis`
5. `npm run db:migrate`
6. `bash scripts/seed.sh` (not in README; required for tests)
7. `npm run dev` → http://localhost:3000

Verify: `curl http://localhost:3000/health` → `{"status":"ok"}`
```
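
The first gotcha in the example (a variable the code reads that is missing from `.env.example`) is the kind of finding a simple text comparison can surface. The sketch below is an assumption about how such a check might look, not the skill's real logic; the three patterns and the function names are illustrative and far from exhaustive.

```python
import re

# A few common ways of reading environment variables (illustrative only).
PATTERNS = [
    re.compile(r'os\.Getenv\("([A-Z0-9_]+)"\)'),   # Go
    re.compile(r'process\.env\.([A-Z0-9_]+)'),      # Node.js
    re.compile(r'os\.environ\["([A-Z0-9_]+)"\]'),  # Python
]


def vars_read_by_code(source: str) -> set[str]:
    """Collect variable names that the source text reads from the environment."""
    names: set[str] = set()
    for pattern in PATTERNS:
        names.update(pattern.findall(source))
    return names


def vars_in_env_example(env_example: str) -> set[str]:
    """Collect variable names declared as NAME=value lines."""
    names = set()
    for line in env_example.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            names.add(line.split("=", 1)[0].strip())
    return names


def undocumented(source: str, env_example: str) -> list[str]:
    """Names read by the code but absent from .env.example."""
    return sorted(vars_read_by_code(source) - vars_in_env_example(env_example))


# Hypothetical file contents.
sample_source = (
    'secret := os.Getenv("STRIPE_WEBHOOK_SECRET")\n'
    'dsn := os.Getenv("DATABASE_URL")\n'
)
sample_env = "# local settings\nDATABASE_URL=postgres://localhost/dev\n"

print(undocumented(sample_source, sample_env))
```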
For engineers:
graph LR
Client -->|HTTP| API[api/routes.go]
API --> Auth[auth/middleware.go]
Auth --> Handler[handlers/user.go]
Handler --> DB[(postgres)]
Handler --> Cache[(redis)]
For non-technical stakeholders (same investigation, plain language):
graph LR
User -->|sends request| API[Web API]
API --> Auth[Login Check]
Auth --> Logic[Business Logic]
Logic --> DB[(Database)]
Logic --> Cache[(Fast Cache)]
For technical users (1:1 format):
### 🔴 Blocking (ask in the first hour)
1. `STRIPE_WEBHOOK_SECRET` is in code but not `.env.example`. Shared dev key?
2. CI runs `pytest -x`, README says `make test`. Which for local dev?
### 🟡 Important (this week)
3. `payments/sync.go` reverted 3× in 6 months: active fix, or avoided?
### 🟢 Nice-to-know
4. `core/engine.go` is 2,847 lines: plan to split it, or intentional?

For non-technical users (meeting format):
### For your next sprint planning
- The payment module has broken 3 times this year: what's the risk
  if we ship features that touch it this sprint?
### For a board or investor conversation
- How would you describe the overall health of the engineering foundation?

Generated after Phase 8. Every checkpoint references actual files, people, and question numbers found during the investigation, not generic milestones.
## Ramp-up Timeline ⚠️ Inferred
### Week 1: get oriented and unblocked
- [ ] Local dev running: `curl http://localhost:3000/health` → {"status":"ok"}
- [ ] STRIPE_WEBHOOK_SECRET and JWT_SECRET added to .env (ask alice@example.com)
- [ ] Can explain Client → API → Auth → Handler → DB without CODEBASE.md
- [ ] Blocking team questions answered (questions 1 and 2, see Team Questions 🔴)
- [ ] First safe contribution submitted (target: test_auth.py line 47)
### Week 2: know how the team works
- [ ] First PR merged without commit message feedback (conventional commits format)
- [ ] PR size within team norm (under 400 lines, based on git log)
- [ ] Know who to ping: auth → alice@example.com, payments/API → bob@example.com
- [ ] Important questions answered (questions 3-5, see Team Questions 🟡)
### Week 4: own the codebase
- [ ] Can name all 3 Danger Zones without looking at CODEBASE.md
- [ ] Touch mode no longer needed outside Danger Zones
- [ ] Ready to review a teammate's PR for convention compliance
- [ ] CODEBASE.md updated with anything that was wrong or missing

Return mode adds a recovery gate at end of Week 1: archaeology complete, changes since your absence absorbed, prior mental model assumptions flagged as outdated.
## Executive Brief
### Codebase health
| Area | Status | Business impact |
|------|--------|----------------|
| Core engine | 🔴 High risk | Changes here are slow and bug-prone |
| Payments | 🟡 Unstable | Has broken 3× in 6 months |
| Auth | 🟡 Untested | No safety net; bugs affect all users |
| API | 🟢 Healthy | Well-maintained, stable |

### Top risks
1. Payment processing has broken and been reverted three times; any change
   here carries meaningful risk of customer-facing outage.
2. Authentication has no automated tests; bugs affect every user.
### Overall assessment
Medium risk. The API layer is healthy, but two critical areas (payments
and auth) need investment before safely shipping major new features.

"Quick mode: I need to make a change in the next hour."
No CODEBASE.md written. Runs Bootstrap + Danger Zones + Gotchas only. Output is a single briefing:
Quick Briefing: payments-api
⚠️ This is triage, not orientation. Run join mode when you have time.
DON'T TOUCH FIRST
migrations/ - irreversible schema changes, never solo
auth/ - no tests, 3 reverts in 6 months, get review first
.env files - shared config, changes affect everyone immediately
GOTCHAS TO KNOW NOW
STRIPE_WEBHOOK_SECRET missing from .env.example - payments fail silently
Pre-commit runs eslint --fix; CI runs eslint - re-stage after hook fires
auth/ tests share a singleton - run pytest -p no:xdist, not pytest -n 4
Suggested prompt for your change:
"In api/middleware.go, [your change]. Be minimal. Don't touch auth/."
"I'm about to modify
`auth/middleware.go`. Run touch mode."
Checks CODEBASE.md staleness first. If it's older than 4 weeks, warns before using its data.
Before You Touch: auth/middleware.go
Risk level: HIGH - listed in Danger Zones
Recent commits:
3 days ago fix: token expiry edge case alice@example.com
2 weeks ago REVERT: "refactor auth flow" - broke staging
Who to ping: alice@example.com (14 of last 20 commits)
Known issues:
Line 47 TODO refresh token rotation not implemented
Line 203 FIXME breaks with multiple active sessions
Tests covering this:
tests/auth/middleware_test.go
tests/integration/session_test.go
Run these now, before editing, to establish a baseline.
A failure before you start is not your bug. A failure after is.
Watch out for:
Session singleton on line 89 - caused the revert two weeks ago
Suggested prompt for your next message:
"In auth/middleware.go, [your change]. Be minimal.
Don't touch the session singleton on line 89."
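
The staleness check that touch mode runs first can be pictured as a date comparison against the header of CODEBASE.md. This is a minimal sketch under the assumption that the header always follows the format shown in the example (`Last verified: YYYY-MM-DD | Staleness threshold: N weeks`); the function name `is_stale` is hypothetical.

```python
import re
from datetime import date, timedelta

HEADER = re.compile(
    r"Last verified:\s*(\d{4}-\d{2}-\d{2})\s*\|\s*Staleness threshold:\s*(\d+)\s*weeks?"
)


def is_stale(codebase_md: str, today: date) -> bool:
    """True when the file was last verified longer ago than its threshold."""
    match = HEADER.search(codebase_md)
    if match is None:
        # No header found: treat as stale so the caller warns instead of trusting it.
        return True
    verified = date.fromisoformat(match.group(1))
    threshold = timedelta(weeks=int(match.group(2)))
    return today - verified > threshold


header = "Last verified: 2024-03-15 | Staleness threshold: 4 weeks"
print(is_stale(header, date(2024, 4, 1)))  # 17 days old, within 4 weeks
print(is_stale(header, date(2024, 5, 1)))  # 47 days old, past the threshold
```

Treating a missing header as stale is a design choice in this sketch: a warning on a fresh file costs less than trusting an outdated one.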
"Run preflight on my current changes."
Checks CODEBASE.md staleness first. Every ❌ includes the corrected version and the exact command.
PR Pre-flight: feat/add-rate-limiting
Branch: feat/add-rate-limiting (3 source files, 0 test files changed)
────────────────────────────────────────
COMMIT MESSAGE
────────────────────────────────────────
❌ Doesn't follow conventional commits (used in 28 of last 30 commits)
Current: "add rate limiting"
Fix to: "feat(api): add rate limiting to middleware"
Command: git commit --amend -m "feat(api): add rate limiting to middleware"
────────────────────────────────────────
FILES CHANGED
────────────────────────────────────────
✅ api/routes.go - not a Danger Zone
✅ api/middleware.go - not a Danger Zone
⚠️ auth/middleware.go - DANGER ZONE
Why: No tests, last touched 18 months ago, security-sensitive
Action: alice@example.com must review (14 of last 20 commits here)
Watch for: session singleton on line 89 - caused a revert last month
────────────────────────────────────────
PR SIZE
────────────────────────────────────────
⚠️ 430 lines changed - team norm is under 400 (based on last 30 merged PRs)
Consider splitting: rate limiting logic vs. test files
────────────────────────────────────────
TEST COVERAGE
────────────────────────────────────────
❌ No test files in diff (convention: every commit that touches source touches tests)
Add tests to:
api/middleware.go → tests/api/middleware_test.go
auth/middleware.go → tests/auth/middleware_test.go
Pattern to follow: tests/api/logging_test.go
(added with "feat(api): add request logging" - same structure)
────────────────────────────────────────
GOTCHAS
────────────────────────────────────────
⚠️ Pre-commit runs eslint --fix; CI runs eslint without fix
After hook fires: git add api/middleware.go && git push
⚠️ auth/ tests share a singleton - don't run with -n
Use: pytest -p no:xdist tests/auth/
────────────────────────────────────────
VERDICT: ⚠️ ADDRESS BEFORE PUSHING
────────────────────────────────────────
1. git commit --amend -m "feat(api): add rate limiting to middleware"
2. Add tests to tests/api/middleware_test.go and tests/auth/middleware_test.go
(follow tests/api/logging_test.go)
3. Re-stage after pre-commit hook fires, then push
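
Two of the preflight checks above, the commit message format and the missing test files, are simple enough to sketch. The code below is only an illustration under stated assumptions: the list of allowed types is a commonly used set rather than anything this project defines, the `tests/` prefix is taken from the example paths, and both function names are hypothetical.

```python
import re

# Commonly used commit types; an assumption, not this project's own list.
ALLOWED_TYPES = {
    "feat", "fix", "chore", "docs", "refactor",
    "test", "ci", "build", "perf", "style", "revert",
}

# type, optional (scope), optional "!", then ": " and a subject.
CONVENTIONAL = re.compile(r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?!?: \S.*$")


def is_conventional(message: str) -> bool:
    """Check only the first line of the commit message."""
    lines = message.splitlines()
    if not lines:
        return False
    match = CONVENTIONAL.match(lines[0])
    return match is not None and match.group("type") in ALLOWED_TYPES


def missing_tests(changed_files: list[str]) -> bool:
    """True when source files changed but nothing under tests/ did."""
    source = [p for p in changed_files if not p.startswith("tests/")]
    tests = [p for p in changed_files if p.startswith("tests/")]
    return bool(source) and not tests


print(is_conventional("add rate limiting"))
print(is_conventional("feat(api): add rate limiting to middleware"))
print(missing_tests(["api/routes.go", "api/middleware.go", "auth/middleware.go"]))
```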
"I need to add rate limiting to the API. Where do I start?"
Task: Add rate limiting to the API
Relevant files:
api/routes.go - entry point; where rate limiting hooks in
api/middleware.go - existing pattern to follow; start here
Danger Zone proximity:
auth/middleware.go ⚠️ adjacent - avoid unless necessary
Similar past work:
"feat(api): add request logging middleware" - bob@example.com, 3 months ago
Same pattern: middleware.go, not routes.go
Conventions:
Every new middleware needs an integration test in tests/api/
Who to loop in: bob@example.com (owns api/, built existing middleware)
Risk level: LOW - api/ is not a Danger Zone, pattern is established
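
The "who to loop in" and "who to ping" lines cite commit counts such as "14 of last 20 commits". One plausible way to arrive at such a figure, sketched here as an assumption rather than the skill's documented method, is to count author emails from `git log -n 20 --format=%ae -- <path>`. The function name `top_author` is hypothetical.

```python
from collections import Counter


def top_author(author_emails: list[str]) -> tuple[str, int, int]:
    """Return (email, commits by that author, total commits considered)."""
    if not author_emails:
        raise ValueError("no commits found for this path")
    email, commits = Counter(author_emails).most_common(1)[0]
    return email, commits, len(author_emails)


# Hypothetical output of: git log -n 20 --format=%ae -- auth/middleware.go
emails = ["alice@example.com"] * 14 + ["bob@example.com"] * 6

email, commits, total = top_author(emails)
print(f"Who to ping: {email} ({commits} of last {total} commits)")
```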
mkdir -p ~/.claude/skills/codebase-onboarding
curl -o ~/.claude/skills/codebase-onboarding/SKILL.md \
https://raw.githubusercontent.com/googlarz/codebase-onboarding/main/SKILL.md

/codebase-onboarding join # new team or repo
/codebase-onboarding return # your own code after months away
/codebase-onboarding audit # evaluating OSS before contributing
/codebase-onboarding quick # 15-minute triage: danger zones + gotchas only
/codebase-onboarding touch # before modifying a file
/codebase-onboarding preflight # before pushing a PR
/codebase-onboarding task # when starting any new piece of work
Using codebase-onboarding on your team? Open an issue. I'd love to add your logo here.
See CONTRIBUTING.md.