The source for the pre-rendered static archive of pixieengine.com. A small, dependency-free Node
generator (no SSG/framework) that turns the recovered sprite/tune dataset into a complete static
public/ tree, deployed to S3 + CloudFront by the CDK stack in ../infra/. Design
rationale: ../docs/architecture.md; deploy: ../docs/deploy.md.
| File | What it is |
|---|---|
| `build-dataset.mjs` | Merges the data sources into the canonical build/dataset.ndjson (one row per sprite; each carries its owner id `u`). |
| `extract-attribution.sh` | One-off DB+CDN extract → build/users.ndjson, build/{sprite,tune}_owners.tsv (sprite/tune → creator; display_name + bio + verified avatar only, no PII). |
| `generate.mjs` | The generator: dataset.ndjson + tunes.ndjson (+ users.ndjson, owner maps, comments.ndjson) → public/ (sprite/tune/tag/gallery pages, `/<user>/` profile pages, /load/, sitemaps, robots, 410). Honors removed.tsv. |
| `removed.tsv` | Takedown / removal list: the source of truth for content pulled from the live archive. generate.mjs enforces it: removed sprites/tunes get no page (→ S3 404 → CloudFront /410 "Gone") and drop out of all galleries, tag pages, profiles, comments, and the sitemap. See Removing content below. |
| `remove.mjs` | Helper to append a validated row to removed.tsv (`node remove.mjs <sprite\|tune\|user> <id> "<reason>"`, or `comment <spriteId>#<hash>`), which avoids hand-editing the TSV. |
| `removed-list.mjs` | Shared read/append/dedupe factory for the `type\tid\tdate\treason` lists (removed.tsv, adult.tsv), used by remove.mjs and the review tool so the format can't drift. |
| `adult.tsv` | Adult / 18+ list: "tasteful NSFW" is allowed but gated. generate.mjs keeps the page but adds noindex + a fail-closed self-attestation interstitial + 🔞 badge, and holds it out of galleries/tags/profiles/sitemap. (True logged-in/age enforcement is Phase 2; this list will drive it.) |
| `comment-key.mjs` | Shared stable comment address (sha256(by+body)), so a comment removal key matches between the generator and the tools. |
| `moderation/` | Proactive-moderation toolkit (see Moderation review below): terms.tsv + scan-text.mjs (text scan), scan-replay.mjs (replay scan → uploads/empties; formats in ../docs/replay-format.md), png-decode.mjs (zero-dep PNG decoder) + check-empty.mjs (pixel-verify "empty" candidates), review.mjs (local review server) + review.html (grid UI, incl. re-review mode via REREVIEW=1), and reviewed.tsv (decision log / false-positive allowlist). Scans write `build/*-candidates.tsv`. |
| `screen.css` | The site's real compiled stylesheet, copied verbatim from `public/assets/screen-*.css` (matches the live look). Emitted as /screen.css. |
| `postmaster.js` | Postmessage RPC shim embedded in the replay (/load/) editor shell. Emitted as /postmaster.js. |
| `replay-test.html` | Standalone manual harness for the editor replay flow. |
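
The table describes removed.tsv and adult.tsv as `type\tid\tdate\treason` lists behind a shared read/append/dedupe helper. As a rough illustration of that idea, here is a minimal sketch that works on in-memory text. The function names (`parseList`, `appendRow`), the set of accepted types, and the date format are assumptions for illustration, not the actual exports or behaviour of removed-list.mjs.

```javascript
// Hypothetical sketch of a type\tid\tdate\treason list helper.
// Names and details are illustrative; removed-list.mjs may differ.
const TYPES = new Set(["sprite", "tune", "user", "comment"]);

// Parse TSV text into rows, skipping blank lines.
function parseList(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [type, id, date, reason = ""] = line.split("\t");
      return { type, id, date, reason };
    });
}

// Append a validated row unless the same type+id is already listed.
function appendRow(text, { type, id, date, reason }) {
  if (!TYPES.has(type)) throw new Error(`unknown type: ${type}`);
  if (!id) throw new Error("id is required");
  const exists = parseList(text).some((r) => r.type === type && r.id === id);
  if (exists) return text; // dedupe: leave the list unchanged
  // Tabs or newlines in the reason would corrupt the row, so flatten them.
  const cleanReason = String(reason).replace(/[\t\r\n]+/g, " ");
  const prefix = text === "" || text.endsWith("\n") ? text : text + "\n";
  return `${prefix}${type}\t${id}\t${date}\t${cleanReason}\n`;
}

let list = "";
list = appendRow(list, { type: "sprite", id: "123456", date: "2026-06-02", reason: "DMCA takedown" });
list = appendRow(list, { type: "sprite", id: "123456", date: "2026-06-03", reason: "duplicate" });
console.log(parseList(list).length); // 1
```

Keeping parsing and appending in one place is what lets several tools write the same file without the column order drifting.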
The deployed edge function is ../infra/functions/rewrite.js (a CloudFront Function); it does not live in this directory.
- build/ is the data workspace (~68 MB): DB extracts (db_sprites.ndjson, db_desc.tsv, tunes.ndjson, users.ndjson, sprite_owners.tsv, tune_owners.tsv) plus the built dataset.ndjson.
- public/ is the generated static tree (~2.1 GB, ~256k objects), synced to S3 (pixieengine-static); regenerates in ~25s.

These are intentionally excluded from git (huge, deterministic outputs / data dumps). The repo keeps the code that reproduces them, not the artifacts.
tmp/sprite-viewer/ids.txt (authoritative S3 id list — existence)
tmp/cf-recovery/sprite_slugs.tsv (post-2017 titles/slugs from CloudFront logs)
tmp/wayback-recovery/*.tsv (Wayback-recovered tags/descriptions/titles)
build/db_sprites.ndjson + build/db_desc.tsv (2017 DB metadata)
build/sprite_owners.tsv ─┐ (sprite → owner; from extract-attribution.sh)
│ │
▼ node build-dataset.mjs (joins owner id onto each sprite as `u`)
build/dataset.ndjson ──┐
build/tunes.ndjson ──┤
build/users.ndjson ──┤ (creator display_name + bio + avatar; profile pages)
build/tune_owners.tsv ──┤
build/comments.ndjson ──┴─▶ node generate.mjs ──▶ public/
The recovery inputs live in the gitignored tmp/ working tree (the cf-log and Wayback recovery
pipelines write there). build-dataset.mjs reads them at ../tmp/... by default; override with the
IDS_TXT, SLUGS_TSV, and WAYBACK_DIR env vars if they live elsewhere.
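
To make the join step in the diagram concrete, the sketch below shows one way the owner map could be attached to sprite rows as `u`, along with the environment-variable overrides named above. The two-column layout of sprite_owners.tsv, the `id` field name, and the function names are assumptions; the default paths simply combine the `../tmp/...` prefix with the files listed in the diagram.

```javascript
// Hypothetical sketch of the owner join; the real build-dataset.mjs may differ.

// Input locations: env var override first, then the default under ../tmp/.
const inputs = {
  idsTxt: process.env.IDS_TXT ?? "../tmp/sprite-viewer/ids.txt",
  slugsTsv: process.env.SLUGS_TSV ?? "../tmp/cf-recovery/sprite_slugs.tsv",
  waybackDir: process.env.WAYBACK_DIR ?? "../tmp/wayback-recovery",
};

// Assumes sprite_owners.tsv rows are "<spriteId>\t<ownerId>".
function parseOwners(tsv) {
  const owners = new Map();
  for (const line of tsv.split("\n")) {
    if (!line.trim()) continue;
    const [spriteId, ownerId] = line.split("\t");
    owners.set(spriteId, ownerId);
  }
  return owners;
}

// Attach the owner id as `u`; sprites without a known owner stay unattributed.
function joinOwners(sprites, owners) {
  return sprites.map((s) => {
    const u = owners.get(String(s.id));
    return u === undefined ? s : { ...s, u };
  });
}

const owners = parseOwners("74464\t12\n123456\t99\n");
const rows = joinOwners([{ id: 74464 }, { id: 5 }], owners);
// One NDJSON row per sprite, as in build/dataset.ndjson.
console.log(rows.map((r) => JSON.stringify(r)).join("\n"));
```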
cd static-site
./extract-attribution.sh # one-off: DB+CDN -> build/users.ndjson + {sprite,tune}_owners.tsv (needs a 2017 DB restore)
node build-dataset.mjs # (re)build build/dataset.ndjson from the sources above
node generate.mjs # -> public/ (~27s, ~270k files incl. ~12.6k profile pages)

Then deploy per ../docs/deploy.md: aws s3 sync public/ s3://pixieengine-static plus a CloudFront invalidation.

Takedowns are driven by removed.tsv (tracked) — the single source of truth for what's been
pulled. The archive is "publish all" by default (architecture.md decision
#2); removal is the reactive enforcement path. generate.mjs reads the list and, per row type:
| type | effect |
|---|---|
| sprite / tune | no page emitted → S3 404 → CloudFront serves /410 "Gone" (de-indexes cleanly); dropped from every gallery, tag page, the owner's profile, recovered comments, and the sitemap. |
| user | profile page removed, the user's sprites/tunes de-attributed to Anonymous (art stays unless its own id is also listed), and their handle scrubbed from recovered comments. |
| comment | one recovered comment (body + handle) dropped from its sprite's thread. Addressed by `<spriteId>#<hash>`, where the hash is a content hash that is stable across rebuilds. |
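
For illustration, a comment address of the form `<spriteId>#<hash>` could be derived as sketched below. Only the sha256(by+body) input comes from this README; the function names are hypothetical, and the 10-hex-character truncation is inferred from the example key `ce9aa2eb07` rather than confirmed against comment-key.mjs.

```javascript
import { createHash } from "node:crypto";

// Hypothetical sketch; comment-key.mjs may normalise or truncate differently.
// The 10-hex-character length is inferred from the example key "ce9aa2eb07".
function commentHash(by, body) {
  return createHash("sha256").update(by + body, "utf8").digest("hex").slice(0, 10);
}

// Build the "<spriteId>#<hash>" address used in comment removal rows.
function commentKey(spriteId, by, body) {
  return `${spriteId}#${commentHash(by, body)}`;
}

// Split "<spriteId>#<hash>" back into its parts.
function parseCommentKey(key) {
  const i = key.indexOf("#");
  if (i < 0) throw new Error(`not a comment key: ${key}`);
  return { spriteId: key.slice(0, i), hash: key.slice(i + 1) };
}

const key = commentKey(74464, "someuser", "nice sprite!");
console.log(key, parseCommentKey(key));
```

Because the key depends only on the comment's author and body, it does not change when the thread is reordered or the site is rebuilt, which is what makes it usable as a removal address.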
Remove one comment (e.g. a slur in a comment body — a user removal only scrubs the handle, not the body):
node remove.mjs comment 74464 # lists the sprite's comments + their keys
node remove.mjs comment 74464#ce9aa2eb07 "abuse — slur in comment body"
node generate.mjs && aws s3 sync public/ s3://pixieengine-static --delete && <invalidate /sprites/74464/*>

Standard takedown (DMCA, abuse report, account/handle scrub):
cd static-site
node remove.mjs sprite 123456 "DMCA takedown — claimant Foo, 2026-06-02"
node generate.mjs
aws s3 sync public/ s3://pixieengine-static --delete # --delete is what makes the page 404
aws cloudfront create-invalidation --distribution-id E2QQUW2BPHXXNP --paths '/sprites/123456/*' '/sprites/*' '/tags/*' '/sitemap*'

--delete is required: it removes the already-synced page object so that the page 404s. Re-running the build without it leaves the old page in S3.

Urgent / illegal content (e.g. CSAM): the static rebuild does not touch the image objects,
which live on a separate CDN bucket (images.pixie.strd6.com, served via *.pixiecdn.com). Pull the
image first, then do the rebuild above so it doesn't reappear in listings:
aws s3 rm s3://images.pixie.strd6.com/sprites/123456/ --recursive # original.png, replay.json, thumb, …
aws cloudfront create-invalidation --distribution-id <images-dist-id> --paths '/sprites/123456/*'
node remove.mjs sprite 123456 "CSAM report - reported to NCMEC <date>" # then generate.mjs + sync --delete

Pitfall: a removal that exists only in S3 (image deleted) but is not in removed.tsv will reappear in galleries and the sitemap on the next rebuild. Always record it in removed.tsv.

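
The pitfall follows from how listings are derived: every rebuild starts from the full dataset and subtracts whatever removed.tsv lists, so an id that is missing from the list passes straight through. A minimal sketch of that filtering step is shown here. The function name, the row shape, and the assumption that a `user` row's id matches the sprite's owner id `u` are all illustrative; dropping `u` stands in for rendering the sprite as Anonymous. generate.mjs may implement this differently.

```javascript
// Hypothetical sketch of applying removal rows at build time; generate.mjs may differ.
function applyRemovals(sprites, removedRows) {
  const removedSprites = new Set();
  const removedUsers = new Set();
  for (const { type, id } of removedRows) {
    if (type === "sprite") removedSprites.add(String(id));
    if (type === "user") removedUsers.add(String(id));
  }
  return sprites
    // Removed sprites get no page and appear in no listing.
    .filter((s) => !removedSprites.has(String(s.id)))
    // A removed user's art stays, but loses its attribution.
    .map((s) => {
      if (s.u === undefined || !removedUsers.has(String(s.u))) return s;
      const { u, ...rest } = s;
      return rest;
    });
}

const sprites = [
  { id: 1, u: "12" },
  { id: 2, u: "99" },
  { id: 3, u: "12" },
];
const removed = [
  { type: "sprite", id: "2" },
  { type: "user", id: "12" },
];
console.log(applyRemovals(sprites, removed));
// Sprite 2 is gone; sprites 1 and 3 remain without an owner id.
```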
Proactive review of the ~105k unreviewed sprites runs in two steps: scanners propose candidates (build/*-candidates.tsv), then a local review server lets an operator triage them.
cd static-site
node moderation/scan-text.mjs # text scan → build/scan-candidates.tsv
node moderation/scan-replay.mjs # replay scan (uploads/empties) → build/replay-candidates.tsv
node moderation/check-empty.mjs # pixel-verify empties → build/empty-verified.tsv (+ empty-content.tsv)
node moderation/review.mjs [candidates.tsv] # local review server → http://localhost:8787

- scan-text.mjs flags titles/tags/descs/handles/comments matching moderation/terms.tsv (categorised, severity 1–3). Triage aid only: false positives are expected, and it is not a CSAM detector (that needs hash-matching, e.g. PhotoDNA/NCMEC). Re-runs skip anything already in reviewed.tsv.
- scan-replay.mjs fetches each replay.json and flags uploads/empties from the op structure, with no image decode (formats: ../docs/replay-format.md). An upload is a signal, not a verdict; only v0-empty is treated as likely-trash. Resumable cache.
- check-empty.mjs (+ zero-dep png-decode.mjs) confirms emptiness against actual pixels, because the replay empty signal over-flags (~83% of v0-empties actually have content). Splits into empty-verified.tsv (real blanks) and empty-content.tsv (false empties → merit review).
- review.mjs serves a 100-at-a-time grid (arrow-key paging, integer-scaled native pixel art that scales large uploads down to fit, comment/user views) with Valid / 🔞 Adult / False positive + multi-select & bulk (filter → "Select all shown" → bulk-decide). Valid → removed.tsv, adult → adult.tsv, every decision → reviewed.tsv (FP allowlist). Lists aws s3 rm commands for valid sprite removals. Point it at any candidate file or id list: node moderation/review.mjs path. Re-review mode (`REREVIEW=1 node moderation/review.mjs <list>`) re-surfaces already-decided items with their prior decision shown; only the ones you flip change state (untouched items stay as decided).

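
As an illustration of what "confirming emptiness against actual pixels" can mean, the sketch below checks decoded RGBA data directly. The criterion used here (every pixel fully transparent, or every pixel the same colour) and the function name `isBlank` are assumptions made for the example; check-empty.mjs may apply different rules.

```javascript
// Hypothetical sketch: treats an image as blank when every pixel is fully
// transparent or all pixels share one colour. check-empty.mjs may use other criteria.
// `rgba` is decoded pixel data, 4 bytes per pixel (R, G, B, A).
function isBlank(rgba) {
  if (rgba.length === 0 || rgba.length % 4 !== 0) {
    throw new Error("expected non-empty RGBA data");
  }
  let allTransparent = true;
  let uniform = true;
  for (let i = 0; i < rgba.length; i += 4) {
    if (rgba[i + 3] !== 0) allTransparent = false;
    // Compare each channel against the first pixel.
    for (let c = 0; c < 4; c++) {
      if (rgba[i + c] !== rgba[c]) uniform = false;
    }
    if (!allTransparent && !uniform) return false; // real content found
  }
  return true;
}

const transparent = new Uint8Array(4 * 4); // 2x2 image, all zero bytes
const drawn = Uint8Array.from([0, 0, 0, 0, 255, 0, 0, 255, 0, 0, 0, 0, 0, 0, 0, 0]);
console.log(isBlank(transparent), isBlank(drawn)); // true false
```

A pixel-level check like this is slower than reading the replay's op structure, which is presumably why it only runs on the candidates the replay scan has already flagged.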
After a review session, deploy as in Removing content: node generate.mjs → aws s3 sync public/ s3://pixieengine-static --delete → invalidate, plus the listed aws s3 rm commands for valid sprite removals. Use the default sync (which compares size and modification time), not --size-only: --size-only skips files whose content changed but whose byte size did not, which desyncs pagination.
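
The --size-only caveat is easy to reproduce in miniature. In the sketch below, two renders of the same page differ only in a page number, so their byte lengths match even though the content does not. The HTML strings and URL shape are made up for the example, and the comparison only mimics the idea of a size-only check, not the AWS CLI's actual code.

```javascript
// Illustrative only: the HTML strings are made up, and the comparison mimics
// the idea of a size-only check rather than the AWS CLI's actual code.
const before = '<a href="/sprites/page/12/">next</a>';
const after = '<a href="/sprites/page/13/">next</a>';

const sameSize = Buffer.byteLength(before) === Buffer.byteLength(after);
const sameContent = before === after;

// A size-only comparison would treat this page as unchanged and skip the upload.
console.log({ sameSize, sameContent }); // { sameSize: true, sameContent: false }
```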