static-site/ — pixieengine.com static archive generator

The source for the pre-rendered static archive of pixieengine.com. A small, dependency-free Node generator (no SSG/framework) that turns the recovered sprite/tune dataset into a complete static public/ tree, deployed to S3 + CloudFront by the CDK stack in ../infra/. Design rationale: ../docs/architecture.md; deploy: ../docs/deploy.md.

What's tracked here (source)

| File | What it is |
| --- | --- |
| build-dataset.mjs | Merges the data sources into the canonical build/dataset.ndjson (one row per sprite; each carries its owner id u). |
| extract-attribution.sh | One-off DB+CDN extract → build/users.ndjson, build/{sprite,tune}_owners.tsv (sprite/tune → creator; display_name + bio + verified avatar only, no PII). |
| generate.mjs | The generator: dataset.ndjson + tunes.ndjson (+ users.ndjson, owner maps, comments.ndjson) → public/ (sprite/tune/tag/gallery pages, /<user>/ profile pages, /load/, sitemaps, robots, 410). Honors removed.tsv. |
| removed.tsv | Takedown / removal list: the source of truth for content pulled from the live archive. generate.mjs enforces it: removed sprites/tunes get no page (→ S3 404 → CloudFront /410 "Gone") and drop out of all galleries, tag pages, profiles, comments, and the sitemap. See Removing content below. |
| remove.mjs | Helper to append a validated row to removed.tsv (node remove.mjs <type> <id> "<reason>", where <type> is sprite, tune, or user; or comment <spriteId>#<hash>), which avoids hand-editing the TSV. |
| removed-list.mjs | Shared read/append/dedupe factory for the type\tid\tdate\treason lists (removed.tsv, adult.tsv). Used by remove.mjs and the review tool, so the format can't drift. |
| adult.tsv | Adult / 18+ list: "tasteful NSFW" is allowed but gated. generate.mjs keeps the page but adds noindex + a fail-closed self-attestation interstitial + 🔞 badge, and holds it out of galleries/tags/profiles/sitemap. (True logged-in/age enforcement is Phase 2; this list will drive it.) |
| comment-key.mjs | Shared stable comment address (sha256(by+body)), so a comment removal key matches between the generator and the tools. |
| moderation/ | Proactive-moderation toolkit (see Moderation review below): terms.tsv + scan-text.mjs (text scan), scan-replay.mjs (replay scan → uploads/empties; formats in ../docs/replay-format.md), png-decode.mjs (zero-dep PNG decoder) + check-empty.mjs (pixel-verify "empty" candidates), review.mjs (local review server) + review.html (grid UI, incl. re-review mode via REREVIEW=1), and reviewed.tsv (decision log / false-positive allowlist). Scans write build/*-candidates.tsv. |
| screen.css | The site's real compiled stylesheet, copied verbatim from public/assets/screen-*.css (matches the live look). Emitted as /screen.css. |
| postmaster.js | Postmessage RPC shim embedded in the replay (/load/) editor shell. Emitted as /postmaster.js. |
| replay-test.html | Standalone manual harness for the editor replay flow. |

The deployed edge function is ../infra/functions/rewrite.js (CloudFront Function), not here.
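
To make the type\tid\tdate\treason list format concrete, here is a minimal sketch of a read/append/dedupe helper. It is not the real removed-list.mjs: the function names, the "#" comment-line handling, the allowed type set, and the ISO date string are all assumptions for illustration.

```javascript
// Hypothetical sketch of a type\tid\tdate\treason list helper.
// Names and validation rules are assumptions, not the real removed-list.mjs.
const TYPES = new Set(["sprite", "tune", "user", "comment"]);

// Parse list text into row objects, skipping blank lines and "#" comment lines (assumed).
function parseList(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "" && !line.startsWith("#"))
    .map((line) => {
      const [type, id, date, ...rest] = line.split("\t");
      return { type, id, date, reason: rest.join("\t") };
    });
}

// Append a validated row unless the same type+id is already listed.
function appendRow(text, { type, id, date, reason }) {
  if (!TYPES.has(type)) throw new Error(`unknown type: ${type}`);
  if (!id || /[\t\n]/.test(id)) throw new Error("id must be non-empty, with no tabs or newlines");
  const exists = parseList(text).some((r) => r.type === type && r.id === id);
  if (exists) return text; // dedupe: already recorded, leave the list unchanged
  const clean = String(reason ?? "").replace(/[\t\n]+/g, " "); // keep the row on one line
  const sep = text === "" || text.endsWith("\n") ? "" : "\n";
  return `${text}${sep}${[type, id, date, clean].join("\t")}\n`;
}

let list = "";
list = appendRow(list, { type: "sprite", id: "123456", date: "2026-06-02", reason: "DMCA takedown" });
list = appendRow(list, { type: "sprite", id: "123456", date: "2026-06-03", reason: "duplicate" });
console.log(parseList(list).length); // 1
```

Keeping parsing and appending in one shared module is what lets remove.mjs and the review tool write rows the generator can always read back.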

What's NOT tracked (gitignored — regenerable / large)

  • build/ — the data workspace (~68 MB): DB extracts (db_sprites.ndjson, db_desc.tsv, tunes.ndjson, users.ndjson, sprite_owners.tsv, tune_owners.tsv) plus the built dataset.ndjson.
  • public/ — the generated static tree (~2.1 GB, ~256k objects), synced to S3 (pixieengine-static); regenerates in ~25s.

These are intentionally excluded from git (huge, deterministic outputs / data dumps). The repo keeps the code that reproduces them, not the artifacts.

Pipeline

                 tmp/sprite-viewer/ids.txt      (authoritative S3 id list — existence)
                 tmp/cf-recovery/sprite_slugs.tsv  (post-2017 titles/slugs from CloudFront logs)
                 tmp/wayback-recovery/*.tsv        (Wayback-recovered tags/descriptions/titles)
build/db_sprites.ndjson + build/db_desc.tsv  (2017 DB metadata)
build/sprite_owners.tsv ─┐ (sprite → owner; from extract-attribution.sh)
        │                │
        ▼  node build-dataset.mjs (joins owner id onto each sprite as `u`)
build/dataset.ndjson  ──┐
build/tunes.ndjson    ──┤
build/users.ndjson    ──┤ (creator display_name + bio + avatar; profile pages)
build/tune_owners.tsv ──┤
build/comments.ndjson ──┴─▶  node generate.mjs  ──▶  public/

The recovery inputs live in the gitignored tmp/ working tree (the cf-log and Wayback recovery pipelines write there). build-dataset.mjs reads them at ../tmp/... by default; override with the IDS_TXT, SLUGS_TSV, and WAYBACK_DIR env vars if they live elsewhere.
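
The owner join in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in build-dataset.mjs: the sprite field name id, the two-column (sprite id, owner id) TSV layout, and both function names are hypothetical.

```javascript
// Sketch only: the `id` field and the two-column TSV layout are assumptions.
function parseOwners(tsv) {
  const owners = new Map();
  for (const line of tsv.split("\n")) {
    if (!line.trim()) continue;
    const [spriteId, ownerId] = line.split("\t");
    owners.set(spriteId, ownerId);
  }
  return owners;
}

// Attach the owner id as `u` to each sprite row; rows without a known owner are left unchanged.
function joinOwners(spritesNdjson, ownersTsv) {
  const owners = parseOwners(ownersTsv);
  return spritesNdjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const sprite = JSON.parse(line);
      const u = owners.get(String(sprite.id));
      return JSON.stringify(u === undefined ? sprite : { ...sprite, u });
    })
    .join("\n");
}

const sprites = '{"id":1,"title":"Knight"}\n{"id":2,"title":"Slime"}';
const owners = "1\t42\n";
console.log(joinOwners(sprites, owners));
// {"id":1,"title":"Knight","u":"42"}
// {"id":2,"title":"Slime"}
```

Loading the owner map once and streaming the sprite rows past it keeps the join linear in the number of sprites.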

Build

cd static-site
./extract-attribution.sh  # one-off: DB+CDN -> build/users.ndjson + {sprite,tune}_owners.tsv (needs a 2017 DB restore)
node build-dataset.mjs    # (re)build build/dataset.ndjson from the sources above
node generate.mjs         # -> public/  (~27s, ~270k files incl. ~12.6k profile pages)

Then deploy per ../docs/deploy.md: aws s3 sync public/ s3://pixieengine-static + a CloudFront invalidation.

Removing content

Takedowns are driven by removed.tsv (tracked) — the single source of truth for what's been pulled. The archive is "publish all" by default (architecture.md decision #2); removal is the reactive enforcement path. generate.mjs reads the list and, per row type:

| type | effect |
| --- | --- |
| sprite / tune | No page emitted → S3 404 → CloudFront serves /410 "Gone" (de-indexes cleanly); dropped from every gallery, tag page, the owner's profile, recovered comments, and the sitemap. |
| user | Profile page removed, the user's sprites/tunes de-attributed to Anonymous (art stays unless its own id is also listed), and their handle scrubbed from recovered comments. |
| comment | One recovered comment (body + handle) dropped from its sprite's thread. Addressed by <spriteId>#<hash>, a content hash that is stable across rebuilds. |
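
As an illustration of the <spriteId>#<hash> address, the sketch below derives a key from sha256(by+body) and filters a thread against a removal list. The 10-character hex truncation is inferred from the example key shown in this README, and the function names and comment shape ({ by, body }) are assumptions; the real comment-key.mjs may differ.

```javascript
import { createHash } from "node:crypto";

// Assumption: key = first 10 hex chars of sha256(by + body).
function commentKey(by, body) {
  return createHash("sha256").update(by + body, "utf8").digest("hex").slice(0, 10);
}

// Split "<spriteId>#<hash>" into its two parts.
function parseCommentAddress(address) {
  const i = address.indexOf("#");
  if (i <= 0 || i === address.length - 1) throw new Error(`bad comment address: ${address}`);
  return { spriteId: address.slice(0, i), hash: address.slice(i + 1) };
}

// Drop removed comments from one sprite's thread.
function filterThread(spriteId, comments, removedAddresses) {
  const removed = new Set(removedAddresses);
  return comments.filter((c) => !removed.has(`${spriteId}#${commentKey(c.by, c.body)}`));
}

const thread = [
  { by: "alice", body: "nice sprite" },
  { by: "bob", body: "cool" },
];
const target = `74464#${commentKey("bob", "cool")}`;
console.log(filterThread("74464", thread, [target]).length); // 1
```

Because the key depends only on the comment's author and body, the same comment gets the same address on every rebuild, which is what lets a removal row keep matching.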

Remove one comment (e.g. a slur in a comment body — a user removal only scrubs the handle, not the body):

node remove.mjs comment 74464                         # lists the sprite's comments + their keys
node remove.mjs comment 74464#ce9aa2eb07 "abuse — slur in comment body"
node generate.mjs && aws s3 sync public/ s3://pixieengine-static --delete && <invalidate /sprites/74464/*>

Standard takedown (DMCA, abuse report, account/handle scrub):

cd static-site
node remove.mjs sprite 123456 "DMCA takedown — claimant Foo, 2026-06-02"
node generate.mjs
aws s3 sync public/ s3://pixieengine-static --delete   # --delete is what makes the page 404
aws cloudfront create-invalidation --distribution-id E2QQUW2BPHXXNP --paths '/sprites/123456/*' '/sprites/*' '/tags/*' '/sitemap*'

--delete is required — it's what removes the already-synced page object so it 404s. Re-running the build without it leaves the old page in S3.

Urgent / illegal content (e.g. CSAM): the static rebuild does not touch the image objects, which live on a separate CDN bucket (images.pixie.strd6.com, served via *.pixiecdn.com). Pull the image first, then do the rebuild above so it doesn't reappear in listings:

aws s3 rm s3://images.pixie.strd6.com/sprites/123456/ --recursive   # original.png, replay.json, thumb, …
aws cloudfront create-invalidation --distribution-id <images-dist-id> --paths '/sprites/123456/*'
node remove.mjs sprite 123456 "CSAM report — reported to NCMEC <date>"   # then generate.mjs + sync --delete

Pitfall: a removal that's only in S3 (image deleted) but not in removed.tsv will reappear in galleries/sitemap on the next rebuild. Always record it in removed.tsv.
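
The reason is that listings are rebuilt from the dataset on every run, so only a removed.tsv row keeps an item out. A rough sketch of that filtering step, with hypothetical names and assumed shapes (rows as { type, id }, sprites as { id, u }, and de-attribution modelled as dropping u):

```javascript
// Sketch: how a generator might apply removal rows before rendering.
// The row shape and sprite fields are assumptions, not the real generate.mjs.
function applyRemovals(sprites, removedRows) {
  const goneSprites = new Set(removedRows.filter((r) => r.type === "sprite").map((r) => r.id));
  const goneUsers = new Set(removedRows.filter((r) => r.type === "user").map((r) => r.id));
  return sprites
    .filter((s) => !goneSprites.has(String(s.id))) // removed sprites get no page or listing
    .map((s) => {
      if (s.u === undefined || !goneUsers.has(String(s.u))) return s;
      const { u, ...anonymous } = s; // de-attribute: the art stays, the owner link is dropped
      return anonymous;
    });
}

const dataset = [
  { id: 1, u: "42" },
  { id: 2, u: "7" },
  { id: 3, u: "42" },
];
const removedRows = [
  { type: "sprite", id: "2" },
  { type: "user", id: "42" },
];
console.log(JSON.stringify(applyRemovals(dataset, removedRows)));
// [{"id":1},{"id":3}]
```

With an empty removal list the function returns every sprite, which is the reappearance described in the pitfall.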

Moderation review

Proactive review of the ~105k unreviewed sprites runs in two steps: scanners propose candidates (written to build/*-candidates.tsv), then an operator triages them in the local review server.

cd static-site
node moderation/scan-text.mjs                       # text scan → build/scan-candidates.tsv
node moderation/scan-replay.mjs                     # replay scan (uploads/empties) → build/replay-candidates.tsv
node moderation/check-empty.mjs                     # pixel-verify empties → build/empty-verified.tsv (+ empty-content.tsv)
node moderation/review.mjs [candidates.tsv]         # local review server → http://localhost:8787
  • scan-text.mjs flags titles/tags/descs/handles/comments matching moderation/terms.tsv (categorised, severity 1–3). Triage aid only — false positives expected; not a CSAM detector (that needs hash-matching, e.g. PhotoDNA/NCMEC). Re-runs skip anything already in reviewed.tsv.
  • scan-replay.mjs fetches each replay.json and flags uploads/empties from the op structure — no image decode (formats: ../docs/replay-format.md). An upload is a signal, not a verdict; only v0-empty is treated as likely-trash. Resumable cache.
  • check-empty.mjs (+ zero-dep png-decode.mjs) confirms emptiness against actual pixels — the replay empty signal over-flags (~83% of v0-empties actually have content). Splits into empty-verified.tsv (real blanks) and empty-content.tsv (false empties → merit review).
  • review.mjs serves a 100-at-a-time grid (arrow-key paging, integer-scaled native pixel art with large uploads scaled down to fit, comment/user views). Each item takes one of three decisions, Valid / 🔞 Adult / False positive, applied singly or in bulk via multi-select (filter → "Select all shown" → bulk-decide). Valid goes to removed.tsv, Adult to adult.tsv, and every decision to reviewed.tsv (the FP allowlist). The tool lists the aws s3 rm commands for valid sprite removals. Point it at any candidate file or id list: node moderation/review.mjs path. Re-review mode (REREVIEW=1 node moderation/review.mjs <list>) re-surfaces already-decided items with their prior decision shown; only the ones you flip change state (untouched items stay as decided).
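
The text-scan step can be pictured with the sketch below. It is only an illustration: the terms.tsv column order (term, category, severity), the item fields, whole-word matching, and treating 3 as the most severe level are all assumptions, and the function names are hypothetical.

```javascript
// Sketch: the terms.tsv column order (term, category, severity) is an assumption.
function parseTerms(tsv) {
  return tsv
    .split("\n")
    .filter((line) => line.trim() !== "" && !line.startsWith("#"))
    .map((line) => {
      const [term, category, severity] = line.split("\t");
      return { term: term.toLowerCase(), category, severity: Number(severity) };
    });
}

// Escape regex metacharacters so each term matches literally.
const escapeRe = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// One candidate per unreviewed item that matches a term as a whole word,
// keeping the hit with the highest severity (assumes 3 is the most severe).
function scanItems(items, terms, reviewedIds = new Set()) {
  const candidates = [];
  for (const item of items) {
    if (reviewedIds.has(item.id)) continue; // re-runs skip already-decided items
    const text = [item.title, item.description, ...(item.tags ?? [])]
      .filter(Boolean)
      .join(" ")
      .toLowerCase();
    const hits = terms.filter((t) => new RegExp(`\\b${escapeRe(t.term)}\\b`).test(text));
    if (hits.length === 0) continue;
    const worst = hits.reduce((a, b) => (b.severity > a.severity ? b : a));
    candidates.push({ id: item.id, term: worst.term, category: worst.category, severity: worst.severity });
  }
  return candidates;
}

const terms = parseTerms("badword\tabuse\t3\nspam\tjunk\t1");
const items = [
  { id: "1", title: "A badword here", tags: ["spam"] },
  { id: "2", title: "Castle" },
  { id: "3", title: "spam spam" },
];
console.log(scanItems(items, terms, new Set(["3"])));
// [ { id: '1', term: 'badword', category: 'abuse', severity: 3 } ]
```

Word matching like this is a triage aid only, which is why every hit still goes through the review grid.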

After a review session, deploy as in Removing content: node generate.mjs → aws s3 sync public/ s3://pixieengine-static --delete → invalidate, plus the listed aws s3 rm commands for valid sprite removals. Use the default sync (size + mtime comparison), not --size-only: --size-only skips content changes that leave the file size unchanged, which desyncs pagination.