
Cleanup area deadwood #517

Merged
donboyd5 merged 3 commits into master from cleanup-area-deadwood
Apr 30, 2026


@donboyd5 donboyd5 commented Apr 30, 2026

Three-commit cleanup of legacy area-pipeline artifacts. No behavior change: the removed pieces were unused by the active code path.

Commit 1 (853e25a) — Retires the tmd/areas/targets/prepare/ tree, a leftover from the old R/Quarto preparation pipeline. Migrates the three SOI Congressional District documentation files (21incddocguide.docx, 22incddocguide.docx, and the extracted .xlsx) to the live data tree at tmd/areas/prepare/data/soi_cds/, where they sit alongside 22incd.csv and serve as the CD-side analog to the SOI state-side variable documentation already in tmd/areas/prepare/data/soi_states/. Drops the rest of the directory (seven unused state CSVs for years 2015-2021, congressional2021.zip, the redirect-stub README, two placeholder .gitignores). Trims SOI_STATE_CSV_PATTERNS and SOI_CD_CSV_PATTERNS in tmd/areas/prepare/constants.py to the only year actually used (2022).

Commit 2 (91699c6) — Removes tmd/areas/make_all.py. The module was documented as the "end-to-end driver used in CI" but is not actually invoked by any Makefile, GitHub Actions workflow, or other script: the live driver is tmd/areas/Makefile → tmd.areas.solve_weights → tmd.areas.batch_weights. make_all.py also still read from the legacy top-level tmd/areas/targets/ directory rather than the production targets/states/ and targets/cds_{118,119}/ subdirectories, so running it would have targeted nothing useful. The one helper used elsewhere (time_of_newest_other_dependency, imported by batch_weights.py:369) is inlined into batch_weights.py as the private _time_of_newest_other_dependency, with its OTHER_DEPENDENCIES list folded into the function body. Drops make_all mentions from the tmd/areas/create_area_weights.py module docstring, tmd/areas/README.md directory listing, and tmd/areas/AREA_WEIGHTING_GUIDE.md directory listing.
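As a rough illustration of the inlining described in Commit 2, the sketch below keeps the dependency list inside the function body instead of at module level. It is not the code in batch_weights.py: the function name, the `base_dir` parameter (the real helper is called with no arguments), and the two file names in the list are assumptions made for this example.

```python
from pathlib import Path
from typing import Optional


def time_of_newest_dependency_sketch(base_dir: Path) -> Optional[float]:
    """Return the newest mtime among a fixed set of dependency files.

    Hypothetical stand-in for _time_of_newest_other_dependency; the real
    dependency list is not shown in this PR.
    """
    # The list lives inside the function, so no module-level constant
    # has to be imported alongside the helper. These entries are
    # placeholders, not the actual OTHER_DEPENDENCIES contents.
    other_dependencies = [
        base_dir / "create_area_weights.py",
        base_dir / "batch_weights.py",
    ]
    # Skip files that do not exist rather than raising.
    mtimes = [p.stat().st_mtime for p in other_dependencies if p.exists()]
    return max(mtimes) if mtimes else None
```

Folding the list into the function keeps the helper self-contained, at the cost of making the list harder to reuse elsewhere; that trade-off fits a private helper with a single caller.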

Commit 3 (1f19b50) -- Removes the extracted .xlsx that Commit 1 migrated unnecessarily. It is a derivative of 21incddocguide.docx and can be regenerated from it.

Verification. make format and make lint clean. pytest --collect-only collects 338 tests (unchanged). The moved helper was tested via python -c "from tmd.areas.batch_weights import _time_of_newest_other_dependency; print(_time_of_newest_other_dependency())" and returns the expected mtime.

donboyd5 and others added 3 commits April 30, 2026 05:21
The tmd/areas/targets/prepare/ tree was the old Quarto-based
documentation site for state and CD target preparation. Per the
README that lived there, the documentation has been retired in
favor of in-repo docs under tmd/areas/prepare/, AREA_WEIGHTING_GUIDE.md,
and the various README files.

Migrates the SOI Congressional District documentation files to the
live data tree, where they are co-located with the active CD source
data and serve as the CD-side analog to the SOI state-side variable
documentation already present in tmd/areas/prepare/data/soi_states/:

- 21incddocguide.docx  -> tmd/areas/prepare/data/soi_cds/
- 22incddocguide.docx  -> tmd/areas/prepare/data/soi_cds/
- cd_documentation_extracted_from_21incddocguide.docx.xlsx
                       -> tmd/areas/prepare/data/soi_cds/

Removes the rest of tmd/areas/targets/prepare/ outright. Each
removed item is unused by current code:

- The 7 SOI state CSVs for years 2015-2021 in
  prepare_states/data/data_raw/ matched the SOI_STATE_CSV_PATTERNS
  dict entries for those years, but the loader in
  tmd/areas/prepare/extended_targets.py reads from
  tmd/areas/prepare/data/soi_states/ instead. That directory only
  contains the 2022 file as CSV (years 2015-2021 are XLSX), so
  calling build_extended_targets with soi_year in {2015..2021}
  would have raised FileNotFoundError. All actual callers pass
  soi_year=2022.
- congressional2021.zip is the original SOI download that contained
  the CD doc guide (kept above) plus 21incd.csv and 50+ per-state
  XLSX files. The XLSX files are not read by any code path. The
  21incd.csv inside the zip is also unused: no caller of the area
  pipeline passes year=2021. If a 2021 baseline is ever needed,
  the file can be re-downloaded from SOI.
- The prepare/README.md is a redirect stub.
- The two .gitignore placeholders in prepare_cds/data/intermediate/
  and target_recipes/ have no remaining purpose.

Trims SOI_STATE_CSV_PATTERNS and SOI_CD_CSV_PATTERNS in
tmd/areas/prepare/constants.py to the only year actually used (2022),
removing the dead 2015-2021 (state) and 2021 (CD) entries that
pointed at files no longer present in the repository.
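
A check along the following lines could flag pattern entries that no longer match any file. This is only a sketch: it assumes the constants are dicts mapping a year to a glob pattern, and the function name and the pattern strings in the usage note are hypothetical, not the actual contents of constants.py.

```python
from pathlib import Path
from typing import Dict, List


def years_without_files(patterns: Dict[int, str], data_dir: Path) -> List[int]:
    """Return the years whose glob pattern matches no file in data_dir."""
    return sorted(
        year
        for year, pattern in patterns.items()
        # any() stops at the first match, so large directories stay cheap.
        if not any(data_dir.glob(pattern))
    )
```

For example, with hypothetical patterns `{2021: "state_2021*.csv", 2022: "state_2022*.csv"}` and a directory holding only a 2022 file, the function would report 2021 as a dead entry.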

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

make_all.py was documented as the "end-to-end driver used in CI"
but is not actually invoked by any Makefile, GitHub Actions
workflow, or other script. The current pipeline drives area-weight
generation through tmd/areas/Makefile, which calls
tmd.areas.solve_weights -> tmd.areas.batch_weights instead.
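
One way to confirm that nothing still refers to a module is a whole-word text scan over the repository. The sketch below is an assumption about how such a check could be done, not the method used for this PR; the function name and the default suffix and file-name lists are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Sequence


def find_references(
    root: Path,
    name: str,
    suffixes: Sequence[str] = (".py", ".yml", ".yaml", ".md"),
    filenames: Sequence[str] = ("Makefile",),
) -> List[Path]:
    """Return files under root whose text mentions name as a whole word."""
    # \b keeps "make_all" from matching longer names such as "make_all_x".
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if path.suffix not in suffixes and path.name not in filenames:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if pattern.search(text):
            hits.append(path)
    return hits
```

Calling `find_references(repo_root, "make_all")` lists every matching file, including the module itself if it mentions its own name, so the module's own path should be ignored when reading the result.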

make_all.py also reads from the legacy top-level tmd/areas/targets/
directory, while production target files live in
tmd/areas/targets/states/ and tmd/areas/targets/cds_{118,119}/, so
running it today would target nothing useful.

The single helper function used elsewhere
(time_of_newest_other_dependency, imported by batch_weights.py)
has been moved into batch_weights.py as the private helper
_time_of_newest_other_dependency, with the module-level
OTHER_DEPENDENCIES list inlined inside it.

Also removes "make_all" mentions from:
- tmd/areas/create_area_weights.py module docstring
- tmd/areas/README.md directory listing
- tmd/areas/AREA_WEIGHTING_GUIDE.md directory listing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The earlier commit 853e25a migrated three CD documentation files
into tmd/areas/prepare/data/soi_cds/. Two of them
(21incddocguide.docx, 22incddocguide.docx) are SOI's original
published variable-definition documentation and cannot be recovered
without re-downloading from IRS. The third,
cd_documentation_extracted_from_21incddocguide.docx.xlsx, is a
hand-extracted spreadsheet derivative of the 2021 docx. It is useful
for programmatic lookup of variable definitions, but it is
reproducible from the docx and is not original SOI source material.

Removing the derivative keeps tmd/areas/prepare/data/soi_cds/
limited to original SOI sources (data + documentation). If a
structured/extracted copy is needed in the future, it can be
regenerated from 21incddocguide.docx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@donboyd5 donboyd5 merged commit 1d366a4 into master Apr 30, 2026
1 check passed
@donboyd5 donboyd5 deleted the cleanup-area-deadwood branch April 30, 2026 10:01