The tmd/areas/targets/prepare/ tree was the old Quarto-based
documentation site for state and CD target preparation. Per the
README that lived there, the documentation has been retired in
favor of in-repo docs under tmd/areas/prepare/, AREA_WEIGHTING_GUIDE.md,
and the various README files.
Migrates the SOI Congressional District documentation files to the
live data tree, where they are co-located with the active CD source
data and serve as the CD-side analog to the SOI state-side variable
documentation already present in tmd/areas/prepare/data/soi_states/:
- 21incddocguide.docx -> tmd/areas/prepare/data/soi_cds/
- 22incddocguide.docx -> tmd/areas/prepare/data/soi_cds/
- cd_documentation_extracted_from_21incddocguide.docx.xlsx
  -> tmd/areas/prepare/data/soi_cds/
Removes the rest of tmd/areas/targets/prepare/ outright. Each
removed item is unused by current code:
- The 7 SOI state CSVs for years 2015-2021 in
prepare_states/data/data_raw/ matched the SOI_STATE_CSV_PATTERNS
dict entries for those years, but the loader in
tmd/areas/prepare/extended_targets.py reads from
tmd/areas/prepare/data/soi_states/ instead. That directory only
contains the 2022 file as CSV (years 2015-2021 are XLSX), so
calling build_extended_targets with soi_year in {2015..2021}
would have raised FileNotFoundError. All actual callers pass
soi_year=2022.
- congressional2021.zip is the original SOI download that contained
the CD doc guide (kept above) plus 21incd.csv and 50+ per-state
XLSX files. The XLSX files are not read by any code path. The
21incd.csv inside the zip is also unused: no caller of the area
pipeline passes year=2021. If a 2021 baseline is ever needed,
the file can be re-downloaded from SOI.
- The prepare/README.md is a redirect stub.
- The two .gitignore placeholders in prepare_cds/data/intermediate/
and target_recipes/ have no remaining purpose.
Trims SOI_STATE_CSV_PATTERNS and SOI_CD_CSV_PATTERNS in
tmd/areas/prepare/constants.py to the only year actually used (2022),
removing the dead 2015-2021 (state) and 2021 (CD) entries that
pointed at files no longer present in the repository.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
make_all.py was documented as the "end-to-end driver used in CI"
but is not actually invoked by any Makefile, GitHub Actions
workflow, or other script. The current pipeline drives area-weight
generation through tmd/areas/Makefile, which calls
tmd.areas.solve_weights -> tmd.areas.batch_weights instead.
make_all.py also reads from the legacy top-level tmd/areas/targets/
directory, while production target files live in
tmd/areas/targets/states/ and tmd/areas/targets/cds_{118,119}/, so
running it today would target nothing useful.
The single helper function used elsewhere
(time_of_newest_other_dependency, imported by batch_weights.py)
has been moved into batch_weights.py as the private helper
_time_of_newest_other_dependency, with the module-level
OTHER_DEPENDENCIES list inlined inside it.
Also removes "make_all" mentions from:
- tmd/areas/create_area_weights.py module docstring
- tmd/areas/README.md directory listing
- tmd/areas/AREA_WEIGHTING_GUIDE.md directory listing
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier commit 853e25a migrated three CD documentation files into tmd/areas/prepare/data/soi_cds/. Two of them (21incddocguide.docx, 22incddocguide.docx) are SOI's original published variable-definition documentation and cannot be restored without re-downloading them from the IRS.

The third, cd_documentation_extracted_from_21incddocguide.docx.xlsx, is a hand-extracted spreadsheet derivative of the 2021 docx. It is useful for programmatic lookup of variable definitions, but it is reproducible from the docx and is not original SOI source material.

Removing the derivative keeps tmd/areas/prepare/data/soi_cds/ limited to original SOI sources (data + documentation). If a structured/extracted copy is needed in the future, it can be regenerated from 21incddocguide.docx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three-commit cleanup of legacy area-pipeline artifacts. No behavior change: all removed pieces were unused by the active code path.
Commit 1 (853e25a) — Retires the tmd/areas/targets/prepare/ tree, a leftover from the old R/Quarto preparation pipeline. Migrates the three SOI Congressional District documentation files (21incddocguide.docx, 22incddocguide.docx, and the extracted .xlsx) to the live data tree at tmd/areas/prepare/data/soi_cds/, where they sit alongside 22incd.csv and serve as the CD-side analog to the SOI state-side variable documentation already in tmd/areas/prepare/data/soi_states/. Drops the rest of the directory (seven unused state CSVs for years 2015-2021, congressional2021.zip, the redirect-stub README, two placeholder .gitignores). Trims SOI_STATE_CSV_PATTERNS and SOI_CD_CSV_PATTERNS in tmd/areas/prepare/constants.py to the only year actually used (2022).
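The commit message for 853e25a notes that the dead 2015-2021 entries would have led to a FileNotFoundError deep inside the loader. A minimal sketch of how a trimmed pattern dict can fail earlier and more clearly is shown below. It is an illustration only: `resolve_soi_csv` is a hypothetical helper, not the loader in extended_targets.py, and the state file name is a placeholder (only `22incd.csv` is named in this PR).

```python
from pathlib import Path

# Hypothetical stand-ins for the trimmed dicts in tmd/areas/prepare/constants.py.
# Only the 2022 CD file name (22incd.csv) appears in this PR; the state file
# name below is a placeholder, not the real pattern.
SOI_STATE_CSV_PATTERNS = {2022: "soi_states_2022.csv"}  # placeholder name
SOI_CD_CSV_PATTERNS = {2022: "22incd.csv"}


def resolve_soi_csv(data_dir: Path, patterns: dict[int, str], soi_year: int) -> Path:
    """Return the CSV path for soi_year, failing early for unsupported years."""
    if soi_year not in patterns:
        # With the dead entries removed, an unsupported year is rejected here
        # instead of surfacing later as a missing-file error.
        raise ValueError(
            f"soi_year={soi_year} is not supported; available: {sorted(patterns)}"
        )
    path = Path(data_dir) / patterns[soi_year]
    if not path.exists():
        raise FileNotFoundError(path)
    return path
```

Under these assumptions, a call with soi_year=2021 raises ValueError naming the supported years, while soi_year=2022 resolves to the file in the given data directory.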
Commit 2 (91699c6) — Removes tmd/areas/make_all.py. The module was documented as the "end-to-end driver used in CI" but is not actually invoked by any Makefile, GitHub Actions workflow, or other script: the live driver is tmd/areas/Makefile → tmd.areas.solve_weights → tmd.areas.batch_weights. make_all.py also still read from the legacy top-level tmd/areas/targets/ directory rather than the production targets/states/ and targets/cds_{118,119}/ subdirectories, so running it would have targeted nothing useful. The one helper used elsewhere (time_of_newest_other_dependency, imported by batch_weights.py:369) is inlined into batch_weights.py as the private _time_of_newest_other_dependency, with its OTHER_DEPENDENCIES list folded into the function body. Drops make_all mentions from the tmd/areas/create_area_weights.py module docstring, tmd/areas/README.md directory listing, and tmd/areas/AREA_WEIGHTING_GUIDE.md directory listing.
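Commit 3 (1f19b50) -- Removes cd_documentation_extracted_from_21incddocguide.docx.xlsx, the hand-extracted derivative of 21incddocguide.docx that Commit 1 migrated unnecessarily. It can be regenerated from the docx if a structured copy is ever needed.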
Verification. make format and make lint clean. pytest --collect-only collects 338 tests (unchanged). The moved helper was tested via python -c "from tmd.areas.batch_weights import _time_of_newest_other_dependency; print(_time_of_newest_other_dependency())" and returns the expected mtime.
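The claim that nothing still refers to make_all can be spot-checked with a small scan of the working tree. The script below is a sketch, not part of this PR: `find_references` is a hypothetical helper, and the file suffixes and names it inspects are assumptions about where such references would live.

```python
from pathlib import Path


def find_references(root, needle,
                    suffixes=(".py", ".md", ".yml", ".yaml"),
                    names=("Makefile",)):
    """Return (path, line number, line) for each line under root containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Only inspect file types likely to invoke or document the module.
        if path.suffix not in suffixes and path.name not in names:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    for path, lineno, line in find_references(".", "make_all"):
        print(f"{path}:{lineno}: {line}")
```

Run from the repository root, an empty result would be consistent with the removal being safe; any hit points at a reference that still needs attention.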