Per-area quality_report now reports three additional metrics for each
area, in both the per-area detail table and a new CSV that always
contains every area:
- AvgAGI: weighted AGI per return, in $K
- Unusual%: mean across the area's targets of
|area_target - pop_share * national_total|
/ |pop_share * national_total|
where the national total is computed by applying the target's recipe
(varname, count, scope, agi range, fstatus) to the full TMD with s006
weighting. National totals are cached across areas. The XTOT
population row is excluded because pop_share * national_pop = area_pop
by construction.
- ESS: Kish effective sample size, (sum w)^2 / sum(w^2), computed on the
area weight vector.
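To make the three definitions above concrete, here is a minimal pure-Python sketch on toy data. It is not the code in quality_report.py: the function names are hypothetical, and the division by 1,000 assumes AGI (c00100) is stored in dollars.

```python
def weighted_avg_agi_k(weights, agi):
    """Weighted mean AGI per return, in $K (assumes agi is in dollars)."""
    total_weight = sum(weights)
    return sum(w * a for w, a in zip(weights, agi)) / total_weight / 1_000


def unusualness_pct(area_targets, national_totals, pop_share):
    """Mean of |target - pop_share * national| / |pop_share * national|, in %.

    The caller is expected to have dropped the XTOT population row already.
    """
    gaps = []
    for target, national in zip(area_targets, national_totals):
        expected = pop_share * national
        gaps.append(abs(target - expected) / abs(expected))
    return 100 * sum(gaps) / len(gaps)


def kish_ess(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Toy inputs, invented purely for illustration.
print(weighted_avg_agi_k([1, 3], [40_000, 80_000]))   # 70.0
print(unusualness_pct([150, 100], [400, 800], 0.25))  # 50.0
print(kish_ess([1, 1, 1, 1]))                         # 4.0 (equal weights)
print(kish_ess([4, 0, 0, 0]))                         # 1.0 (one record carries all weight)
```

The last two calls show why ESS tracks weight distortion: equal weights give an ESS equal to the number of records, and concentrating weight on fewer records drives it toward 1.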
The detail table previously showed all areas (states) or only the top
20 by violations / weight distortion (CDs, counties). It still does,
but the report now also writes a per-area CSV at
<weight_dir>/quality_report_per_area.csv with all areas and the new
metrics, and prints the path at the bottom of the report so the data
is available for further analysis.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 25, 2026
Summary
Adds three per-area metrics to tmd/areas/quality_report.py and writes a per-area CSV for every area on every run.
The PER-AREA DETAIL table gains three columns at the right:
- AvgAGI: weighted AGI per return, in $K (sum(w * c00100) / sum(w)).
- Unusual%: for each target in <area>_targets.csv, the national total is computed by applying the target's recipe (varname, count, scope, agi range, fstatus) to the full TMD with s006 weighting; we then take the mean of |area_target - pop_share * national| / |pop_share * national| across rows. The XTOT population row is excluded because pop_share * national_pop = area_pop by construction. National totals are cached across areas, so this is cheap. An area that looked exactly like the nation would score 0%; for the current Congress 118 CD weights, NY-13 ≈ 45% (closest to typical) and NY-12 ≈ 256% (farthest).
- ESS: Kish effective sample size, (sum w)^2 / sum(w^2), computed on the area weight vector. Lower ESS means the optimizer had to push weights further from population-proportional.
The detail table still shows all areas for state scope and the top-20-by-violation subset for CDs/counties. To support further analysis of all areas (not just the displayed subset), every run also writes a CSV at <weight_dir>/quality_report_per_area.csv with one row per area and the full set of columns, including the new metrics. The CSV path is printed at the bottom of the report.
For Congress 118 (n=436 solved CDs), the new metrics correlate as expected: unusualness vs. |avg_agi/median - 1|, Pearson r ≈ 0.70; unusualness vs. ESS, r ≈ −0.57; |avg_agi/median - 1| vs. ESS, r ≈ −0.58. More-unusual areas tend to have more atypical average AGI and lower effective sample size, which matches intuition.
No change to weight-solving, target construction, or any other pipeline output; this PR only affects what quality_report.py reads, prints, and writes.
Test plan
- python -m tmd.areas.quality_report --scope states runs cleanly, table shows the three new columns, and tmd/areas/weights/states/quality_report_per_area.csv is written with 51 area rows.
- python -m tmd.areas.quality_report --scope cds --congress 118 runs cleanly, and tmd/areas/weights/cds_118/quality_report_per_area.csv is written with 436 area rows.
- python -m tmd.areas.quality_report --scope cds --congress 119 runs cleanly, and tmd/areas/weights/cds_119/quality_report_per_area.csv is written with 436 area rows.
- python -m tmd.areas.quality_report --scope NY12,NY13 --congress 118 shows AvgAGI, Unusual%, and ESS for both CDs.
- make format && make lint both succeed.