Add CBO-vs-TMD revenue-level sanity test (issue #502) #515
Merged
@donboyd5, When I download PR #515 and run "make data", the tests fail with this message:

Did I make a mistake? Or is your PR #515 code not completely up-to-date with the main branch?
Compares weighted PUF (filer) totals on the TMD file against CBO's February 2026 individual income tax microsimulation projections at four anchor years.

Tested aggregates (PUF subset, calendar-year):
- number of returns
- AGI (`c00100`)
- individual income tax liability (`iitax`)

Anchor years and per-year relative tolerances:
- 2022 (actual): 1%
- 2026: 2%
- 2031: 5%
- 2036: 6%

Comparator is the calendar-year, 1040-universe liability series in sheet 3 of CBO's Revenue file 51138-2026-02-Revenue.xlsx, which is built from the same SOI 2022 PUF (Pub. 1304) sample TMD uses. This is a much cleaner comparator than the FY MTS cash-receipts series the (skipped) `test_tax_revenue` uses, which is why the levels match TMD essentially exactly in 2022 (under 0.6% on each aggregate). Tolerances widen with the projection horizon to reflect compounding growfactor uncertainty.

Test runs in ~16s. Posted for review on issue #502.
d049882 to
7a3b674
Thanks, @martinholmer. Rebased onto current master (ab98dfb) to clear the out-of-date banner. No content changes; only the base advanced past PR #514.
@donboyd5, Thanks for including the PR #514 changes in your PR #515. Now, when I download #515 and run "make data", all the tests pass.

Open questions for review
- I think the current approach is the better approach. No changes needed on this point.
- Yes, I think these years are fine.
- I think they are "acceptable".
- Delete the old test.
martinholmer
approved these changes
Apr 29, 2026

Summary
Adds a new test `tests/test_revenue_levels_cbo.py` that checks weighted PUF (filer) totals on the TMD file against CBO's February 2026 individual income tax microsimulation projections at four anchor years.

Tested aggregates (calendar-year, PUF subset only): number of returns, AGI (`c00100`), and individual income tax liability (`iitax`).

Anchor years and per-year relative tolerances:
- 2022 (actual): 1%
- 2026: 2%
- 2031: 5%
- 2036: 6%

Test runs in ~16 s (one TaxCalc loop advancing through 2036).
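As a rough illustration of the check described in this summary, the sketch below computes a weighted PUF-only total and compares it with a CBO value under the per-year relative tolerance. It is not the code in `tests/test_revenue_levels_cbo.py`. The tolerance values are the ones stated in this PR; the helper names, the record layout, and the use of an `s006` weight with a `data_source == 1` PUF flag are assumptions made for the example, and the records are toy values, not TMD data.

```python
# Per-year relative tolerances as stated in this PR.
TOLERANCE = {2022: 0.01, 2026: 0.02, 2031: 0.05, 2036: 0.06}


def weighted_total(records, var):
    """Weighted sum of `var` over PUF-sourced records only.

    Assumption: each record is a dict carrying a sampling weight `s006`
    and a `data_source` flag equal to 1 for PUF records.
    """
    return sum(r["s006"] * r[var] for r in records if r["data_source"] == 1)


def within_tolerance(tmd_total, cbo_total, year):
    """True if the TMD total is within the year's relative tolerance of CBO."""
    rel_diff = tmd_total / cbo_total - 1.0
    return abs(rel_diff) <= TOLERANCE[year]


# Toy records (hypothetical): two PUF units and one CPS unit.
toy_records = [
    {"s006": 100.0, "data_source": 1, "c00100": 50_000.0},
    {"s006": 200.0, "data_source": 1, "c00100": 80_000.0},
    {"s006": 300.0, "data_source": 0, "c00100": 10_000.0},
]

# The CPS unit is excluded, so only the first two records contribute.
toy_agi = weighted_total(toy_records, "c00100")
```

A relative rather than absolute difference keeps one tolerance meaningful across aggregates of very different scale (counts of returns versus dollars of AGI).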
Posted for review on issue #502, "Design discussion: future-year revenue sanity check — how to align tax-calculator `iitax`/`payrolltax` with a CBO or Treasury comparator".

Why a new comparator (sheet 3 of the CBO Revenue file)
The skipped `test_tax_revenue.py` compares against the CBO fiscal-year cash-receipts series, derived from the Monthly Treasury Statement. That series is structurally not comparable to TaxCalc `iitax`: it includes Form 1041, Form 1042 / NRA withholding, refund-netting, and late-assessment receipts, and it differs in FY-vs-CY timing. The result was a ~13–17% gap on `iitax` even in the 2022 base year, so no tight test was defensible.

The CBO Revenue file `51138-2026-02-Revenue.xlsx`, sheet "3.Individual Income Tax Details", publishes a calendar-year, 1040-universe liability series, including the line "Individual income tax liability" (= "Income tax after credits" + "Net investment income tax"). That series is built from CBO's own individual income tax microsimulation model, which uses the same SOI 2022 PUF (Pub. 1304) sample TMD uses. So both sides are calendar-year, 1040-universe, liability: apples to apples.

Empirically, the 2022 actual matches TMD essentially exactly (under 0.6% on each aggregate).
That's a clean enough match to write a tight level-tolerance test, which addresses option 3 of the four-option menu in the issue.
Table 1 — full year-by-year levels (2022–2036)
Full CBO forecast, for context (only 2022, 2026, 2031, 2036 are actually anchored in the test):
Note that the AGI differences vs. CBO are driven primarily by the number of returns, not by AGI per return. Table 2 below suggests that the TMD number of returns grows about 0.41% per year while CBO's grows about 0.81%. This seems like something we could resolve. Perhaps we have too little growth in the number of returns? (Or perhaps CBO has too much? My guess, though, is that their growth is pretty consistent with available population forecasts.)
Table 2 - Compound annual growth, 2022→2036:
The level drift is essentially the cumulative growth-rate gap. From ~2027 onward TMD's annual growth runs ~0.25–0.35 pp slower than CBO's for both AGI and iitax. The widening returns-count gap is the largest single driver.
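To make the link between a growth-rate gap and level drift concrete, here is a small sketch of the arithmetic. It uses the returns-count growth rates quoted above (0.41% for TMD, 0.81% for CBO) over the 14 years from 2022 to 2036. The function names are made up for this example, and it assumes the two series start at the same level, which is only approximately true.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two levels."""
    return (end / start) ** (1.0 / years) - 1.0


def implied_level_gap(tmd_growth, cbo_growth, years):
    """Relative TMD-vs-CBO level gap after `years` of constant growth.

    Assumption: both series start from the same level.
    """
    return ((1.0 + tmd_growth) / (1.0 + cbo_growth)) ** years - 1.0


# Returns-count growth rates quoted above; 2022 -> 2036 is 14 years.
gap_2036 = implied_level_gap(0.0041, 0.0081, 14)
print(f"{gap_2036:.2%}")
```

Under those assumptions the implied gap is about -5.4%: a 0.4 pp annual shortfall in returns growth compounds to TMD sitting roughly 5.4% below CBO on the number of returns by 2036, which is the same order of magnitude as the 6% tolerance used for that year.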
Tolerance choice
Tolerances widen with horizon to reflect compounding growfactor uncertainty. They were chosen so the worst-of-the-three aggregates passes at each year, with a small margin:
@martinholmer — happy to widen these further (or pick growth-rate instead of level) if you think better. Also, how do you feel about the approach here of widening tolerances as we go further in the future? Obviously uncertainty increases as the horizon lengthens, and differences accumulate over time. OTOH, I chose the tolerances judgmentally.
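The selection rule described in this section (the worst of the three aggregates must pass at each anchor year, with a small margin) could be checked along the following lines. This is a sketch only: the function names are hypothetical and the numbers are toy values chosen for easy arithmetic, not figures from the PR's tables.

```python
def relative_diff(tmd, cbo):
    """Signed relative difference of TMD from CBO."""
    return tmd / cbo - 1.0


def worst_abs_diff(tmd_by_agg, cbo_by_agg):
    """Largest absolute relative difference across the aggregates."""
    return max(
        abs(relative_diff(tmd_by_agg[name], cbo_by_agg[name]))
        for name in cbo_by_agg
    )


def headroom(tolerance, tmd_by_agg, cbo_by_agg):
    """Tolerance minus the worst difference; positive means the year passes."""
    return tolerance - worst_abs_diff(tmd_by_agg, cbo_by_agg)


# Toy values for one anchor year (hypothetical, not from the PR).
cbo_toy = {"returns": 100.0, "agi": 200.0, "iitax": 50.0}
tmd_toy = {"returns": 99.0, "agi": 197.0, "iitax": 49.8}

worst = worst_abs_diff(tmd_toy, cbo_toy)   # AGI is the binding aggregate here
margin = headroom(0.02, tmd_toy, cbo_toy)  # positive: passes with room to spare
```

Reporting the headroom alongside the pass/fail result makes it easier to see how close a judgmentally chosen tolerance is to binding as growfactors change.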
Open questions for review
Should `tests/test_tax_revenue.py` (the older skipped FY-MTS-based test) now be deleted, or kept skipped for historical reference?

Test plan
- `python -m pytest tests/test_revenue_levels_cbo.py -xvs` passes (16 s).
- `make format && make lint` clean.

Related
- Issue #502 (how to align tax-calculator `iitax`/`payrolltax` with a CBO or Treasury comparator)