
Add CBO-vs-TMD revenue-level sanity test (issue #502) #515

Merged
donboyd5 merged 1 commit into master from issue-502-future-revenue-analysis on Apr 29, 2026

donboyd5 (Collaborator) commented Apr 29, 2026

Summary

Why a new comparator (sheet 3 of the CBO Revenue file)

The skipped test_tax_revenue.py compares against CBO's fiscal-year cash-receipts series, which is derived from the Monthly Treasury Statement (MTS). That series is structurally not comparable to Tax-Calculator's iitax: it includes Form 1041 (estates and trusts) receipts and Form 1042 / nonresident-alien (NRA) withholding, reflects refund netting and late-assessment receipts, and is on a fiscal-year rather than calendar-year basis. The result was a ~13–17% gap on iitax even in the 2022 base year, so no tight test was defensible.

The CBO Revenue file 51138-2026-02-Revenue.xlsx, sheet "3.Individual Income Tax Details", publishes a calendar-year, 1040-universe liability series, including the line "Individual income tax liability" (= "Income tax after credits" + "Net investment income tax"). That series is built from CBO's own individual income tax microsimulation model, which uses the same SOI 2022 PUF (Pub. 1304) sample that TMD uses. So both sides are calendar-year, 1040-universe liability measures: an apples-to-apples comparison.

Empirically, CBO's 2022 actuals match TMD almost exactly (AGI and iitax in $ billions, returns in millions):

  • AGI: TMD 14,852.2 B vs CBO 14,821.8 B → +0.21%
  • iitax: TMD 2,147.4 B vs CBO 2,149.6 B → −0.10%
  • Returns: TMD 162.13 M vs CBO 161.30 M → +0.51%
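The Δ% figures appear to be plain relative differences, (TMD / CBO − 1) × 100, rounded to two decimals. A minimal Python sketch under that assumption, reproducing the 2022 bullets (the dictionary and function names are illustrative only, not taken from the TMD repository):

```python
# 2022 levels from the bullets above: AGI and iitax in $ billions,
# returns in millions. Names are illustrative, not from the TMD repo.
TMD_2022 = {"agi": 14852.2, "iitax": 2147.4, "returns": 162.13}
CBO_2022 = {"agi": 14821.8, "iitax": 2149.6, "returns": 161.30}


def pct_diff(tmd: float, cbo: float) -> float:
    """Percent difference of a TMD total relative to the CBO comparator."""
    return 100.0 * (tmd / cbo - 1.0)


diffs = {key: round(pct_diff(TMD_2022[key], CBO_2022[key]), 2) for key in TMD_2022}
print(diffs)  # {'agi': 0.21, 'iitax': -0.1, 'returns': 0.51}
```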

That's a clean enough match to write a tight level-tolerance test, which addresses option 3 of the four-option menu in the issue.

Table 1 — full year-by-year levels (2022–2036)

Full CBO forecast, for context (only 2022, 2026, 2031, 2036 are actually anchored in the test):

| Year | AGI TMD-PUF ($B) | AGI CBO ($B) | AGI Δ% | iitax TMD ($B) | iitax CBO liability ($B) | iitax Δ% | Returns TMD-PUF (M) | Returns CBO (M) | Returns Δ% |
|------|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 2022 | 14,852.2 | 14,821.8 | +0.21% | 2,147.4 | 2,149.6 | −0.10% | 162.13 | 161.30 | +0.51% |
| 2023 | 15,539.9 | 15,347.4 | +1.25% | 2,222.5 | 2,171.9 | +2.33% | 163.79 | 164.70 | −0.55% |
| 2024 | 16,718.5 | 16,685.9 | +0.20% | 2,405.7 | 2,408.4 | −0.11% | 165.66 | 167.10 | −0.86% |
| 2025 | 17,843.3 | 17,837.9 | +0.03% | 2,498.6 | 2,493.7 | +0.20% | 166.96 | 168.00 | −0.62% |
| 2026 | 18,583.0 | 18,810.4 | −1.21% | 2,644.3 | 2,665.0 | −0.78% | 167.34 | 169.50 | −1.27% |
| 2027 | 19,095.4 | 19,506.7 | −2.11% | 2,704.7 | 2,776.8 | −2.60% | 167.77 | 170.80 | −1.78% |
| 2028 | 19,588.5 | 20,074.8 | −2.42% | 2,764.8 | 2,860.0 | −3.33% | 168.20 | 172.10 | −2.27% |
| 2029 | 20,153.5 | 20,802.9 | −3.12% | 2,904.8 | 3,039.1 | −4.42% | 168.66 | 173.20 | −2.62% |
| 2030 | 20,809.6 | 21,540.9 | −3.39% | 3,055.8 | 3,196.7 | −4.41% | 169.15 | 174.30 | −2.95% |
| 2031 | 21,532.9 | 22,272.4 | −3.32% | 3,178.1 | 3,310.2 | −3.99% | 169.66 | 175.80 | −3.49% |
| 2032 | 22,291.6 | 23,101.9 | −3.51% | 3,309.4 | 3,452.3 | −4.14% | 170.13 | 176.90 | −3.83% |
| 2033 | 23,075.7 | 23,961.7 | −3.70% | 3,446.1 | 3,599.4 | −4.26% | 170.56 | 177.90 | −4.12% |
| 2034 | 23,890.3 | 24,869.3 | −3.94% | 3,589.7 | 3,760.4 | −4.54% | 170.97 | 178.80 | −4.38% |
| 2035 | 24,731.6 | 25,817.9 | −4.21% | 3,739.0 | 3,929.3 | −4.84% | 171.35 | 179.70 | −4.65% |
| 2036 | 25,602.4 | 26,787.8 | −4.43% | 3,894.4 | 4,103.3 | −5.09% | 171.72 | 180.50 | −4.86% |

Note that the AGI differences vs. CBO are driven primarily by the number of returns, not by AGI per return. Table 2 below shows TMD's number of returns growing about 0.41% per year versus about 0.81% for CBO. This seems like something we could resolve. Perhaps we have too little growth in the number of returns? (Or perhaps CBO has too much? My guess, though, is that CBO's growth is fairly consistent with available population forecasts.)
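One way to check the returns-count explanation is to split the log AGI gap exactly into a returns-count term and an AGI-per-return term. A small sketch using the 2036 row of Table 1 (variable names are illustrative):

```python
import math

# 2036 row of Table 1: AGI in $ billions, returns in millions.
agi_tmd, agi_cbo = 25602.4, 26787.8
n_tmd, n_cbo = 171.72, 180.50

# Exact identity:
# ln(A_tmd / A_cbo) = ln(N_tmd / N_cbo) + ln((A_tmd / N_tmd) / (A_cbo / N_cbo))
total_gap = math.log(agi_tmd / agi_cbo)
count_gap = math.log(n_tmd / n_cbo)
per_return_gap = math.log((agi_tmd / n_tmd) / (agi_cbo / n_cbo))

print(f"total {total_gap:+.4f}, returns {count_gap:+.4f}, per return {per_return_gap:+.4f}")
```

With these inputs the returns term is slightly larger in magnitude than the whole AGI gap, and the per-return term is small and positive (about $149.1K vs. $148.4K of AGI per return in TMD and CBO respectively).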

Table 2: Compound annual growth, 2022→2036

| Series | TMD-PUF | CBO | Δ (pp/yr) |
|---------|--------:|------:|----------:|
| AGI | 3.97% | 4.32% | −0.35 |
| iitax | 4.34% | 4.73% | −0.39 |
| Returns | 0.41% | 0.81% | −0.40 |
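Table 2 can be reproduced from the 2022 and 2036 endpoints of Table 1 with the usual CAGR formula, (end / start)^(1/14) − 1. The Δ column appears to be the difference of the rounded rates; computed from unrounded rates, the iitax gap is about −0.38 pp/yr. A minimal sketch (names are illustrative):

```python
# (2022 level, 2036 level) from Table 1: AGI and iitax in $ billions,
# returns in millions.
SERIES = {
    "AGI": {"TMD-PUF": (14852.2, 25602.4), "CBO": (14821.8, 26787.8)},
    "iitax": {"TMD-PUF": (2147.4, 3894.4), "CBO": (2149.6, 4103.3)},
    "Returns": {"TMD-PUF": (162.13, 171.72), "CBO": (161.30, 180.50)},
}
PERIODS = 2036 - 2022  # 14 years of compounding


def cagr_pct(start: float, end: float, periods: int) -> float:
    """Compound annual growth rate, in percent per year."""
    return 100.0 * ((end / start) ** (1.0 / periods) - 1.0)


table2 = {}
for name, src in SERIES.items():
    tmd = round(cagr_pct(*src["TMD-PUF"], PERIODS), 2)
    cbo = round(cagr_pct(*src["CBO"], PERIODS), 2)
    # Difference of the rounded rates, in percentage points per year.
    table2[name] = (tmd, cbo, round(tmd - cbo, 2))

for name, row in table2.items():
    print(name, row)
```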

The level drift is essentially the cumulative growth-rate gap. From ~2027 onward, TMD's annual growth is usually slower than CBO's for both AGI and iitax, though unevenly: the gap is largest in 2027 and 2029 and runs roughly 0.1–0.35 pp/yr in 2032–2036. The widening returns-count gap is the largest single driver.
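In log terms the "cumulative growth-rate gap" statement holds exactly: the final-year log level gap equals the base-year log gap plus the sum of the annual log-growth gaps. A sketch using the AGI columns of Table 1 (names are illustrative):

```python
import math

# AGI columns of Table 1, $ billions, calendar years 2022..2036.
AGI_TMD = [14852.2, 15539.9, 16718.5, 17843.3, 18583.0, 19095.4, 19588.5, 20153.5,
           20809.6, 21532.9, 22291.6, 23075.7, 23890.3, 24731.6, 25602.4]
AGI_CBO = [14821.8, 15347.4, 16685.9, 17837.9, 18810.4, 19506.7, 20074.8, 20802.9,
           21540.9, 22272.4, 23101.9, 23961.7, 24869.3, 25817.9, 26787.8]
YEARS = list(range(2022, 2037))


def log_growth(series: list[float]) -> list[float]:
    """Year-over-year log growth, one value per step."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]


# Annual growth gap, TMD minus CBO, in log points (one per year 2023..2036).
gaps = [t - c for t, c in zip(log_growth(AGI_TMD), log_growth(AGI_CBO))]

base_gap = math.log(AGI_TMD[0] / AGI_CBO[0])
final_gap = math.log(AGI_TMD[-1] / AGI_CBO[-1])
assert math.isclose(final_gap, base_gap + sum(gaps), abs_tol=1e-12)

for year, gap in zip(YEARS[1:], gaps):
    print(year, f"{100 * gap:+.2f}")
```

Printing the annual gaps shows they are uneven rather than a steady drag; for AGI, 2031 is even slightly positive.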

Tolerance choice

Tolerances widen with the projection horizon to reflect compounding growfactor uncertainty. They were chosen so that the worst of the three aggregates passes in each anchor year, with a small margin:

| Year | Tolerance | Worst observed | Slack |
|------|----------:|---------------|------:|
| 2022 | 1.00% | 0.51% (returns) | 0.49 pp |
| 2026 | 2.00% | 1.27% (returns) | 0.73 pp |
| 2031 | 5.00% | 3.99% (iitax) | 1.01 pp |
| 2036 | 6.00% | 5.09% (iitax) | 0.91 pp |
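The tolerance logic can be sketched as a pytest-style check. This is not the code in tests/test_revenue_levels_cbo.py: it hardcodes the anchor-year levels from Table 1 instead of computing TMD totals from the data, and all names here are hypothetical. Relative differences are measured against the CBO value; `math.isclose` is avoided because its `rel_tol` scales by the larger of the two values rather than by the comparator.

```python
# Hypothetical sketch; not the TMD test module.
TOLERANCE = {2022: 0.01, 2026: 0.02, 2031: 0.05, 2036: 0.06}

# (TMD, CBO) levels at the anchor years, from Table 1:
# AGI and iitax in $ billions, returns in millions.
LEVELS = {
    2022: {"agi": (14852.2, 14821.8), "iitax": (2147.4, 2149.6), "returns": (162.13, 161.30)},
    2026: {"agi": (18583.0, 18810.4), "iitax": (2644.3, 2665.0), "returns": (167.34, 169.50)},
    2031: {"agi": (21532.9, 22272.4), "iitax": (3178.1, 3310.2), "returns": (169.66, 175.80)},
    2036: {"agi": (25602.4, 26787.8), "iitax": (3894.4, 4103.3), "returns": (171.72, 180.50)},
}


def rel_diff(tmd: float, cbo: float) -> float:
    """Signed relative difference of TMD from the CBO comparator."""
    return tmd / cbo - 1.0


def worst_aggregate(year: int) -> tuple[str, float]:
    """Aggregate with the largest absolute relative difference in a year."""
    name, (tmd, cbo) = max(LEVELS[year].items(), key=lambda kv: abs(rel_diff(*kv[1])))
    return name, abs(rel_diff(tmd, cbo))


def test_levels_within_tolerance() -> None:
    for year, tol in TOLERANCE.items():
        name, diff = worst_aggregate(year)
        assert diff <= tol, f"{year} {name}: {diff:.2%} exceeds {tol:.0%}"
```

Keeping the tolerances in one table keyed by anchor year makes it easy to widen or tighten them, or to add anchor years, without touching the comparison logic.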

@martinholmer, I'm happy to widen these further (or switch to a growth-rate test instead of a level test) if you think that's better. Also, how do you feel about widening the tolerances as the projection horizon lengthens? Uncertainty obviously grows with the horizon, and differences accumulate over time. On the other hand, I chose the tolerances judgmentally.

Open questions for review

  1. Is the level-tolerance approach the right way to go, or would you prefer a growth-rate test (option 1 in the issue)?
  2. Are 2022 / 2026 / 2031 / 2036 the right anchor years, or would you prefer different ones?
  3. Are the tolerances 1% / 2% / 5% / 6% acceptable, too tight, or too loose?
  4. Should tests/test_tax_revenue.py (the older skipped FY-MTS-based test) now be deleted, or kept skipped for historical reference?

Test plan

  • python -m pytest tests/test_revenue_levels_cbo.py -xvs passes (16 s).
  • make format && make lint clean.
  • Reviewer to weigh in on tolerances and anchor years.

Related

donboyd5 requested a review from martinholmer on April 29, 2026 17:26
martinholmer (Collaborator)

@donboyd5, when I download PR #515 and run "make data", the tests fail with this message:

```
>           raise ValueError(emsg)
E           ValueError: 
E           IMPUTED VARIABLE DEDUCTION BENEFIT ACT-vs-EXP DIFFS USING 2022 DATA:
E           DIFF:OTM,totben,act,exp,atol= 23.68 23.65 0.01
E           DIFF:OTM,affben,act,exp,atol= 1413.0 1411 1.0
E           DIFF:ALL,totben,act,exp,atol= 58.92 58.89 0.01

tests/test_imputed_variables.py:173: ValueError
=============== short test summary info ======================================
FAILED tests/test_imputed_variables.py::test_obbba_deduction_tax_benefits - ValueError: 
=============== 1 failed, 59 passed, 2 skipped in 38.31s ==========================
make: *** [test] Error 1
(base) TMD> tc --version
Tax-Calculator 6.5.3 on Python 3.12
(base) TMD> 
```

Did I make a mistake? Or is your PR #515 code not completely up-to-date with the main branch?

martinholmer (Collaborator)

@donboyd5, it looks like your PR #515 is based on an old, out-of-date version of the main branch.

[Screenshot: 2026-04-29 at 4:29:07 PM]

Compares weighted PUF (filer) totals on the TMD file against CBO's
February 2026 individual income tax microsimulation projections at
four anchor years.

Tested aggregates (PUF subset, calendar-year):
  - number of returns
  - AGI (`c00100`)
  - individual income tax liability (`iitax`)

Anchor years and per-year relative tolerances:
  - 2022 (actual): 1%
  - 2026:          2%
  - 2031:          5%
  - 2036:          6%

Comparator is the calendar-year, 1040-universe liability series in
sheet 3 of CBO's Revenue file 51138-2026-02-Revenue.xlsx, which is
built from the same SOI 2022 PUF (Pub. 1304) sample TMD uses. This
is a much cleaner comparator than the FY MTS cash-receipts series
the (skipped) `test_tax_revenue` uses, which is why the levels match
TMD essentially exactly in 2022 (under 0.6% on each aggregate).

Tolerances widen with the projection horizon to reflect compounding
growfactor uncertainty. Test runs in ~16s.

Posted for review on issue #502.
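The TMD side of the comparison is a set of weighted totals over PUF-derived (filer) records. A minimal sketch, assuming records carry Tax-Calculator-style variables: `c00100` and `iitax` (named above), plus `s006` as the sampling weight and a `data_source` flag equal to 1 for PUF-derived records. Those two names, and the `Record` and `puf_totals` helpers, are assumptions for illustration, and the data are toy values:

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One tax unit. Field names follow Tax-Calculator conventions (assumed)."""
    s006: float       # sampling weight: number of returns the record represents
    data_source: int  # assumed flag: 1 = PUF-derived (filer), 0 = CPS-derived
    c00100: float     # AGI, dollars
    iitax: float      # individual income tax liability, dollars


def puf_totals(records: list[Record]) -> dict[str, float]:
    """Weighted PUF-subset totals in the units used in the tables above."""
    puf = [r for r in records if r.data_source == 1]
    return {
        "returns_m": sum(r.s006 for r in puf) / 1e6,          # millions
        "agi_b": sum(r.s006 * r.c00100 for r in puf) / 1e9,   # $ billions
        "iitax_b": sum(r.s006 * r.iitax for r in puf) / 1e9,  # $ billions
    }


# Toy data only: two PUF records and one CPS record (excluded from totals).
toy = [
    Record(s006=1000.0, data_source=1, c00100=50_000.0, iitax=5_000.0),
    Record(s006=500.0, data_source=1, c00100=200_000.0, iitax=40_000.0),
    Record(s006=2000.0, data_source=0, c00100=30_000.0, iitax=0.0),
]
print(puf_totals(toy))
```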
donboyd5 force-pushed the issue-502-future-revenue-analysis branch from d049882 to 7a3b674 on April 29, 2026 20:56
donboyd5 (Collaborator, Author)

Thanks, @martinholmer.

Rebased onto current master (ab98dfb) to clear the out-of-date banner. No content changes; the only difference is that the base now includes PR #514.

martinholmer (Collaborator)

@donboyd5, Thanks for including the PR #514 changes in your PR #515.

Now, when I download #515 and run "make data", all the tests pass.

Open questions for review

  1. Is the level-tolerance approach the right way to go, or would you prefer a growth-rate test (option 1 in the issue)?

I think the current approach is the better approach. No changes needed on this point.

  2. Are 2022 / 2026 / 2031 / 2036 the right anchor years, or would you prefer different ones?

Yes, I think these years are fine.

  3. Are the tolerances 1% / 2% / 5% / 6% acceptable, too tight, or too loose?

I think they are "acceptable".

  4. Should tests/test_tax_revenue.py (the older skipped FY-MTS-based test) now be deleted, or kept skipped for historical reference?

Delete the old test.

donboyd5 merged commit d9f7704 into master on Apr 29, 2026
1 check passed
donboyd5 deleted the issue-502-future-revenue-analysis branch on April 29, 2026 21:45
martinholmer (Collaborator)

@donboyd5, you should merge PR #515.
We can discuss your findings about the CBO projection sometime next week.


Labels

None yet

Projects

None yet


2 participants