Add CBO-vs-TMD revenue-level sanity test (issue #502) by donboyd5 · Pull Request #515 · PSLmodels/tax-microdata-benchmarking

donboyd5 · 2026-04-29T17:26:37Z

Summary

Adds a new test tests/test_revenue_levels_cbo.py that checks weighted PUF (filer) totals on the TMD file against CBO's February 2026 individual income tax microsimulation projections at four anchor years.
Tested aggregates (calendar-year, PUF subset only): number of returns, AGI (c00100), and individual income tax liability (iitax).
Anchor years and per-year relative tolerances:

Year	Tolerance	Status
2022 (actual)	1%	passes
2026	2%	passes
2031	5%	passes
2036	6%	passes
Test runs in ~16 s (one TaxCalc loop advancing through 2036).
Posted for review on issue #502 ("Design discussion: future-year revenue sanity check").

Why a new comparator (sheet 3 of the CBO Revenue file)

The skipped test_tax_revenue.py compares against the CBO fiscal-year cash-receipts series, derived from the Monthly Treasury Statement. That series is structurally not comparable to TaxCalc iitax: it includes Form 1041, Form 1042 / NRA withholding, refund-netting, and late-assessment receipts, and it differs in FY-vs-CY timing. The result was a ~13–17% gap on iitax even in the 2022 base year, so no tight test was defensible.

The CBO Revenue file 51138-2026-02-Revenue.xlsx, sheet "3.Individual Income Tax Details", publishes a calendar-year, 1040-universe liability series, including the line "Individual income tax liability" ( = "Income tax after credits" + "Net investment income tax"). That series is built from CBO's own individual income tax microsimulation model, which uses the same SOI 2022 PUF (Pub. 1304) sample TMD uses. So both sides are calendar-year, 1040-universe, liability — apples to apples.

Empirically, the 2022 actual matches TMD essentially exactly:

AGI: TMD 14,852.2 B vs CBO 14,821.8 B → +0.21%
iitax: TMD 2,147.4 B vs CBO 2,149.6 B → −0.10%
Returns: TMD 162.13 M vs CBO 161.30 M → +0.51%

That's a clean enough match to write a tight level-tolerance test, which addresses option 3 of the four-option menu in the issue.

Table 1 — full year-by-year levels (2022–2036)

Full CBO forecast, for context (only 2022, 2026, 2031, 2036 are actually anchored in the test):

Year	AGI TMD-PUF	AGI CBO	Δ%	iitax TMD	iitax CBO liab.	Δ%	Returns TMD-PUF (M)	Returns CBO (M)	Δ%
2022	14,852.2	14,821.8	+0.21%	2,147.4	2,149.6	−0.10%	162.13	161.30	+0.51%
2023	15,539.9	15,347.4	+1.25%	2,222.5	2,171.9	+2.33%	163.79	164.70	−0.55%
2024	16,718.5	16,685.9	+0.20%	2,405.7	2,408.4	−0.11%	165.66	167.10	−0.86%
2025	17,843.3	17,837.9	+0.03%	2,498.6	2,493.7	+0.20%	166.96	168.00	−0.62%
2026	18,583.0	18,810.4	−1.21%	2,644.3	2,665.0	−0.78%	167.34	169.50	−1.27%
2027	19,095.4	19,506.7	−2.11%	2,704.7	2,776.8	−2.60%	167.77	170.80	−1.78%
2028	19,588.5	20,074.8	−2.42%	2,764.8	2,860.0	−3.33%	168.20	172.10	−2.27%
2029	20,153.5	20,802.9	−3.12%	2,904.8	3,039.1	−4.42%	168.66	173.20	−2.62%
2030	20,809.6	21,540.9	−3.39%	3,055.8	3,196.7	−4.41%	169.15	174.30	−2.95%
2031	21,532.9	22,272.4	−3.32%	3,178.1	3,310.2	−3.99%	169.66	175.80	−3.49%
2032	22,291.6	23,101.9	−3.51%	3,309.4	3,452.3	−4.14%	170.13	176.90	−3.83%
2033	23,075.7	23,961.7	−3.70%	3,446.1	3,599.4	−4.26%	170.56	177.90	−4.12%
2034	23,890.3	24,869.3	−3.94%	3,589.7	3,760.4	−4.54%	170.97	178.80	−4.38%
2035	24,731.6	25,817.9	−4.21%	3,739.0	3,929.3	−4.84%	171.35	179.70	−4.65%
2036	25,602.4	26,787.8	−4.43%	3,894.4	4,103.3	−5.09%	171.72	180.50	−4.86%

Note that the AGI differences vs. CBO are driven primarily by the number of returns, not by AGI per return. Table 2 below suggests that TMD # of returns grows about 0.41% per year while CBO # grows about 0.81%. This seems like something we could resolve. Perhaps we have too little growth in number of returns? (Or perhaps CBO has too much? My guess, though, is that their growth is pretty consistent with available population forecasts.)
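One way to check the returns-vs-AGI-per-return claim is to split the 2036 AGI gap into a returns-count part and an AGI-per-return part. The sketch below uses only the 2036 values from Table 1. The log decomposition and the variable names are illustrative choices, not part of the test in this PR.

```python
import math

# 2036 values copied from Table 1 (AGI in $ billions, returns in millions).
agi_tmd, agi_cbo = 25_602.4, 26_787.8
ret_tmd, ret_cbo = 171.72, 180.50

# Log gaps are additive:
#   ln(AGI ratio) = ln(returns ratio) + ln(AGI-per-return ratio)
agi_gap = math.log(agi_tmd / agi_cbo)
returns_gap = math.log(ret_tmd / ret_cbo)
per_return_gap = agi_gap - returns_gap

print(f"AGI log gap:            {agi_gap:+.4f}")
print(f"  from returns count:   {returns_gap:+.4f}")
print(f"  from AGI per return:  {per_return_gap:+.4f}")
```

With these inputs the returns-count term is larger in magnitude than the total AGI gap, and the AGI-per-return term is small and of the opposite sign, which is consistent with the note above.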

Table 2 - Compound annual growth, 2022→2036:

Series	TMD-PUF	CBO	Δ pp/yr
AGI	3.97%	4.32%	−0.35
iitax	4.34%	4.73%	−0.39
Returns	0.41%	0.81%	−0.40

The level drift is essentially the cumulative growth-rate gap. From ~2027 onward TMD's annual growth runs ~0.25–0.35 pp slower than CBO's for both AGI and iitax. The widening returns-count gap is the largest single driver.
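The growth rates in Table 2 can be reproduced from the 2022 and 2036 endpoints in Table 1. The sketch below does that, and then rebuilds the 2036 level gap from the 2022 gap and the two growth rates. That last step holds by construction, which is the sense in which the level drift is the cumulative growth-rate gap. The `cagr` helper and the dictionary layout are hypothetical and are not taken from the test file.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two levels."""
    return (end / start) ** (1 / years) - 1


YEARS = 2036 - 2022

# (2022 level, 2036 level) for TMD-PUF and CBO, copied from Table 1.
ENDPOINTS = {
    "AGI": {"tmd": (14_852.2, 25_602.4), "cbo": (14_821.8, 26_787.8)},
    "iitax": {"tmd": (2_147.4, 3_894.4), "cbo": (2_149.6, 4_103.3)},
    "Returns": {"tmd": (162.13, 171.72), "cbo": (161.30, 180.50)},
}

rates = {}
for name, series in ENDPOINTS.items():
    g_tmd = cagr(*series["tmd"], YEARS)
    g_cbo = cagr(*series["cbo"], YEARS)
    rates[name] = (g_tmd, g_cbo)

    # 2036 level ratio = 2022 level ratio x compounded growth-rate ratio.
    ratio_2022 = series["tmd"][0] / series["cbo"][0]
    implied_2036_gap = ratio_2022 * ((1 + g_tmd) / (1 + g_cbo)) ** YEARS - 1

    print(
        f"{name:8s} TMD {g_tmd:.2%}  CBO {g_cbo:.2%}  "
        f"gap {100 * (g_tmd - g_cbo):+.2f} pp/yr  "
        f"implied 2036 level gap {implied_2036_gap:+.2%}"
    )
```

A gap computed from unrounded rates can differ in the last digit from the difference of the rounded rates shown in Table 2.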

Tolerance choice

Tolerances widen with horizon to reflect compounding growfactor uncertainty. They were chosen so the worst-of-the-three aggregates passes at each year, with a small margin:

Year	Tolerance	Worst observed	Slack
2022	1.00%	0.51% (returns)	0.49 pp
2026	2.00%	1.27% (returns)	0.73 pp
2031	5.00%	3.99% (iitax)	1.01 pp
2036	6.00%	5.09% (iitax)	0.91 pp
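For readers who want to see the shape of the check, here is a minimal standalone sketch of a per-year level-tolerance comparison. It hard-codes the anchor-year values from Table 1 rather than running Tax-Calculator or reading the CBO workbook, and the names `TOLERANCE`, `LEVELS`, and `worst_gap` are hypothetical. It is not the code in tests/test_revenue_levels_cbo.py.

```python
# Relative tolerance per anchor year, as in the table above.
TOLERANCE = {2022: 0.01, 2026: 0.02, 2031: 0.05, 2036: 0.06}

# (TMD-PUF, CBO) pairs copied from Table 1; AGI and iitax in $ billions,
# returns in millions.
LEVELS = {
    2022: {"agi": (14_852.2, 14_821.8), "iitax": (2_147.4, 2_149.6),
           "returns": (162.13, 161.30)},
    2026: {"agi": (18_583.0, 18_810.4), "iitax": (2_644.3, 2_665.0),
           "returns": (167.34, 169.50)},
    2031: {"agi": (21_532.9, 22_272.4), "iitax": (3_178.1, 3_310.2),
           "returns": (169.66, 175.80)},
    2036: {"agi": (25_602.4, 26_787.8), "iitax": (3_894.4, 4_103.3),
           "returns": (171.72, 180.50)},
}


def worst_gap(year: int) -> tuple[str, float]:
    """Return the aggregate with the largest absolute relative gap."""
    gaps = {
        name: abs(tmd / cbo - 1.0)
        for name, (tmd, cbo) in LEVELS[year].items()
    }
    name = max(gaps, key=gaps.get)
    return name, gaps[name]


for year, tol in TOLERANCE.items():
    name, gap = worst_gap(year)
    # The year passes only if its worst aggregate is within tolerance.
    assert gap <= tol, f"{year}: {name} gap {gap:.2%} exceeds {tol:.0%}"
    print(f"{year}: worst {gap:.2%} ({name}), slack {100 * (tol - gap):.2f} pp")
```

Checking only the worst of the three aggregates is equivalent to checking each one against the same per-year tolerance, and it makes the remaining slack easy to report.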

@martinholmer, happy to widen these further (or to test growth rates instead of levels) if you think that would be better. Also, how do you feel about the approach here of widening tolerances as we go further into the future? Obviously uncertainty increases as the horizon lengthens, and differences accumulate over time. OTOH, I chose the tolerances judgmentally.

Open questions for review

Is the level-tolerance approach the right way to go, or would you prefer a growth-rate test (option 1 in the issue)?
Are 2022 / 2026 / 2031 / 2036 the right anchor years, or would you prefer different ones?
Are the tolerances 1% / 2% / 5% / 6% acceptable, too tight, or too loose?
Should tests/test_tax_revenue.py (the older skipped FY-MTS-based test) now be deleted, or kept skipped for historical reference?

Test plan

python -m pytest tests/test_revenue_levels_cbo.py -xvs passes (16 s).
make format && make lint clean.
Reviewer to weigh in on tolerances and anchor years.

>           raise ValueError(emsg)
E           ValueError: 
E           IMPUTED VARIABLE DEDUCTION BENEFIT ACT-vs-EXP DIFFS USING 2022 DATA:
E           DIFF:OTM,totben,act,exp,atol= 23.68 23.65 0.01
E           DIFF:OTM,affben,act,exp,atol= 1413.0 1411 1.0
E           DIFF:ALL,totben,act,exp,atol= 58.92 58.89 0.01

tests/test_imputed_variables.py:173: ValueError
=============== short test summary info ======================================
FAILED tests/test_imputed_variables.py::test_obbba_deduction_tax_benefits - ValueError: 
=============== 1 failed, 59 passed, 2 skipped in 38.31s ==========================
make: *** [test] Error 1
(base) TMD> tc --version
Tax-Calculator 6.5.3 on Python 3.12
(base) TMD>

Did I make a mistake? Or is your PR #515 code not completely up-to-date with the main branch?

martinholmer · 2026-04-29T20:32:37Z

@donboyd5, Looks like your PR #515 is based on an old, out-of-date version of the main branch.

Compares weighted PUF (filer) totals on the TMD file against CBO's February 2026 individual income tax microsimulation projections at four anchor years.

Tested aggregates (PUF subset, calendar-year):
- number of returns
- AGI (`c00100`)
- individual income tax liability (`iitax`)

Anchor years and per-year relative tolerances:
- 2022 (actual): 1%
- 2026: 2%
- 2031: 5%
- 2036: 6%

Comparator is the calendar-year, 1040-universe liability series in sheet 3 of CBO's Revenue file 51138-2026-02-Revenue.xlsx, which is built from the same SOI 2022 PUF (Pub. 1304) sample TMD uses. This is a much cleaner comparator than the FY MTS cash-receipts series the (skipped) `test_tax_revenue` uses, which is why the levels match TMD essentially exactly in 2022 (under 0.6% on each aggregate).

Tolerances widen with the projection horizon to reflect compounding growfactor uncertainty. Test runs in ~16s. Posted for review on issue #502.

donboyd5 · 2026-04-29T21:04:38Z

Thanks, @martinholmer.

Rebased onto current master (ab98dfb) to clear the out-of-date banner. No content changes — only the base advanced past PR #514.

martinholmer · 2026-04-29T21:43:17Z

@donboyd5, Thanks for including the PR #514 changes in your PR #515.

Now, when I download #515 and run "make data", all the tests pass.

Open questions for review

Is the level-tolerance approach the right way to go, or would you prefer a growth-rate test (option 1 in the issue)?

I think the current approach is the better approach. No changes needed on this point.

Are 2022 / 2026 / 2031 / 2036 the right anchor years, or would you prefer different ones?

Yes, I think these years are fine.

Are the tolerances 1% / 2% / 5% / 6% acceptable, too tight, or too loose?

I think they are "acceptable".

Should tests/test_tax_revenue.py (the older skipped FY-MTS-based test) now be deleted, or kept skipped for historical reference?

Delete the old test.

martinholmer · 2026-04-29T21:45:11Z

@donboyd5, You should merge PR #515.
We can discuss your findings about the CBO projection sometime next week.

donboyd5 requested a review from martinholmer April 29, 2026 17:26

donboyd5 force-pushed the issue-502-future-revenue-analysis branch from d049882 to 7a3b674 Compare April 29, 2026 20:56

martinholmer approved these changes Apr 29, 2026

donboyd5 merged commit d9f7704 into master Apr 29, 2026
1 check passed

donboyd5 deleted the issue-502-future-revenue-analysis branch April 29, 2026 21:45

This was referenced Apr 29, 2026

Design discussion: future-year revenue sanity check — how to align tax-calculator iitax / payrolltax with a CBO or Treasury comparator #502

Closed

Remove obsolete test_tax_revenue.py and its expected-data files #516

Merged
