Add unweighted national-file fingerprint test (closes #503) #504
Merged
martinholmer merged 1 commit into master on Apr 24, 2026
Conversation
Adds tests/test_tmd_file_fingerprint.py, which computes per-column summary statistics on tmd/storage/output/tmd.csv.gz and compares them against a committed reference JSON. Parallels the area-weights fingerprint at tests/test_fingerprint.py in structure, update workflow, and tolerance choice.

For each column the test records six statistics: count (integer, exact match) and the floating-point sum, weighted_sum (= Σ column × s006), std, min, max (compared with relative tolerance 1e-3). The weighted_sum statistic locks each column's relationship with the record weight s006, so a regeneration that preserved each column's own distribution but shuffled which records received which weights cannot pass the fingerprint.

Reuses the existing --update-fingerprint pytest option so intentional data changes regenerate the reference and skip assertions on that run. When an assertion failure coincides with a Tax-Calculator version change, the failure message flags version drift as a likely cause and prints the regeneration command.

Replaces the reproducibility role of the skipped tests/test_tmd_stats.py. Deletion of test_tmd_stats.py and its .stats-expect reference is tracked separately in the umbrella cleanup PR (PR 4 in issue #501).

Runtime on the current 215,494-row, 109-column file: under 1 second (670 ms CSV load, 48 ms compute).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
martinholmer approved these changes on Apr 24, 2026
martinholmer (Collaborator) left a comment:
New test passed on my computer.
donboyd5 added a commit to donboyd5/tax-microdata-benchmarking that referenced this pull request on Apr 25, 2026
…enue files

Delete four categories of now-unused test artifacts from the PSLmodels#430 / PSLmodels#501 skipped-tests cleanup:

1. tests/test_tmd_stats.py + tests/tmd.stats-expect
   Superseded by tests/test_tmd_file_fingerprint.py (landed in PSLmodels#504). The new fingerprint test covers the same reproducibility-check role with a relative-tolerance comparison that absorbs cross-machine floating-point noise, instead of the exact-text diff of df.describe() output that caused test_tmd_stats.py to be skipped.
2. tests/tmd.stats-expect-github + tests/tmd.stats-expect-mrh
   Alternate reference files that sat alongside tmd.stats-expect; no live references anywhere in the repo.
3. tests/expected_itax_rev_2021_data.yaml + tests/expected_ptax_rev_2021_data.yaml
   Unreferenced orphan files: the live lookup in test_tax_revenue.py always uses the 2022 YAMLs (TAXYEAR=2022 only). Zero references in the repo.
4. Updated the docstring in tests/test_tmd_file_fingerprint.py to remove the now-stale path reference to test_tmd_stats.py, replacing it with past-tense "the previously-skipped test_tmd_stats pattern".

Not included in this PR:
- test_variable_totals.py and test_misc.py::test_income_tax stay in place for now; their deletion is tied to the SOI sanity-check work that is being tracked separately.
- test_imputed_variable_distribution stays untouched per its author's preference.

Test plan:
- make format: clean.
- make lint: exit 0.
- make test: 59 passed, 4 skipped (same set of skip markers as before this PR; this change does not affect any currently-running test).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- tests/test_tmd_file_fingerprint.py: a reproducibility fingerprint test for the unweighted national file tmd/storage/output/tmd.csv.gz, paralleling the existing area-weights fingerprint at tests/test_fingerprint.py in structure, update workflow, and tolerance.
- tests/fingerprints/tmd_file_fingerprint.json, generated from the current tmd.csv.gz on this branch.
- The test runs under make test (not excluded from the Makefile; it only requires tmd.csv.gz, which make test already depends on via tmd_files).

How it works
For each column of tmd.csv.gz, the test records six statistics:
- count (integer, exact match)
- sum, weighted_sum (= Σ column × s006), std, min, max (compared with relative tolerance rtol=1e-3)

weighted_sum locks each column's relationship with the record weight s006. Without it, a regeneration that preserved each column's own distribution but shuffled which records received which weights could pass the fingerprint while every weighted 2022 total changed.
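The role of weighted_sum can be illustrated with a small sketch. It uses only the standard library and toy numbers; the helper name column_stats and the data are hypothetical, and the actual test may compute these statistics differently (for example with pandas).

```python
import math
import statistics


def column_stats(values, weights):
    """Six summary statistics for one column (hypothetical helper)."""
    return {
        "count": len(values),
        "sum": math.fsum(values),
        "weighted_sum": math.fsum(v * w for v, w in zip(values, weights)),
        "std": statistics.stdev(values),  # sample std; an assumption here
        "min": min(values),
        "max": max(values),
    }


# Toy data standing in for one tmd.csv.gz column and the s006 weights.
wages = [0.0, 25_000.0, 60_000.0, 140_000.0]
s006 = [900.0, 500.0, 300.0, 50.0]
original = column_stats(wages, s006)

# Same weights, assigned to different records.
shuffled_s006 = [50.0, 300.0, 500.0, 900.0]
shuffled = column_stats(wages, shuffled_s006)

# The five unweighted statistics cannot see the shuffle ...
unweighted = ["count", "sum", "std", "min", "max"]
same_unweighted = all(original[k] == shuffled[k] for k in unweighted)

# ... but weighted_sum moves from 37.5 million to 163.5 million.
print(same_unweighted, original["weighted_sum"], shuffled["weighted_sum"])
```

In this toy case the column's own distribution is untouched, so only weighted_sum distinguishes the two files.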
The test reuses the existing --update-fingerprint pytest option from tests/conftest.py so intentional data changes regenerate the reference and skip assertions on that run. When an assertion failure coincides with a Tax-Calculator version change, the failure message flags version drift as a likely cause and prints the regeneration command.
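The update-or-compare workflow described above might look roughly like the following sketch. The function name run_fingerprint and its update parameter are hypothetical; in the real test the flag would presumably be read from the --update-fingerprint option registered in tests/conftest.py, and the comparison would use a relative tolerance rather than the plain equality used here as a placeholder.

```python
import json
import tempfile
from pathlib import Path


def run_fingerprint(current, ref_path, update=False):
    """Return 'updated', 'match', or 'mismatch' (hypothetical helper)."""
    ref_path = Path(ref_path)
    if update:
        # Intentional data change: overwrite the reference, assert nothing.
        ref_path.write_text(json.dumps(current, indent=2, sort_keys=True))
        return "updated"
    reference = json.loads(ref_path.read_text())
    # Placeholder comparison; a tolerance-based check belongs here.
    return "match" if reference == current else "mismatch"


with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "fingerprint.json"
    stats = {"e00200": {"count": 4, "sum": 225000.0}}
    first = run_fingerprint(stats, ref, update=True)   # writes the reference
    second = run_fingerprint(stats, ref)               # compares against it
    changed = {"e00200": {"count": 5, "sum": 225000.0}}
    third = run_fingerprint(changed, ref)              # detects the change
```

Skipping assertions on the update run keeps regeneration a single command, with the follow-up plain run serving as the check.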
For Developers: To generate a new fingerprint, run pytest tests/test_tmd_file_fingerprint.py --update-fingerprint on your own machine, then pytest tests/test_tmd_file_fingerprint.py. Both should pass and the regenerated reference should differ from the committed one by no more than floating-point noise (rtol=1e-3).

Why this replaces test_tmd_stats.py

tests/test_tmd_stats.py is currently disabled (@pytest.mark.skip). It writes df.describe() output to a plain-text file and compares it line-by-line against a committed reference text file, requiring every number to match exactly. That exact-match design fails on identical files because floating-point math is not bit-identical across machines: different CPUs, different numerical libraries (OpenBLAS vs MKL), and different NumPy or pandas versions routinely produce results that differ in the 15th or 16th decimal place. It also has no workflow for promoting an intentional data change; someone would have to rename files by hand.
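A minimal illustration of why exact text comparison is fragile: floating-point addition is not associative, so the same three numbers summed in a different order can differ in the last digit. Libraries that reorder or vectorize a reduction differently can produce this kind of difference; the three values below are arbitrary and not taken from tmd.csv.gz.

```python
import math

# Same three numbers, two groupings of the additions.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

# An exact text diff, as in a line-by-line comparison, sees two different values.
exact_text_match = repr(a) == repr(b)

# A relative-tolerance comparison treats them as the same number.
within_tolerance = math.isclose(a, b, rel_tol=1e-3)

print(repr(a), repr(b), exact_text_match, within_tolerance)
```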
The proposed fingerprint's 0.1% relative tolerance comfortably absorbs
cross-machine floating-point noise (typically one part in a million)
while catching real data regressions (typically > 1%). Failure messages
name the specific column and statistic that moved.
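The comparison rule can be sketched as follows, assuming a nested dict layout of {column: {statistic: value}} for both the reference and the current fingerprint. The function name compare and the message format are hypothetical, and math.isclose measures the tolerance relative to the larger magnitude, which may differ slightly from the definition the test actually uses.

```python
import math

RTOL = 1e-3  # 0.1% relative tolerance, as stated in the description


def compare(reference, current, rtol=RTOL):
    """Return one message per (column, statistic) that moved (hypothetical)."""
    failures = []
    for col, ref_stats in reference.items():
        for stat, ref_val in ref_stats.items():
            cur_val = current[col][stat]
            if stat == "count":
                ok = cur_val == ref_val  # integer, exact match
            else:
                ok = math.isclose(cur_val, ref_val, rel_tol=rtol, abs_tol=0.0)
            if not ok:
                failures.append(
                    f"{col}.{stat}: reference={ref_val!r} current={cur_val!r}"
                )
    return failures


reference = {"e00200": {"count": 4, "sum": 1000.0}}
noisy = {"e00200": {"count": 4, "sum": 1000.000001}}  # about 1e-9 relative
drifted = {"e00200": {"count": 4, "sum": 1020.0}}     # +2% change

noise_failures = compare(reference, noisy)
drift_failures = compare(reference, drifted)
```

Collecting every failure before asserting, rather than stopping at the first, lets one run report all columns and statistics that moved.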
Deletion of test_tmd_stats.py and its .stats-expect reference will happen in the umbrella's cleanup PR (PR 4 in #501), not here.
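Test plan
- pytest tests/test_tmd_file_fingerprint.py -v --update-fingerprint writes tests/fingerprints/tmd_file_fingerprint.json and skips the assertion. Verified: file created, 109 columns covered, 19 KB.
- pytest tests/test_tmd_file_fingerprint.py -v passes in ~0.8 s against the committed reference.
- Perturbed e00200.sum by +2% and e00200.weighted_sum by −3% in the reference JSON. Test failed with a readable per-column, per-stat message.
- make format: clean.
- make lint: exit 0.

Related
- iitax/payrolltax with a CBO or Treasury comparator #502 (future-year revenue comparator)
- tests/test_fingerprint.py (originally refined in Improve cross-machine reproducibility "fingerprint" test for area weights #477)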
iitax/payrolltaxwith a CBO or Treasury comparator #502 (future-year revenue comparator)tests/test_fingerprint.py(originally refined in Improve cross-machine reproducibility "fingerprint" test for area weights #477)