Add unweighted national-file fingerprint test (closes #503) #504

Merged
martinholmer merged 1 commit into master from unweighted-file-fingerprint
Apr 24, 2026

@donboyd5 (Collaborator) commented Apr 24, 2026

Summary

  • New tests/test_tmd_file_fingerprint.py — a reproducibility fingerprint
    test for the unweighted national file tmd/storage/output/tmd.csv.gz,
    paralleling the existing area-weights fingerprint at
    tests/test_fingerprint.py in structure, update workflow, and tolerance.
  • New committed reference tests/fingerprints/tmd_file_fingerprint.json,
    generated from the current tmd.csv.gz on this branch.
  • Runs in under one second and is eligible to run under make test
    (not excluded from the Makefile; it only requires tmd.csv.gz, which
    make test already depends on via tmd_files).

How it works

For each column of tmd.csv.gz, the test records six statistics:

  • count (integer, exact match)
  • sum, weighted_sum (= Σ column × s006), std, min, max
    (compared with relative tolerance rtol=1e-3)

weighted_sum locks each column's relationship with the record weight
s006. Without it, a regeneration that preserved each column's own
distribution but shuffled which records received which weights could pass
the fingerprint while every weighted 2022 total changed.
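
To make the role of weighted_sum concrete, here is a minimal sketch on toy data. It is not the code in tests/test_tmd_file_fingerprint.py: the helper name `column_fingerprint`, the toy values, and the use of the sample standard deviation are all assumptions, and it uses only the standard library so it runs without pandas.

```python
import math
import statistics


def column_fingerprint(values, weights):
    """Six summary statistics for one column (hypothetical helper)."""
    return {
        "count": len(values),
        "sum": math.fsum(values),
        "weighted_sum": math.fsum(v * w for v, w in zip(values, weights)),
        "std": statistics.stdev(values),  # sample std is an assumption
        "min": min(values),
        "max": max(values),
    }


# Toy column and toy record weights (standing in for s006).
wages = [10.0, 20.0, 30.0, 40.0]
s006 = [1.0, 2.0, 3.0, 4.0]

original = column_fingerprint(wages, s006)
# Same set of weights, but attached to different records.
shuffled = column_fingerprint(wages, list(reversed(s006)))

unweighted_stats = ("count", "sum", "std", "min", "max")
same_unweighted = all(original[k] == shuffled[k] for k in unweighted_stats)

print(same_unweighted)
print(original["weighted_sum"], shuffled["weighted_sum"])
```

The five unweighted statistics are identical for both versions, so only weighted_sum exposes the reshuffled weights.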

The test reuses the existing --update-fingerprint pytest option from
tests/conftest.py so intentional data changes regenerate the reference
and skip assertions on that run. When an assertion failure coincides with
a Tax-Calculator version change, the failure message flags version drift
as a likely cause and prints the regeneration command.
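
The update-or-assert behaviour described above can be sketched as a plain function. This is an illustration only: `check_or_update` and the flat key layout are hypothetical, the real test is driven by the --update-fingerprint pytest option rather than a function argument, and the comparison here is exact for brevity instead of using rtol.

```python
import json
import tempfile
from pathlib import Path


def check_or_update(current, ref_path, update):
    """Rewrite the reference in update mode; otherwise list mismatched keys."""
    ref_path = Path(ref_path)
    if update:
        # Update mode: write the new reference and skip all comparisons.
        ref_path.write_text(json.dumps(current, indent=2, sort_keys=True))
        return []
    reference = json.loads(ref_path.read_text())
    return [key for key in sorted(reference) if reference[key] != current.get(key)]


with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "fingerprint.json"
    first = check_or_update({"e00200.count": 4}, ref, update=True)
    second = check_or_update({"e00200.count": 4}, ref, update=False)
    third = check_or_update({"e00200.count": 5}, ref, update=False)

print(first, second, third)
```

The first call only writes the file, the second passes against it, and the third reports the key that changed.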

For Developers: To generate a new fingerprint, run
pytest tests/test_tmd_file_fingerprint.py --update-fingerprint on
your own machine, then pytest tests/test_tmd_file_fingerprint.py.
Both should pass and the regenerated reference should differ from
the committed one by no more than floating-point noise (rtol=1e-3).

Why this replaces test_tmd_stats.py

tests/test_tmd_stats.py is currently disabled (@pytest.mark.skip). It
writes df.describe() output to a plain-text file and compares it
line-by-line against a committed reference text file, requiring every
number to match exactly. That exact-match design fails on identical files
because floating-point math is not bit-identical across machines:
different CPUs, different numerical libraries (OpenBLAS vs MKL), and
different NumPy or pandas versions routinely produce results that differ
in the 15th or 16th decimal place. It also has no workflow for promoting
an intentional data change — someone would have to rename files by hand.
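
A small illustration of why exact matching is fragile: floating-point addition is not associative, so the same numbers summed in a different order (as different libraries or CPUs may do) can differ in the last digits. The values below are a generic example, not data from tmd.csv.gz.

```python
import math

a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

exact_match = a == b                                # what a text diff demands
tolerant_match = math.isclose(a, b, rel_tol=1e-3)   # what a 0.1% tolerance allows

print(repr(a), repr(b))
print(exact_match, tolerant_match)
```

An exact comparison rejects the pair, while a relative tolerance of 1e-3 accepts it.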

The proposed fingerprint's 0.1% relative tolerance comfortably absorbs
cross-machine floating-point noise (typically one part in a million)
while catching real data regressions (typically > 1%). Failure messages
name the specific column and statistic that moved.
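
A tolerance check of this kind might look like the sketch below. The function name `compare`, the nested-dict layout, and the handling of a zero reference are assumptions rather than the PR's actual code; the message format imitates the one quoted in the test plan.

```python
import math

RTOL = 1e-3


def compare(reference, actual, rtol=RTOL):
    """Return one message per (column, stat) that moved beyond tolerance."""
    failures = []
    for column, ref_stats in reference.items():
        for stat, ref in ref_stats.items():
            act = actual[column][stat]
            if stat == "count":
                # Counts are integers and must match exactly.
                if act != ref:
                    failures.append(f"{column}.{stat}: {ref} -> {act} (exact)")
                continue
            if ref != 0:
                rel = abs(act - ref) / abs(ref)
            else:
                # Assumption: a zero reference only matches an exact zero.
                rel = 0.0 if act == 0 else math.inf
            if rel > rtol:
                failures.append(
                    f"{column}.{stat}: {ref:g} -> {act:g} "
                    f"(rel diff {rel:.2e}, rtol={rtol:.0e})"
                )
    return failures


reference = {"e00200": {"count": 4, "sum": 100.0}}
noisy = {"e00200": {"count": 4, "sum": 100.0001}}   # about 1e-6 relative noise
regressed = {"e00200": {"count": 4, "sum": 102.0}}  # a 2% move

noise_failures = compare(reference, noisy)
regression_failures = compare(reference, regressed)
print(noise_failures)
print(regression_failures)
```

The noisy copy produces no failures, while the regressed copy produces a single message naming e00200.sum.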

Deletion of test_tmd_stats.py and its .stats-expect reference will
happen in the umbrella's cleanup PR (PR 4 in #501), not here.

Test plan

  • Generate reference: pytest tests/test_tmd_file_fingerprint.py -v --update-fingerprint
    writes tests/fingerprints/tmd_file_fingerprint.json and skips the
    assertion. Verified: file created, 109 columns covered, 19 KB.
  • Assert-pass run: pytest tests/test_tmd_file_fingerprint.py -v
    passes in ~0.8 s against the committed reference.
  • Deliberate regression detection — perturbed
    e00200.sum by +2% and e00200.weighted_sum by −3% in the reference
    JSON. Test failed with a readable per-column, per-stat message:
    e00200.sum: 1.3828e+11 -> 1.35569e+11 (rel diff 1.96e-02, rtol=1e-03)
    e00200.weighted_sum: 9.4378e+12 -> 9.72969e+12 (rel diff 3.09e-02, rtol=1e-03)
    
    Restored reference, test passes again.
  • make format — clean.
  • make lint — exit 0.
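
The relative differences in the regression messages above come out as 1.96% and 3.09% rather than exactly 2% and 3%. That is consistent with a relative difference measured against the perturbed reference (the first number in each message), i.e. 0.02/1.02 and 0.03/0.97. This reading is inferred from the printed numbers, not taken from the test source, and `rel_diff` is a hypothetical helper.

```python
def rel_diff(reference, actual):
    """Relative difference measured against the reference value (assumed formula)."""
    return abs(actual - reference) / abs(reference)


# Values copied from the failure messages: perturbed reference -> actual.
sum_diff = rel_diff(1.3828e11, 1.35569e11)
wsum_diff = rel_diff(9.4378e12, 9.72969e12)

print(f"{sum_diff:.2e}")
print(f"{wsum_diff:.2e}")
```

Both values exceed rtol=1e-3 by more than an order of magnitude, which is why the test fails on each statistic.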

Related

Adds tests/test_tmd_file_fingerprint.py, which computes per-column
summary statistics on tmd/storage/output/tmd.csv.gz and compares them
against a committed reference JSON. Parallels the area-weights
fingerprint at tests/test_fingerprint.py in structure, update workflow,
and tolerance choice.

For each column the test records six statistics: count (integer, exact
match) and the floating-point sum, weighted_sum (= Σ column × s006),
std, min, max (compared with relative tolerance 1e-3). The weighted_sum
statistic locks each column's relationship with the record weight s006,
so a regeneration that preserved each column's own distribution but
shuffled which records received which weights cannot pass the
fingerprint.

Reuses the existing --update-fingerprint pytest option so intentional
data changes regenerate the reference and skip assertions on that run.
When an assertion failure coincides with a Tax-Calculator version
change, the failure message flags version drift as a likely cause and
prints the regeneration command.

Replaces the reproducibility role of the skipped tests/test_tmd_stats.py.
Deletion of test_tmd_stats.py and its .stats-expect reference is
tracked separately in the umbrella cleanup PR (PR 4 in issue #501).

Runtime on the current 215,494-row, 109-column file: under 1 second
(670 ms CSV load, 48 ms compute).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@donboyd5 donboyd5 requested a review from martinholmer April 24, 2026 13:54

@martinholmer (Collaborator) left a comment

New test passed on my computer.

@martinholmer martinholmer merged commit 4e03ac1 into master Apr 24, 2026
1 check passed
@martinholmer martinholmer deleted the unweighted-file-fingerprint branch April 24, 2026 14:50
donboyd5 added a commit that referenced this pull request Apr 24, 2026
Remove tests superseded by #504 and orphan 2021 expected-revenue files (part of #430 / #501 cleanup)
donboyd5 added a commit to donboyd5/tax-microdata-benchmarking that referenced this pull request Apr 25, 2026
…enue files

Delete four categories of now-unused test artifacts from the PSLmodels#430 /
PSLmodels#501 skipped-tests cleanup:

1. tests/test_tmd_stats.py + tests/tmd.stats-expect

   Superseded by tests/test_tmd_file_fingerprint.py (landed in PSLmodels#504).
   The new fingerprint test covers the same reproducibility-check role
   with a relative-tolerance comparison that absorbs cross-machine
   floating-point noise, instead of the exact-text diff of
   df.describe() output that caused test_tmd_stats.py to be skipped.

2. tests/tmd.stats-expect-github + tests/tmd.stats-expect-mrh

   Alternate reference files that sat alongside tmd.stats-expect; no
   live references anywhere in the repo.

3. tests/expected_itax_rev_2021_data.yaml +
   tests/expected_ptax_rev_2021_data.yaml

   Unreferenced orphan files — the live lookup in test_tax_revenue.py
   always uses the 2022 YAMLs (TAXYEAR=2022 only). Zero references in
   the repo.

4. Updated the docstring in tests/test_tmd_file_fingerprint.py to
   remove the now-stale path reference to test_tmd_stats.py, replacing
   it with past-tense "the previously-skipped test_tmd_stats pattern".

Not included in this PR:

- test_variable_totals.py and test_misc.py::test_income_tax stay in
  place for now; their deletion is tied to the SOI sanity-check work
  that is being tracked separately.
- test_imputed_variable_distribution stays untouched per its author's
  preference.

Test plan:

- make format: clean.
- make lint: exit 0.
- make test: 59 passed, 4 skipped (same set of skip markers as before
  this PR; this change does not affect any currently-running test).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Add an unweighted national-file fingerprint test, paralleling the existing area-weights fingerprint
