Skip to content

add similar library to render diffs from regression test results#1296

Open
Aurashk wants to merge 4 commits into
mainfrom
add-diffing-library
Open

add similar library to render diffs from regression test results#1296
Aurashk wants to merge 4 commits into
mainfrom
add-diffing-library

Conversation

@Aurashk
Copy link
Copy Markdown
Collaborator

@Aurashk Aurashk commented May 19, 2026

Description

Adds the similar diffing library to render the output from failing regression tests in a nicer way

Fixes #1186

Type of change

  • Bug fix (non-breaking change to fix an issue)
  • New feature (non-breaking change to add functionality)
  • Refactoring (non-breaking, non-functional change to improve maintainability)
  • Optimization (non-breaking change to speed up the code)
  • Breaking change (whatever its nature)
  • Documentation (improve or add documentation)

Key checklist

  • All tests pass: $ cargo test
  • The documentation builds and looks OK: $ cargo doc
  • Update release notes for the latest release if this PR adds a new feature or fixes a bug
    present in the previous release

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

Copilot AI review requested due to automatic review settings May 19, 2026 12:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds the similar crate to improve regression-test failure diagnostics by rendering structured diffs instead of line-by-line mismatch lists.

Changes:

  • Adds similar as a dev-dependency.
  • Updates regression comparison to detect mismatches first, then render a diff for differing CSV outputs.
  • Introduces render_diff to show inserted/deleted lines with source line numbers.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

File Description
Cargo.toml Adds similar to dev-dependencies for regression test diagnostics.
Cargo.lock Locks the new similar dependency.
tests/regression.rs Replaces per-line mismatch reporting with diff rendering for failed regression output comparisons.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/regression.rs Outdated
fn render_diff(lines1: &[String], lines2: &[String]) -> String {
let text1 = lines1.join("\n");
let text2 = lines2.join("\n");
let diff = TextDiff::from_lines(&text1, &text2);
@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.73%. Comparing base (0af4399) to head (3d5b4a7).
⚠️ Report is 13 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1296      +/-   ##
==========================================
+ Coverage   89.46%   89.73%   +0.27%     
==========================================
  Files          57       57              
  Lines        8361     8418      +57     
  Branches     8361     8418      +57     
==========================================
+ Hits         7480     7554      +74     
+ Misses        581      564      -17     
  Partials      300      300              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's right. I tried this locally and when I changed one line of a commodity_flows.csv file, it showed a diff including every line where the text doesn't match exactly, even where the differences in floating-point values were below the threshold.

I think the right way to do this is to use the capture_diff_slices function from the crate. If it returns an empty Vec, the files are the same, otherwise they are different.

Unfortunately to do the comparisons correctly, you'll need to derive a custom type representing a field in the CSV file (an enum), which can be either a non-NaN floating-point number or a plain string. This type needs to derive Eq and Hash. Hash is, unfortunately, not defined for f64, because NaNs break everything. But! There is a NotNan type in the ordered_float crate, which does implement Hash, so you could use this. You'll define Eq to do the comparison logic we've currently got in compare_lines. Each CSV file will be represented as a Vec<Vec<MyCustomFieldType>> and you can pass these to capture_diff_slices.

Does this make sense?

For bonus points, you could use the colored crate to colourise the diff (we already have it as an indirect dependency), but we can probably get by without this too.

@alexdewar
Copy link
Copy Markdown
Member

Actually, there are a couple of things I missed.

You'll also need to make sure that similar floats have the same hash, which is annoying. Probably the easiest way to do this is to round to the nearest number divisible by the tolerance. You'll also need to store string representations of the lines somewhere so you can print them out for the diff.

All in all, this is a little fiddly.

@Aurashk Aurashk requested a review from alexdewar May 20, 2026 10:47
Copy link
Copy Markdown
Member

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly seems to work -- and is definitely an improvement on what we had before! -- but there's something that's not working quite right.

What you're doing is:

  • Compare files using the old comparison code
  • If different, feed into similar's algorithm to get diff
  • Print diff, excluding lines that similar thinks are the same but are different according to compare_line

What we should be able to do is:

  • Feed files into similar's algorithm directly
  • Discard any DiffOps::Equals
  • If there are any diff ops remaining, print them all out

We shouldn't need the old code anymore -- if similar thinks lines are different, but the old code thinks they're the same or vice versa, we're not doing the comparison quite right. There are a bunch of output files with spurious replacements in them that we end up ignoring, so something's not working.

I'm not sure what to do, because your new code looks ok to me. It's just not giving the same results as the old. We don't want to let the perfect be the enemy of the good though and it is an improvement.

Maybe it would work better if we represented decimal numbers in base-10 instead (e.g. using this crate; though when I tried it, I was still getting some seemingly spurious mismatches).

If you'd like to fiddle with this more to see if we can yeet the old code, then that would be great, but otherwise I'm happy to merge this and open an issue about this problem. What are you feeling?

Comment thread tests/regression.rs
for idx in 0..paired_len {
let old_idx = old_start + idx;
let new_idx = new_start + idx;
if !compare_line(&lines1[old_idx], &lines2[new_idx]) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extra comparison shouldn't be necessary 😞. The diff algo thinks they're different and if compare_line says otherwise, then it's not doing the same comparison that we're getting with the Eq implementation for CSVLine, which is a problem.

Comment thread tests/regression.rs
Comment on lines +238 to +240
let scaled = value / FLOAT_CMP_TOLERANCE;
let quantised = if scaled.is_finite() {
scaled.round() * FLOAT_CMP_TOLERANCE
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesturner246 As the resident floating-point guru, does this seem like an ok way to round a number to the nearest FLOAT_CMP_TOLERANCE? Numbers need to come out exactly the same if they are within FLOAT_CMP_TOLERANCE of one another, because they need to have the same hash.

Copy link
Copy Markdown

@jamesturner246 jamesturner246 May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's not be the answer you want, but I'd go with the approx equals routes.

Absolute tolerance: abs(a-b) <= abs_tol.
Relative tolerance: abs(a-b) <= rel_tol * max(abs(a), abs(b)).

The issue with this code is that

  1. there's the risk of overflow if FLOAT_CMP_TOLERANCE is small
  2. given scaled will be large, the rounding will be courser, distorting the calculation
  3. It seems rust's round has weird nonstandard behaviour: it rounds away from zero, rather than rounding to nearest, ties to even LSB required by IEEE-754

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aurashk It seems like this approach to quantisation isn't going to work because... floating-point numbers 😞.

Are you happy to have another go at this using the rust_decimal crate instead? That should give us proper, consistent rounding, so that we don't end up with two CsvLines within the tolerance giving different hashes (due to FP rounding errors), which is what we have currently. You may have to tweak the tolerance slightly or regenerate some test files to get things to pass because it is slightly different from the current tolerance checking we're doing, but that's ok.

NB: You can use different rounding strategies with this crate (see here). Dunno if this is something we'll want to change or whether the default MidpointNearestEven is fine.

Btw I have verified that we do see these rounding errors with our actual data, e.g. the missing_commodity test fails with:

-L608: 2040,10,BIOPRD,autumn.peak,25.852906323049822
CsvLine([Float(2040.0), Float(10.0), Text("BIOPRD"), Text("autumn.peak"), Float(25.852906323000003)])
+L608: 2040,10,BIOPRD,autumn.peak,25.852906323050078
CsvLine([Float(2040.0), Float(10.0), Text("BIOPRD"), Text("autumn.peak"), Float(25.8529063231)])

Copy link
Copy Markdown
Collaborator Author

@Aurashk Aurashk May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried another thing too which seems to work and is much simpler, see if you are happy with this. If not I can have a go with the decimal crate.

  1. filter through each line with existing floating point tolerance first - if they meet the tolerance make the lines exactly equal
  2. feed everything to the diffing library
    I appreciate it feels a bit wasteful going through twice, but there seems to be no palpable performance change. If you want to try it just put this in your regression.rs

regression.txt

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @AdrianDAlessandro suggested, this would be a lot faster if we only do this extra work in the case that we already know the files are different, because currently it's comparing every line of one file with every line in another using our probably quite slow comparison method, so when comparing two files that are 1000 lines long, you'd get 1m comparisons, rather than 1000.

This still isn't quite correct though. If you have two identical lines in your CSV file and one of them is deleted, it won't appear in the diff, because it will find a match somewhere else in the file. For it to be totally right, the diffing algorithm has to be the single source of truth. That said, I suspect it will probably work fine in practice, at least most of the time.

You probably wouldn't need to change much to use the decimal type I mentioned though. It would mostly be a case of swapping out the NotNan for a Decimal and deleting the old comparison code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use proper diff algorithm for output of failing regression tests

4 participants