add similar library to render diffs from regression test results#1296
add similar library to render diffs from regression test results#1296Aurashk wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Adds the similar crate to improve regression-test failure diagnostics by rendering structured diffs instead of line-by-line mismatch lists.
Changes:
- Adds
similaras a dev-dependency. - Updates regression comparison to detect mismatches first, then render a diff for differing CSV outputs.
- Introduces
render_diffto show inserted/deleted lines with source line numbers.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
Cargo.toml |
Adds similar to dev-dependencies for regression test diagnostics. |
Cargo.lock |
Locks the new similar dependency. |
tests/regression.rs |
Replaces per-line mismatch reporting with diff rendering for failed regression output comparisons. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| fn render_diff(lines1: &[String], lines2: &[String]) -> String { | ||
| let text1 = lines1.join("\n"); | ||
| let text2 = lines2.join("\n"); | ||
| let diff = TextDiff::from_lines(&text1, &text2); |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1296 +/- ##
==========================================
+ Coverage 89.46% 89.73% +0.27%
==========================================
Files 57 57
Lines 8361 8418 +57
Branches 8361 8418 +57
==========================================
+ Hits 7480 7554 +74
+ Misses 581 564 -17
Partials 300 300 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
alexdewar
left a comment
There was a problem hiding this comment.
Copilot's right. I tried this locally and when I changed one line of a commodity_flows.csv file, it showed a diff including every line where the text doesn't match exactly, even where the differences in floating-point values were below the threshold.
I think the right way to do this is to use the capture_diff_slices function from the crate. If it returns an empty Vec, the files are the same, otherwise they are different.
Unfortunately to do the comparisons correctly, you'll need to derive a custom type representing a field in the CSV file (an enum), which can be either a non-NaN floating-point number or a plain string. This type needs to derive Eq and Hash. Hash is, unfortunately, not defined for f64, because NaNs break everything. But! There is a NotNan type in the ordered_float crate, which does implement Hash, so you could use this. You'll define Eq to do the comparison logic we've currently got in compare_lines. Each CSV file will be represented as a Vec<Vec<MyCustomFieldType>> and you can pass these to capture_diff_slices.
Does this make sense?
For bonus points, you could use the colored crate to colourise the diff (we already have it as an indirect dependency), but we can probably get by without this too.
|
Actually, there are a couple of things I missed. You'll also need to make sure that similar floats have the same hash, which is annoying. Probably the easiest way to do this is to round to the nearest number divisible by the tolerance. You'll also need to store string representations of the lines somewhere so you can print them out for the diff. All in all, this is a little fiddly. |
…in floating precision add tests that check diffs rendered are as expected
alexdewar
left a comment
There was a problem hiding this comment.
This mostly seems to work -- and is definitely an improvement on what we had before! -- but there's something that's not working quite right.
What you're doing is:
- Compare files using the old comparison code
- If different, feed into
similar's algorithm to get diff - Print diff, excluding lines that
similarthinks are the same but are different according tocompare_line
What we should be able to do is:
- Feed files into
similar's algorithm directly - Discard any
DiffOps::Equals - If there are any diff ops remaining, print them all out
We shouldn't need the old code anymore -- if similar thinks lines are different, but the old code thinks they're the same or vice versa, we're not doing the comparison quite right. There are a bunch of output files with spurious replacements in them that we end up ignoring, so something's not working.
I'm not sure what to do, because your new code looks ok to me. It's just not giving the same results as the old. We don't want to let the perfect be the enemy of the good though and it is an improvement.
Maybe it would work better if we represented decimal numbers in base-10 instead (e.g. using this crate; though when I tried it, I was still getting some seemingly spurious mismatches).
If you'd like to fiddle with this more to see if we can yeet the old code, then that would be great, but otherwise I'm happy to merge this and open an issue about this problem. What are you feeling?
| for idx in 0..paired_len { | ||
| let old_idx = old_start + idx; | ||
| let new_idx = new_start + idx; | ||
| if !compare_line(&lines1[old_idx], &lines2[new_idx]) { |
There was a problem hiding this comment.
This extra comparison shouldn't be necessary 😞. The diff algo thinks they're different and if compare_line says otherwise, then it's not doing the same comparison that we're getting with the Eq implementation for CSVLine, which is a problem.
| let scaled = value / FLOAT_CMP_TOLERANCE; | ||
| let quantised = if scaled.is_finite() { | ||
| scaled.round() * FLOAT_CMP_TOLERANCE |
There was a problem hiding this comment.
@jamesturner246 As the resident floating-point guru, does this seem like an ok way to round a number to the nearest FLOAT_CMP_TOLERANCE? Numbers need to come out exactly the same if they are within FLOAT_CMP_TOLERANCE of one another, because they need to have the same hash.
There was a problem hiding this comment.
I know it's not be the answer you want, but I'd go with the approx equals routes.
Absolute tolerance: abs(a-b) <= abs_tol.
Relative tolerance: abs(a-b) <= rel_tol * max(abs(a), abs(b)).
The issue with this code is that
- there's the risk of overflow if
FLOAT_CMP_TOLERANCEis small - given
scaledwill be large, the rounding will be courser, distorting the calculation - It seems rust's
roundhas weird nonstandard behaviour: it rounds away from zero, rather than rounding to nearest, ties to even LSB required by IEEE-754
There was a problem hiding this comment.
@Aurashk It seems like this approach to quantisation isn't going to work because... floating-point numbers 😞.
Are you happy to have another go at this using the rust_decimal crate instead? That should give us proper, consistent rounding, so that we don't end up with two CsvLines within the tolerance giving different hashes (due to FP rounding errors), which is what we have currently. You may have to tweak the tolerance slightly or regenerate some test files to get things to pass because it is slightly different from the current tolerance checking we're doing, but that's ok.
NB: You can use different rounding strategies with this crate (see here). Dunno if this is something we'll want to change or whether the default MidpointNearestEven is fine.
Btw I have verified that we do see these rounding errors with our actual data, e.g. the missing_commodity test fails with:
-L608: 2040,10,BIOPRD,autumn.peak,25.852906323049822
CsvLine([Float(2040.0), Float(10.0), Text("BIOPRD"), Text("autumn.peak"), Float(25.852906323000003)])
+L608: 2040,10,BIOPRD,autumn.peak,25.852906323050078
CsvLine([Float(2040.0), Float(10.0), Text("BIOPRD"), Text("autumn.peak"), Float(25.8529063231)])
There was a problem hiding this comment.
I tried another thing too which seems to work and is much simpler, see if you are happy with this. If not I can have a go with the decimal crate.
- filter through each line with existing floating point tolerance first - if they meet the tolerance make the lines exactly equal
- feed everything to the diffing library
I appreciate it feels a bit wasteful going through twice, but there seems to be no palpable performance change. If you want to try it just put this in your regression.rs
There was a problem hiding this comment.
As @AdrianDAlessandro suggested, this would be a lot faster if we only do this extra work in the case that we already know the files are different, because currently it's comparing every line of one file with every line in another using our probably quite slow comparison method, so when comparing two files that are 1000 lines long, you'd get 1m comparisons, rather than 1000.
This still isn't quite correct though. If you have two identical lines in your CSV file and one of them is deleted, it won't appear in the diff, because it will find a match somewhere else in the file. For it to be totally right, the diffing algorithm has to be the single source of truth. That said, I suspect it will probably work fine in practice, at least most of the time.
You probably wouldn't need to change much to use the decimal type I mentioned though. It would mostly be a case of swapping out the NotNan for a Decimal and deleting the old comparison code.
Description
Adds the similar diffing library to render the output from failing regression tests in a nicer way
Fixes #1186
Type of change
Key checklist
$ cargo test$ cargo docpresent in the previous release
Further checks