floating-point followup#4560
Conversation
| fl_near_zero((v)->xyz.z)) | ||
|
|
||
| // macro to check if vector is close to zero | ||
| // (original threshold was 1e-36 which was too small) |
There was a problem hiding this comment.
According to what part of the article or what other research? I read the article, and I don't see how it points it 1e-36 being too small.
In fact, he suggests using a comparison "whose value might be some small multiple of FLT_EPSILON"
There was a problem hiding this comment.
The "too small" comment isn't based on the article, but on typical usage within FSO. See the list of four use cases.
I looked at typical magnitudes in coordinates and normals, and in my assessment our floating point values are "effectively zero" long before they ever get to a magnitude of 1e-36. Even MOI values of 1e-16 to 1e-19 are outliers compared to our usual range of values.
The determination of the threshold is closer to an art than a science. I suspect whoever came up with that original threshold was thinking about FLT_MIN, which is 1e-37.
There was a problem hiding this comment.
But the point of this is to make sure that a vector is safe to use, not that it satisfies the range of known fso use cases. It is correct to keep the value as it is, because no one who uses this in the future is going to be thinking, "I am checking to make sure it is in line with known fso use case values." They are going to be thinking, "I need to make sure bad data is not used here, because other values are going to be based off these." Values down to 1e-36 are still safe to use according to what I've found, and changing low level values because we haven't seen them in our experience is not a good idea.
There was a problem hiding this comment.
"Whether a vector is safe" is intrinsically tied to "known FSO use cases". That's one of the takeaways from the article: evaluating floating point values is highly dependent on context. Even something as natural as using FLT_EPSILON as a default threshold falls down for floating point values greater than 2.0, because the difference between any two floats greater than 2.0 is greater than FLT_EPSILON.
We want to catch errors and bad data. The way to do this is to determine what the range of values is expected to be, and then set the threshold to be close to, but outside, that range. If the threshold is too far outside that range, then we risk letting through data that is bad but that still falls within the too-generous threshold.
There was a problem hiding this comment.
Can you point to any instance where a float being between 1e-7 and 1e-36 caused an actual issue?
There was a problem hiding this comment.
That's the problem... if any values slipped through that should have been zero but weren't, and if the magnitudes were between 1e-7 and 1e-36, we would not have known. They would have caused logic errors that resulted in weird behavior, but such behavior would not have been flagged.
There was a problem hiding this comment.
We haven't documented any unexplained weird behaviors, to my knowledge. The most significant unexplained issues have been balance changes in some campaigns, but if you are pointing to changes since the value was changed to 1e-36 from 1e-16, then we should be changing the value back to 1e-16 and not going in the other direction. You have yet to explain your reasoning for picking epsilon, because epsilon in the article is not suggested for comparing against zero.
Re-reading the article, the advice that I see is that when comparing against zero, we should use a constant value, which we are. Floats are still normal up to 1e-36, and the previous value was also constant. Saying that we are getting logic errors by using 1e-36 is dubious and needs specific examples. We need either a known issue or a specific potential issue that we can logically follow, examine, and potentially critique. If you aren't sure whether such an issue could exist, then you haven't actually finished your research. Also, If you need a separate value for model translations, you should specifically say so and explain why and possibly implement a separate macro.
I'm looking at the different places where IS_VEC_NULL and IS_VEC_NULL_SQ_SAFE are used, and the uses I see are "Is this vector safe enough to do with?"
So for instance, thinking abstractly about one use I saw, "Can I do a dot product with this vector?" Doing a dot product requires normalizing the vector. And if one of the float values of the vector is 1e-35 and the other two are 0.0f, then it would come up with a realistic value when normalized, easily: 1.0f, 0.0f, 0.0f. Or maybe 1e-35, 1e-40, 1e-40. Still a realistic value when normalized because the 1e-36 is 4 orders of magnitude greater. You lose a little bit of accuracy if it is something like 1e-35, 1e-36, 1e-36, because there's only so much accuracy that the values can have that close to zero, but that is pretty rare. Now if you are worried about that edge case, just changing it back to 1e-16, its previous value, then normalizing something like 1e-16 1e-16 would probably not present that same kind of issue. And that is all that I can think of in this case.
And the other thing is IS_VEC_NULL is used 9 times in ai code (more than 10% of the total use cases), and adjusting these values to more strict than retail, without a good reason could actually lead to subtle ai changes or the weird behavior that you are trying to avoid.
There needs to be a specific reason to change it to float epsilon and not back to its previous value, 1e-16.
There was a problem hiding this comment.
In retail, this is what the code looked like:
//macro to check if vector is zero
#define IS_VEC_NULL(v) (((v)->x == (float)0.0) && ((v)->y == (float)0.0) && ((v)->z == (float)0.0))
The 1e-16 threshold was introduced in commit 06ef991 in 2009 and was, as far as I can tell, arbitrary. It was in response to this Mantis bug: http://scp.indiegames.us/mantis/view.php?id=1090
My reasoning is simply that if we are going to check for an arbitrary threshold, then it should be a threshold that is close to the boundary of normal usage. Epsilon, at 1e-07, appears reasonable here.
The alternative is to choose a threshold based on mathematical analysis, which is preferred, but I wouldn't know how to choose a threshold based on that method. @Baezon's reasoning from the matrix's determinant is great for the MOI case, but not relevant to the orientation, normal, or vector case.
There was a problem hiding this comment.
Mainly I want a good reason to pick whatever strategy we pick. To be fair, a threshold of epsilon is somewhat arbitrary, but as far as I can tell the 1e-16 threshold is arbitrary too.
Here is what @BMagnu said, which I have permission to quote:
I think we can argue both ways in a sensible manner:
Pro's to keeping 1e-36:
Keeping it won't suddenly break things for things that somehow were below the threshold but worked fine. There is no danger of creating a requirement to update some obscure old mod that might break from this. Additionally, we don't know of any actual example that would immediately fix an error that is slipping by with 1e-36. Most of the math we use should be fine as long as the floats aren't denormal or would cause an INF overflow when we do 1/x. 1e-36 should be sufficient for that.Pro's to changing to 1e-7:
There might be things that, at values smaller than 1e-7, are data which can be properly processed and not crash, but are semantically invalid data. This would be then caught and reported. In FSO's coordinate systems, any location or velocity (and maybe even acceleration) closer than 1e-7 is so small that it may as well be a hard 0, semantically (though this may not be the case for othermisuses of the vec3d struct)
The latter paragraph is essentially my position. Furthermore, tightening the threshold would let us know for sure, and the threshold can always be loosened later.
There was a problem hiding this comment.
That will teach me (not for the first time) to double check that I've gone far enough back on git blame.
I would prefer that we go with the original threshold after retail's buggy check was changed. 1e-16. But Lafiel's suggestion on discord that we try out the change is a good one. So let's go with that.
98f0bb2 to
87223df
Compare
|
As a general rule, changes to the FSO engine should not reject data that was valid in previous versions of the engine unless there's a very good reason to. We've had major problems in recent releases because this principle wasn't respected. This 'cleanup' change is actually subtly changing the behaviour of fundamental numerical operations and I just don't see what the point is in doing that without any concrete bugs or incorrectness to be fixed. |
|
The intention is never to reject data that was really and truly valid in a previous version. The objective here is to be deliberate about what valid data actually is. If data should never be smaller than 1e-07, then the threshold should be set at 1e-07, not 1e-36. Or to use a visual metaphor, if a road is 24 feet wide, and the standard lane width is 12 feet, then it is appropriate to make two 12-feet lanes. It's not appropriate to keep it one lane because "the road has always had only one lane". Conversely, if a road is 16 feet wide, then making two 8-feet lanes is not appropriate. But those are two different situations. |
b9de643 to
40566a5
Compare
Floating-point comparisons are tricky beasts. See this article for a good summary of the issues involved: https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/ This PR improves support somewhat, although there are obviously situations in the code (such as comparisons to 0) that should still be addressed. This PR makes the following improvements: 1) Tightens the threshold on `IS_VEC_NULL` and `IS_VEC_NULL_SQ_SAFE` from 1e-16 and 1e-36 to FLT_EPSILON 2) Tightens the threshold for parsing `turn_time` in POFs 3) Restores the previous implementation of `fl_equal` 4) Adds a dedicated `IS_MOI_VEC_NULL` for moment of inertia values 5) Improves documentation Follow-up to scp-fs2open#4535 and scp-fs2open#4443.
40566a5 to
4dfb29f
Compare
Floating-point comparisons are tricky beasts. See this article for a good summary of the issues involved:
https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
This PR improves support somewhat, although there are obviously situations in the code (such as comparisons to 0) that should still be addressed. This PR makes the following improvements:
IS_VEC_NULLandIS_VEC_NULL_SQ_SAFEfrom 1e-16 and 1e-36 to FLT_EPSILONturn_timein POFsfl_equalIS_MOI_VEC_NULLfor moment of inertia valuesFollow-up to #4535 and #4443.