PrecisionFromfloat32() can be inlined and runs < 0.5 ns/op. It
indicates exact, inexact, underflow and overflow if the specified
float32 is converted to float16 (IEEE 754 binary16).
IsQuietNaN() indicates whether the specified NaN has nan-quiet-bit set.
Closes: #3
-`float16` package provides [IEEE 754 half-precision floating-point format](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) with IEEE 754 default rounding for conversions. IEEE 754-2008 refers to this 16-bit floating-point format as binary16.
+`float16` package provides [IEEE 754 half-precision floating-point format (binary16)](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) with IEEE 754 default rounding for conversions. IEEE 754-2008 refers to this 16-bit floating-point format as binary16.

IEEE 754 default rounding ("Round-to-Nearest RoundTiesToEven") is considered the most accurate and statistically unbiased estimate of the true result.

All possible 4+ billion floating-point conversions with this library are verified to be correct.

+This library uses the lowercase word "float16" to refer to IEEE 754 binary16. And uses capitalized "Float16" to export a Go data type representing float16.
+
## Features
Current features include:

* float16 to float32 conversions use lossless conversion.
* float32 to float16 conversions use IEEE 754-2008 "Round-to-Nearest RoundTiesToEven".
* conversions use __zero allocs__ and are about __2.65 ns/op__ (in pure Go) on a desktop amd64.
* unit tests provide 100% code coverage and check all possible 4+ billion conversions.
-* other functions include: IsFinite(), IsInf(), IsNaN(), IsNormal(), Signbit(), and String().
+* other functions include: IsInf(), IsNaN(), IsNormal(), PrecisionFromfloat32(), String(), etc.
* all functions in this library use zero allocs except String().

## Status
-This library is used by [fxamacker/cbor](https://github.com/fxamacker/cbor) and is ready for production use on supported platforms.
+This library is used by [fxamacker/cbor](https://github.com/fxamacker/cbor) and is ready for production use on supported platforms. The version number < 1.0 indicates more functions and options are planned but not yet published.

Current status:

-* core API is done and breaking API changes are unlikely except Fromfloat32() to add options.
+* core API is done and breaking API changes are unlikely.
* 100% of unit tests pass:
-* short mode (`go test -short`) tests around 65763 conversions in 0.005s.
-* normal mode (`go test`) tests all possible 4+ billion conversions in about 45s.
+* short mode (`go test -short`) tests around 65765 conversions in 0.005s.
+* normal mode (`go test`) tests all possible 4+ billion conversions in about 75s.
* 100% code coverage with both short mode and normal mode.
* tested on amd64 but it should work on all little-endian platforms supported by Go.

Roadmap:

-* add a function to both convert and report precision issues in one call.
* add functions for fast batch conversions.
* speed up unit test when verifying all possible 4+ billion conversions.
* test on additional platforms.
@@ -48,9 +49,9 @@ Unit tests take a fraction of a second to check all 65536 expected values for fl
## Float32 to Float16 Conversion
Conversions from float32 to float16 use IEEE 754 default rounding ("Round-to-Nearest RoundTiesToEven"). All 4294967296 possible float32 to float16 conversions (in pure Go) are confirmed to be correct.

-Unit tests in normal mode take about 35-55 seconds to check all 4+ billion expected values for float32 to float16 conversions.
+Unit tests in normal mode take about 60-90 seconds to check all 4+ billion expected values for float32 to float16 conversions as well as PrecisionFromfloat32() for each.

-Unit tests in short mode use a small subset (65763) of expected values and finish in under 1 second while still reaching 100% code coverage.
+Unit tests in short mode use a small subset (65765) of expected values and finish in under 0.01 second while still reaching 100% code coverage.
See [API](https://godoc.org/github.com/cbor-go/float16) at godoc.org for more info.

## Benchmarks
-Conversions (in pure Go) are around 2.65 ns/op for float16 to Float32 as well as Float32 to float16 on amd64.
+Conversions (in pure Go) are around 2.65 ns/op for float16 to Float32 as well as Float32 to float16 on amd64. And speeds can vary depending on input value.

-Frombits is included as a canary to catch overoptimized benchmarks. Frombits should be faster than all other functions.
+Frombits is included as a canary to catch overoptimized benchmarks. It should be faster than all other functions except PrecisionFromfloat32.
```
All functions have zero allocations except float16.String().

FromFloat32pi-2 2.59ns ± 0% // speed using Fromfloat32() to convert a float32 of math.Pi to Float16
ToFloat32pi-2 2.69ns ± 0% // speed using Float32() to convert a float16 of math.Pi to float32
-Frombits-2 0.36ns ± 8% // speed using Frombits() to cast a uint16 to Float16
+Frombits-2 0.29ns ± 5% // speed using Frombits() to cast a uint16 to Float16
+
+PrecisionFromFloat32-2 0.29ns ± 1% // speed using PrecisionFromfloat32() to check for overflows, etc.