0.9.3
This release contains bug fixes, including patches and more tests for bugs introduced in 0.9.2.
If you use proxy for load balancing in failure scenarios, pay attention to a few changes. We've made improvements that will help debug and eliminate sporadic, long-lasting 502 errors, but doing so required changing how the failure logic works.
Summarized change list:
- Updated QUIC to newer version
- import: Glob pattern matching 0 files is no longer an error
- fastcgi: Fixed persistent connections (disabled by default)
- fastcgi: Configurable connection pool size parameter
- proxy: Improved failover load balancing logic
- proxy: Avoids confusing duplication of header fields
- proxy: New try_duration and try_interval parameters
- proxy: Fix for IP hash policy when downed hosts come back up
- Several other bug fixes and new tests
Changes specific to proxy (see PR #1135):
- fail_timeout now defaults to 0. This means that requests which fail will not count against that host's availability. With a value > 0, request failure counting is enabled, and proxy will remember a failed request for this long. If the number of remembered failures accumulates to max_fails, the backend will be considered down (for everyone) until the failed requests begin to be forgotten.
- max_fails defaults to 1 as before, but cannot be set to 0. If your network is flaky (almost all are), try a more reasonable value like 5. Remember, once the number of failed requests to a backend reaches this number within the window of fail_timeout, the host will be considered down for all clients until the window shifts ahead.
- try_duration is a new parameter that specifies how long proxy will check for available hosts. So if a host becomes available within this duration, the request may still succeed. The default is 0, meaning that proxy will not retry when a host initially goes down or no hosts are available. You must set this to a reasonable value > 0 (e.g. 30s) if you want robust redundancy.
- try_interval specifies how long to wait between attempts to reach an upstream host. This defaults to 250ms. The idea is to avoid spinning the CPU, so if you set this to 0 along with a non-zero fail_timeout, your CPU may spin until hosts become available again.
Basically: If you want to have proper, redundant load balancing, you must set fail_timeout and try_duration to durations > 0.
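As a rough illustration, a Caddyfile proxy block along the following lines would turn on both failure counting and retries. The backend addresses and the fail_timeout value are placeholders chosen for the example, not recommendations; max_fails 5 and try_duration 30s echo the values suggested above, and try_interval is shown at its default.

```
# Sketch only: backend1/backend2 and the 10s window are assumed values.
proxy / backend1:8080 backend2:8080 {
    fail_timeout 10s     # > 0 enables failure counting; failures are remembered this long
    max_fails 5          # host is considered down after this many remembered failures
    try_duration 30s     # > 0 keeps looking for an available host for this long
    try_interval 250ms   # wait between attempts (the default)
}
```

With both fail_timeout and try_duration left at their default of 0, proxy would neither count failures against a host nor retry a request when no host is available.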
We may continue to tweak this logic in the future to get the best defaults for as many users as possible.
Thank you to all who contributed to this release!