The EPP MUST support streaming mode for inference requests and responses. Streaming mode enables full-duplex communication between the Gateway (Envoy), EPP, and model servers, allowing real-time token-by-token response delivery for AI inference workloads.
## Version History
| Version | Date | Changes |
@@ -94,5 +96,50 @@ filterMetadata: {
This metadata is required because the EPP provides a list of endpoints to the data plane (see [Destination Endpoint](#destination-endpoint)), and the data plane, according to retry configuration, will attempt each endpoint in order until the request is successful or no more endpoints are available.
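The in-order fallback described above can be sketched as follows. This is a minimal illustration, not the data plane's actual implementation: `send_with_fallback`, the `send` callback, and the example addresses are hypothetical names, and a real data plane would also apply its retry configuration (for example, retry limits), which this sketch omits.

```python
from typing import Callable, Optional, Sequence


def send_with_fallback(
    endpoints: Sequence[str],
    send: Callable[[str], bool],
) -> Optional[str]:
    """Try each endpoint in the order given by the EPP.

    Returns the first endpoint for which `send` reports success, or None
    when every endpoint has been tried without success.
    `send` is a hypothetical callback that returns True on success.
    """
    for endpoint in endpoints:
        if send(endpoint):
            return endpoint
    return None


# Hypothetical example: the first endpoint fails, the second succeeds,
# so the third is never attempted.
attempted = []


def fake_send(endpoint: str) -> bool:
    attempted.append(endpoint)
    return endpoint == "10.0.0.2:8000"


chosen = send_with_fallback(
    ["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"],
    fake_send,
)
```

Because the list is ordered, the position of an endpoint expresses the EPP's preference: later entries are only used when earlier ones fail.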
## Health Checking
The EPP MUST implement health checking to enable monitoring, load balancing, and high availability. The EPP exposes health check endpoints following the [gRPC Health Checking Protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
### Health Check Services
The EPP exposes the following health check services:

**Liveness Check** (`liveness`): Determines if the EPP process is alive and responsive. Returns `SERVING` if the EPP process can respond to gRPC requests. This check does not depend on datastore sync status or leader election state.

**Readiness Check** (`readiness`): Determines if the EPP is ready to accept and process inference requests. Returns `SERVING` if the EPP datastore has synced and the EPP is the elected leader (in multi-replica deployments). Returns `NOT_SERVING` if the datastore has not synced or the EPP is a follower.

**External Processor Service Check** (`envoy.service.ext_proc.v3.ExternalProcessor`): Verifies the main ext_proc service is healthy. Returns `SERVING` if the EPP is ready to process ext_proc requests (same criteria as readiness check).
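The three checks above reduce to a small decision function. The sketch below models only that decision logic in plain Python; it is not a gRPC server and not the EPP's code. `EppState`, `health_status`, and the field names are hypothetical, and a single-replica EPP is assumed to count as the leader.

```python
from dataclasses import dataclass

SERVING = "SERVING"
NOT_SERVING = "NOT_SERVING"

# Service names as listed in the section above.
LIVENESS = "liveness"
READINESS = "readiness"
EXT_PROC = "envoy.service.ext_proc.v3.ExternalProcessor"


@dataclass
class EppState:
    """Hypothetical snapshot of the state the checks depend on."""
    datastore_synced: bool
    is_leader: bool  # assumption: True in a single-replica deployment


def health_status(service: str, state: EppState) -> str:
    """Return the health status for one of the three service names."""
    if service == LIVENESS:
        # Liveness ignores datastore sync and leader election: being able
        # to answer at all means the process is alive.
        return SERVING
    if service in (READINESS, EXT_PROC):
        # The ext_proc check uses the same criteria as readiness.
        ready = state.datastore_synced and state.is_leader
        return SERVING if ready else NOT_SERVING
    # A real gRPC health server answers an unregistered service name with
    # a NOT_FOUND status; this sketch raises instead.
    raise KeyError(f"unknown health check service: {service}")
```

For example, `health_status(READINESS, EppState(datastore_synced=True, is_leader=False))` evaluates to `NOT_SERVING`, while the liveness check for the same state still evaluates to `SERVING`.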
### Health Check Protocol
The EPP implements the standard gRPC Health Checking Protocol:
- **Follower Pods**: Liveness returns `SERVING`, Readiness returns `NOT_SERVING`. Do not process inference requests but remain alive for failover.

This ensures only the leader pod receives traffic while follower pods remain alive for failover.
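A minimal sketch of the effect on traffic, assuming a load balancer that routes only to pods whose readiness check returns `SERVING`. The pod names, `PodHealth`, and both helper functions are hypothetical and exist only for illustration.

```python
from typing import Dict, List, NamedTuple


class PodHealth(NamedTuple):
    """Hypothetical record of the two health check results for one pod."""
    liveness: str
    readiness: str


def traffic_targets(pods: Dict[str, PodHealth]) -> List[str]:
    """Pods that receive traffic: readiness must be SERVING."""
    return sorted(
        name for name, health in pods.items()
        if health.readiness == "SERVING"
    )


def failover_candidates(pods: Dict[str, PodHealth]) -> List[str]:
    """Pods that are alive but not serving, so they can take over later."""
    return sorted(
        name for name, health in pods.items()
        if health.liveness == "SERVING" and health.readiness != "SERVING"
    )


# Hypothetical three-replica deployment: one synced leader, two followers.
pods = {
    "epp-0": PodHealth(liveness="SERVING", readiness="SERVING"),
    "epp-1": PodHealth(liveness="SERVING", readiness="NOT_SERVING"),
    "epp-2": PodHealth(liveness="SERVING", readiness="NOT_SERVING"),
}
```

With this input, `traffic_targets(pods)` contains only `epp-0`, and the two followers appear in `failover_candidates(pods)`.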
### Why envoy.lb namespace as a default?
The `envoy.lb` namespace is a predefined namespace. One common way to use the selected endpoint returned from the server is [Envoy subsets](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/subsets), where host metadata for subset load balancing must be placed under `envoy.lb`. Note that this is not related to the subsetting feature discussed above; it is an Envoy implementation detail.