Commit 0fc3b12

docs: update EPP protocol spec with streaming mode and health check requirements
1 parent 743480f commit 0fc3b12

1 file changed: 47 additions, 0 deletions
docs/proposals/004-endpoint-picker-protocol/README.md

Lines changed: 47 additions & 0 deletions
@@ -14,6 +14,8 @@ This doc defines the protocol between the EPP and the proxy (e.g, Envoy).
The EPP MUST implement the Envoy
[external processing service](https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/http/ext_proc/v3/ext_proc.proto) protocol.

The EPP MUST support streaming mode for inference requests and responses. Streaming mode enables full-duplex communication between the Gateway (Envoy), EPP, and model servers, allowing real-time token-by-token response delivery for AI inference workloads.

## Version History

| Version | Date | Changes |
@@ -94,5 +96,50 @@ filterMetadata: {
This metadata is required because the EPP provides a list of endpoints to the data plane (see [Destination Endpoint](#destination-endpoint)), and the data plane, according to retry configuration, will attempt each endpoint in order until the request is successful or no more endpoints are available.
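
The ordered fallback described above can be sketched roughly as follows. This is an illustration only, not Envoy's retry implementation: `try_endpoints`, `send`, and `max_attempts` are hypothetical names, and `max_attempts` merely stands in for whatever retry configuration the data plane applies.

```python
from typing import Callable, Optional, Sequence


def try_endpoints(
    endpoints: Sequence[str],
    send: Callable[[str], bool],
    max_attempts: Optional[int] = None,
) -> Optional[str]:
    """Try endpoints in the order supplied by the EPP; return the first that succeeds.

    `send` is a hypothetical callback returning True when the request
    succeeded against the given endpoint. `max_attempts` models the data
    plane's retry configuration (None means "try every endpoint").
    """
    if max_attempts is None:
        budget = len(endpoints)
    else:
        budget = min(max_attempts, len(endpoints))
    for endpoint in endpoints[:budget]:
        if send(endpoint):
            return endpoint
    return None  # every attempted endpoint failed, or there was nothing to try


if __name__ == "__main__":
    # Assumed addresses, for illustration only.
    healthy = {"10.0.0.3:8000"}
    picked = try_endpoints(
        ["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"],
        send=lambda ep: ep in healthy,
    )
    print("request served by:", picked)
```

The order of the list matters: the data plane never reorders it, so the EPP's first choice is always attempted first.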

## Health Checking

The EPP MUST implement health checking to enable monitoring, load balancing, and high availability. The EPP exposes health check endpoints following the [gRPC Health Checking Protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).

### Health Check Services

The EPP exposes the following health check services:

**Liveness Check** (`liveness`): Determines if the EPP process is alive and responsive. Returns `SERVING` if the EPP process can respond to gRPC requests. This check does not depend on datastore sync status or leader election state.

**Readiness Check** (`readiness`): Determines if the EPP is ready to accept and process inference requests. Returns `SERVING` if the EPP datastore has synced and, in multi-replica deployments, the EPP is the elected leader. Returns `NOT_SERVING` if the datastore has not synced or the EPP is a follower.

**External Processor Service Check** (`envoy.service.ext_proc.v3.ExternalProcessor`): Verifies the main ext_proc service is healthy. Returns `SERVING` if the EPP is ready to process ext_proc requests (same criteria as the readiness check).
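
The three rules above can be condensed into one decision function. The sketch below is not the EPP's actual implementation: `EppState` and `status_for` are hypothetical names, and the state is reduced to the three conditions the text mentions.

```python
from dataclasses import dataclass
from enum import Enum


class ServingStatus(Enum):
    SERVING = 1
    NOT_SERVING = 2


@dataclass
class EppState:
    """Hypothetical snapshot of the conditions the health checks depend on."""
    datastore_synced: bool
    leader_election_enabled: bool
    is_leader: bool


EXT_PROC_SERVICE = "envoy.service.ext_proc.v3.ExternalProcessor"


def status_for(service: str, state: EppState) -> ServingStatus:
    """Return the serving status for one of the three health check services."""
    if service == "liveness":
        # Liveness ignores datastore sync and leader election:
        # if this code runs, the process is able to respond.
        return ServingStatus.SERVING
    if service in ("readiness", EXT_PROC_SERVICE):
        # Leadership only matters when leader election is enabled.
        leader_ok = state.is_leader or not state.leader_election_enabled
        ready = state.datastore_synced and leader_ok
        return ServingStatus.SERVING if ready else ServingStatus.NOT_SERVING
    raise KeyError(f"unknown health check service: {service}")


if __name__ == "__main__":
    follower = EppState(datastore_synced=True, leader_election_enabled=True, is_leader=False)
    for name in ("liveness", "readiness", EXT_PROC_SERVICE):
        print(name, status_for(name, follower).name)
```

Treating the ext_proc service name as an alias for `readiness` keeps the two checks from drifting apart, which matches the "same criteria" wording above.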
112+
113+
### Health Check Protocol
114+
115+
The EPP implements the standard gRPC Health Checking Protocol:
116+
117+

```protobuf
service Health {
  // Returns the current serving status of the requested service.
  rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
}

message HealthCheckRequest {
  // Name of the service to check, e.g. `liveness` or `readiness`.
  string service = 1;
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
    SERVICE_UNKNOWN = 3;
  }
  ServingStatus status = 1;
}
```
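
To make the `Check` semantics concrete, here is a small in-memory model of a health registry. It is a sketch under assumptions, not gRPC library code: `HealthRegistry` and `NotFound` are hypothetical names. In the gRPC Health Checking Protocol, a `Check` for an unregistered service name fails with the gRPC status `NOT_FOUND`; `SERVICE_UNKNOWN` is reported only by the streaming `Watch` method, which the excerpt above does not show.

```python
from enum import IntEnum
from typing import Dict


class ServingStatus(IntEnum):
    """Mirrors the ServingStatus enum values from the protocol definition."""
    UNKNOWN = 0
    SERVING = 1
    NOT_SERVING = 2
    SERVICE_UNKNOWN = 3


class NotFound(Exception):
    """Stand-in for a gRPC NOT_FOUND status error (hypothetical type)."""


class HealthRegistry:
    """Maps service names to their current serving status."""

    def __init__(self) -> None:
        self._statuses: Dict[str, ServingStatus] = {}

    def set_status(self, service: str, status: ServingStatus) -> None:
        self._statuses[service] = status

    def check(self, service: str) -> ServingStatus:
        """Model of the unary Check RPC: look up the status by service name."""
        try:
            return self._statuses[service]
        except KeyError:
            raise NotFound(f"unknown service: {service!r}") from None


if __name__ == "__main__":
    registry = HealthRegistry()
    registry.set_status("liveness", ServingStatus.SERVING)
    registry.set_status("readiness", ServingStatus.NOT_SERVING)
    print(registry.check("liveness").name)
    print(registry.check("readiness").name)
```

A caller therefore has to handle two distinct outcomes: a successful response carrying `SERVING` or `NOT_SERVING`, and an error for a service name the EPP never registered.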

### Leader Election Support

When leader election is enabled (multi-replica deployments):

- **Leader Pod**: Liveness returns `SERVING`, Readiness returns `SERVING` (if datastore synced). Processes all inference requests.
- **Follower Pods**: Liveness returns `SERVING`, Readiness returns `NOT_SERVING`. Do not process inference requests but remain alive for failover.

This ensures only the leader pod receives traffic while follower pods remain alive for failover.
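
The effect on traffic can be illustrated with a toy model of a replica set. Everything here is hypothetical (`Replica`, `traffic_targets`, `promote`, and the pod names); the actual election mechanism is outside the scope of this sketch, which only shows how readiness follows leadership.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    """Hypothetical view of one EPP pod when leader election is enabled."""
    name: str
    is_leader: bool = False
    datastore_synced: bool = True


def liveness(replica: Replica) -> str:
    # Liveness does not depend on leadership or datastore sync.
    return "SERVING"


def readiness(replica: Replica) -> str:
    ready = replica.is_leader and replica.datastore_synced
    return "SERVING" if ready else "NOT_SERVING"


def traffic_targets(replicas: List[Replica]) -> List[str]:
    """Names of the pods a readiness-aware load balancer would route to."""
    return [r.name for r in replicas if readiness(r) == "SERVING"]


def promote(replicas: List[Replica], new_leader: str) -> None:
    """Model a leadership change: only the named replica leads afterwards."""
    for r in replicas:
        r.is_leader = r.name == new_leader


if __name__ == "__main__":
    pods = [Replica("epp-0", is_leader=True), Replica("epp-1"), Replica("epp-2")]
    print("before failover:", traffic_targets(pods))
    promote(pods, "epp-1")  # e.g. epp-0 lost its leadership
    print("after failover:", traffic_targets(pods))
```

Because followers keep answering liveness probes, they are not restarted while idle and can start serving as soon as one of them is promoted.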

### Why envoy.lb namespace as a default?

The `envoy.lb` namespace is a predefined namespace. One common way to use the selected endpoint returned from the server is [envoy subsets](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/subsets), where host metadata for subset load balancing must be placed under `envoy.lb`. Note that this is not related to the subsetting feature discussed above; it is an Envoy implementation detail.
