InFlightTrackingClient wraps a grpc.Backend and tracks active inference requests in the NodeRegistry. This allows the router's eviction logic to know which models are actively serving and should not be unloaded. Per-replica: a single tracker instance is bound to (nodeID, modelName, replicaIndex). T
// until it is wrapped with track() - so a new inference path can't be added
// without an in-flight accounting decision.
type InFlightTrackingClient struct {
	grpc.ControlBackend                       // passthrough for control-plane / streaming-constructor methods
	inner               grpc.InferenceBackend // tracked inference methods delegate here
	registry            InFlightTracker
	nodeID              string
	modelName           string
	replicaIndex        int

	firstOnce       sync.Once // guards onFirstComplete
	onFirstComplete func()    // called once after the first tracked inference call completes
}

// Compile-time contract: *InFlightTrackingClient must implement the FULL backend
// surface. Because it embeds only ControlBackend, this fails to compile if any
nothing calls this directly
no outgoing calls
no test coverage detected
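
The excerpt above is cut off before any tracked method appears, so the following is only a minimal sketch of how a per-replica wrapper of this shape might account for in-flight calls. Everything except the field names taken from the struct (inner, registry, nodeID, modelName, replicaIndex, firstOnce, onFirstComplete) and the track() helper mentioned in the comment is an assumption: the IncrementInFlight/DecrementInFlight method names, the single-method inferenceBackend interface, and the in-memory memRegistry are hypothetical stand-ins, not the real grpc or NodeRegistry APIs.

```go
package main

import (
	"fmt"
	"sync"
)

// replicaKey is a hypothetical key for one replica of one model on one node.
type replicaKey struct {
	nodeID, modelName string
	replicaIndex      int
}

// inFlightTracker is an ASSUMED shape for the registry-side interface.
type inFlightTracker interface {
	IncrementInFlight(nodeID, modelName string, replicaIndex int)
	DecrementInFlight(nodeID, modelName string, replicaIndex int)
}

// memRegistry is an in-memory stand-in for the NodeRegistry.
type memRegistry struct {
	mu     sync.Mutex
	counts map[replicaKey]int
}

func (r *memRegistry) add(k replicaKey, delta int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.counts[k] += delta
}

func (r *memRegistry) IncrementInFlight(n, m string, i int) { r.add(replicaKey{n, m, i}, 1) }
func (r *memRegistry) DecrementInFlight(n, m string, i int) { r.add(replicaKey{n, m, i}, -1) }

// InFlight is what eviction logic would consult before unloading a model.
func (r *memRegistry) InFlight(n, m string, i int) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.counts[replicaKey{n, m, i}]
}

// inferenceBackend stands in for grpc.InferenceBackend (one method for brevity).
type inferenceBackend interface {
	Predict(prompt string) (string, error)
}

// backendFunc adapts a plain function to inferenceBackend.
type backendFunc func(string) (string, error)

func (f backendFunc) Predict(p string) (string, error) { return f(p) }

// trackingClient mirrors the fields of InFlightTrackingClient shown above.
type trackingClient struct {
	inner           inferenceBackend
	registry        inFlightTracker
	nodeID          string
	modelName       string
	replicaIndex    int
	firstOnce       sync.Once
	onFirstComplete func()
}

// track marks the replica busy and returns the function that releases it.
func (c *trackingClient) track() func() {
	c.registry.IncrementInFlight(c.nodeID, c.modelName, c.replicaIndex)
	return func() {
		c.registry.DecrementInFlight(c.nodeID, c.modelName, c.replicaIndex)
		if c.onFirstComplete != nil {
			c.firstOnce.Do(c.onFirstComplete) // runs at most once per client
		}
	}
}

// Predict delegates to inner while the call is counted as in flight.
func (c *trackingClient) Predict(prompt string) (string, error) {
	defer c.track()() // increment now, decrement when Predict returns
	return c.inner.Predict(prompt)
}

// Compile-time check that the wrapper satisfies the interface it wraps.
var _ inferenceBackend = (*trackingClient)(nil)

// runDemo makes two tracked calls and reports what the registry saw.
func runDemo() (during, after, firstCalls int) {
	reg := &memRegistry{counts: map[replicaKey]int{}}
	backend := backendFunc(func(p string) (string, error) {
		during = reg.InFlight("node-1", "demo-model", 0) // observed mid-call
		return "echo: " + p, nil
	})
	c := &trackingClient{
		inner: backend, registry: reg,
		nodeID: "node-1", modelName: "demo-model", replicaIndex: 0,
		onFirstComplete: func() { firstCalls++ },
	}
	c.Predict("a")
	c.Predict("b")
	return during, reg.InFlight("node-1", "demo-model", 0), firstCalls
}

func main() {
	during, after, firstCalls := runDemo()
	fmt.Println(during, after, firstCalls) // prints "1 0 1"
}
```

In this sketch the counter is 1 while a call is running and back to 0 afterwards, which is the signal eviction logic would need to avoid unloading a model that is still serving. The `defer c.track()()` form is one way to make the accounting decision explicit in each tracked method; whether the real client uses exactly this pattern is not visible in the excerpt.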