feat: add health and readiness check endpoints to CLI serve API#1100
feat: add health and readiness check endpoints to CLI serve API#1100markstur wants to merge 2 commits into
Conversation
Adds /health and /ready endpoints to the FastAPI server for Kubernetes liveness and readiness probes. - /health: always returns 200 (liveness check) - /ready: returns 200 when ready, 503 otherwise (readiness check) The readiness check is basic check that run_server happened which provides the chat endpoint. In the future this could be extended to let serve modules report readiness of their backends (etc, needs some design). Would also need to be adapted appropriately when we add support for multiple serve modules. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com> Assisted-by: IBM Bob
|
Did a little bit of import cleanup (unused or moved to top) which was not related to the PR but I think is good to do while touching the files vs the overhead of a nit PR. I'm open to splitting if preferred. |
| if _server_ready: | ||
| return {"status": "ready"} | ||
| else: | ||
| raise HTTPException(status_code=503, detail="Server not ready") |
There was a problem hiding this comment.
Should this be an exception? I don't think anything on our side is actually causing failures. I think it should just be a response with the 503 status code and same detail / status message.
There was a problem hiding this comment.
uvicorn handles this the way you want it to
| @@ -295,5 +334,9 @@ def run_server( | |||
| methods=["POST"], | |||
| response_model=ChatCompletion | OpenAIErrorResponse, | |||
| ) | |||
|
|
|||
| # Mark server as ready after route is successfully registered | |||
| _server_ready = True | |||
|
|
|||
| typer.echo(f"Serving {route_path} at http://{host}:{port}") | |||
| uvicorn.run(app, host=host, port=port) | |||
There was a problem hiding this comment.
Is it possible to even hit the readiness endpoint before the server is ready since the fastapi app isn't run with uvicorn until after the module is loaded and the route added?
There was a problem hiding this comment.
Good point. I will remove /ready for now. That's what I get for sneaking in a bit extra.
FYI --
I cannot hit the not-ready state with the expected run_server() use and since I put _server_ready=True in run_server() it's not very usable any other way right now. I didn't realize that even a breakpoint() would not make this work.
Technically, however, I can run the server w/o using run_server and I get healthy but not ready:
uv run python -c 'from cli.serve.app import app; import uvicorn; uvicorn.run(app, host="0.0.0.0", port="8080")' <-- but don't care about that right now.
I will add an issue to implement /ready later. It should:
1 - handle shutdown signals for k8s!
2 - be for future use (speculative, but if we want to allow module(s) to have startup or other ready dependencies)
* changed status value to "pass" because that is IETF standard. k8s doesn't care what the string is. * removing the /ready implementation which is not ready and not part of this issue Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
|
NOTE: I also changed the status return json to {"status": "pass"} because I learned that is IETF standard while k8s doesn't care what the string is. pass, ok, healthy are common, but pass s most standard. |
Pull Request
Issue
Fixes #1066
Description
Adds /health and /ready endpoints to the FastAPI server for Kubernetes liveness and readiness probes.
The readiness check is basic check that run_server happened which provides the chat endpoint. In the future this could be extended to let serve modules report readiness of their backends (etc, needs some design). Would also need to be adapted appropriately when we add support for multiple serve modules.
Assisted-by: IBM Bob
Testing
Attribution
Adding a new component, requirement, sampling strategy, or tool?
If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.
NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.