Skip to content

feat: add health and readiness check endpoints to CLI serve API#1100

Open
markstur wants to merge 2 commits into
generative-computing:mainfrom
markstur:health
Open

feat: add health and readiness check endpoints to CLI serve API#1100
markstur wants to merge 2 commits into
generative-computing:mainfrom
markstur:health

Conversation

@markstur
Copy link
Copy Markdown
Contributor

Pull Request

Issue

Fixes #1066

Description

Adds /health and /ready endpoints to the FastAPI server for Kubernetes liveness and readiness probes.

  • /health: always returns 200 (liveness check)
  • /ready: returns 200 when ready, 503 otherwise (readiness check)

The readiness check is basic check that run_server happened which provides the chat endpoint. In the future this could be extended to let serve modules report readiness of their backends (etc, needs some design). Would also need to be adapted appropriately when we add support for multiple serve modules.

Assisted-by: IBM Bob

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

  • Component
  • Requirement
  • Sampling Strategy
  • Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

Adds /health and /ready endpoints to the FastAPI server for Kubernetes liveness and readiness probes.

- /health: always returns 200 (liveness check)
- /ready: returns 200 when ready, 503 otherwise (readiness check)

The readiness check is basic check that run_server happened which provides the
chat endpoint.  In the future this could be extended to let serve modules report
readiness of their backends (etc, needs some design). Would also need to be adapted
appropriately when we add support for multiple serve modules.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
@markstur markstur requested a review from a team as a code owner May 19, 2026 19:17
@markstur markstur requested review from jakelorocco and planetf1 May 19, 2026 19:17
@github-actions github-actions Bot added the enhancement New feature or request label May 19, 2026
@markstur
Copy link
Copy Markdown
Contributor Author

Did a little bit of import cleanup (unused or moved to top) which was not related to the PR but I think is good to do while touching the files vs the overhead of a nit PR. I'm open to splitting if preferred.

Comment thread cli/serve/app.py Outdated
if _server_ready:
return {"status": "ready"}
else:
raise HTTPException(status_code=503, detail="Server not ready")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an exception? I don't think anything on our side is actually causing failures. I think it should just be a response with the 503 status code and same detail / status message.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uvicorn handles this the way you want it to

Comment thread cli/serve/app.py Outdated
Comment on lines 326 to 342
@@ -295,5 +334,9 @@ def run_server(
methods=["POST"],
response_model=ChatCompletion | OpenAIErrorResponse,
)

# Mark server as ready after route is successfully registered
_server_ready = True

typer.echo(f"Serving {route_path} at http://{host}:{port}")
uvicorn.run(app, host=host, port=port)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to even hit the readiness endpoint before the server is ready since the fastapi app isn't run with uvicorn until after the module is loaded and the route added?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will remove /ready for now. That's what I get for sneaking in a bit extra.

FYI --

I cannot hit the not-ready state with the expected run_server() use and since I put _server_ready=True in run_server() it's not very usable any other way right now. I didn't realize that even a breakpoint() would not make this work.

Technically, however, I can run the server w/o using run_server and I get healthy but not ready:
uv run python -c 'from cli.serve.app import app; import uvicorn; uvicorn.run(app, host="0.0.0.0", port="8080")' <-- but don't care about that right now.

I will add an issue to implement /ready later. It should:

1 - handle shutdown signals for k8s!
2 - be for future use (speculative, but if we want to allow module(s) to have startup or other ready dependencies)

* changed status value to "pass" because that is IETF standard.  k8s doesn't care what the string is.
* removing the /ready implementation which is not ready and not part of this issue

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@markstur
Copy link
Copy Markdown
Contributor Author

NOTE: I also changed the status return json to {"status": "pass"} because I learned that is IETF standard while k8s doesn't care what the string is. pass, ok, healthy are common, but pass s most standard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cli/serve Health check endpoint. GET /health returning {"status": "ok"}.

2 participants