[pip] PIP-459: Batch Status Summary and Filtered Listing for Pulsar Functions by onceMisery · Pull Request #25299 · apache/pulsar

onceMisery · 2026-03-09T03:17:25Z

Main Issue: #25235

Motivation

See pip/pip-459.md for the full proposal.

Modifications

Added PIP-459 proposal document.

Verifying this change

Documentation

doc
doc-required
doc-not-needed
doc-complete

Matching PR in forked repository

matching-pr

lhotari · 2026-03-09T08:09:16Z

pip/pip-459.md

+The endpoint accepts two optional query parameters:
+
+- `limit`: maximum number of functions to return; must be greater than `0` when present
+- `continuationToken`: exclusive cursor based on function name in lexicographical order


How would this continuationToken work and what would be the format? The Pulsar Functions Admin API is stateless and subsequent calls could go to different Function Worker instances in a cluster with multiple workers.


lhotari · 2026-03-09T08:16:02Z

pip/pip-459.md

+The worker-side implementation still computes summaries by querying per-function status and then aggregating results. Controlled parallelism improves latency, but it does not fundamentally change the cost model. Future work may explore more direct or cached aggregation paths if very large namespaces make the endpoint hot.
+
+# Links
+


Add the discussion link here: https://lists.apache.org/thread/k792434hgm237qmtbn4fsd62hdzdt0h7

Suggested change

* Mailing List discussion thread: https://lists.apache.org/thread/k792434hgm237qmtbn4fsd62hdzdt0h7

* Mailing List voting thread: TBD

Thanks for raising this concern!

You're absolutely right that the Admin API is stateless and requests can hit
different workers. My design addresses this through deterministic, name-based
cursor pagination:

How it works:

- continuationToken is simply the last function name from the previous page
- Each worker independently sorts all function names lexicographically
- The token acts as an exclusive lower bound to find the next page
- No server-side state is required; the token is self-contained

This approach works consistently across workers because the sorting is
deterministic and metadata is replicated.
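
The steps above can be sketched in a few lines of Java. This is only an illustration of name-based cursor slicing, not the code in the PR: `NameCursorPager`, `Page`, and `page(...)` are hypothetical names, and the sketch assumes the worker already holds the full set of function names for the namespace.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of the Pulsar code base. */
final class NameCursorPager {

    /** One page of names plus the cursor for the next request (null when exhausted). */
    record Page(List<String> names, String nextContinuationToken) {}

    private NameCursorPager() {}

    static Page page(Collection<String> allNames, Integer limit, String continuationToken) {
        if (limit != null && limit <= 0) {
            throw new IllegalArgumentException("limit must be greater than 0 when present");
        }
        // Natural String ordering is deterministic, so any worker holding the
        // same metadata derives the same sequence.
        TreeSet<String> sorted = new TreeSet<>(allNames);
        // Exclusive lower bound: keep only names strictly greater than the token.
        NavigableSet<String> remaining =
                continuationToken == null ? sorted : sorted.tailSet(continuationToken, false);

        List<String> names = new ArrayList<>();
        boolean hasMore = false;
        for (String name : remaining) {
            if (limit != null && names.size() == limit) {
                hasMore = true; // at least one more name exists beyond this page
                break;
            }
            names.add(name);
        }
        // The token is just the last name returned; no server-side state is kept.
        String next = hasMore ? names.get(names.size() - 1) : null;
        return new Page(names, next);
    }
}
```

Because the token is an ordinary function name, a request that lands on a different worker resolves it the same way, provided that worker sees the same set of names.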

Trade-off:
If functions are created/deleted during pagination, clients may see duplicates
or miss entries. This is a common limitation of stateless cursor-based pagination
in distributed systems (similar to Kubernetes list pagination or DynamoDB queries).

One improvement I should make:
I also agree the API should return nextContinuationToken explicitly rather than only a bare list, so I can update the response shape to make the paging contract clear.

{ "summaries": [...], "nextContinuationToken": "func-xyz" // or null }
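
To show how a client might consume that response shape, here is a minimal sketch of a paging loop. `PagingClientSketch`, `SummaryPage`, and `fetchAll` are hypothetical names; the `fetchPage` function stands in for whatever admin call ends up exposing `limit` and the cursor, and each summary is reduced to a plain string to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical client-side sketch; not part of the Pulsar admin client. */
final class PagingClientSketch {

    /** Mirrors the JSON shape above: a page of summaries plus the next cursor. */
    record SummaryPage(List<String> summaries, String nextContinuationToken) {}

    private PagingClientSketch() {}

    /**
     * Requests pages until the server reports a null nextContinuationToken.
     * fetchPage takes (limit, continuationToken) and returns one page.
     */
    static List<String> fetchAll(BiFunction<Integer, String, SummaryPage> fetchPage, int limit) {
        List<String> all = new ArrayList<>();
        String token = null; // null means "start from the beginning"
        do {
            SummaryPage page = fetchPage.apply(limit, token);
            all.addAll(page.summaries());
            token = page.nextContinuationToken();
        } while (token != null);
        return all;
    }
}
```

An explicit null token gives the client an unambiguous end-of-list signal, so it does not have to infer completion from a short or empty page.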

Would this design work for you? I can update the PR accordingly.

onceMisery · 2026-03-09T14:26:11Z

Thanks for raising this concern! You're absolutely right that the Admin API is stateless and requests can hit different workers. My design addresses this through deterministic, name-based cursor pagination.

How it works:

- continuationToken is simply the last function name from the previous page
- Each worker independently sorts all function names lexicographically
- The token acts as an exclusive lower bound to find the next page
- No server-side state is required; the token is self-contained

This approach works consistently across workers because the sorting is deterministic and metadata is replicated.

Trade-off: If functions are created/deleted during pagination, clients may see duplicates or miss entries. This is a common limitation of stateless cursor-based pagination in distributed systems (similar to Kubernetes list pagination or DynamoDB queries).

One improvement I should make: Currently the API returns List<FunctionStatusSummary>. I should wrap this in a response object with nextContinuationToken (null when no more data), so clients can easily detect the end of pagination:

{ "summaries": [...], "nextContinuationToken": "func-xyz" // or null }

The name continuationToken might be ambiguous. What do you think of startAfter? For example:

public List<FunctionStatusSummary> getFunctionsWithStatus(String tenant, String namespace, Integer limit, String startAfter)

Would this design work for you? I can update the PR accordingly.

shibd · 2026-03-12T12:21:03Z

pip/pip-459.md

+    private int numInstances;
+    private int numRunning;
+    private String error;
+    private ErrorType errorType;


Hi @onceMisery, thanks for the PIP. Have we considered including a bit more information in this response? For example:

- `receivedTotal`
- `processedSuccessfullyTotal`
- `systemExceptionsTotal`
- `userExceptionsTotal`
- `avgProcessLatency`
- `userMetrics`

These values are already aggregated, and they could give operators a more direct view of a function's actual health instead of relying only on RUNNING / STOPPED / PARTIAL / UNKNOWN.

Another option would be to keep the default response lightweight, but add a query parameter to control the level of detail returned by the REST API. That would let us preserve the current "summary" use case while still supporting a more diagnostic view when needed.
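
As a rough sketch of that second option, the detail level could be modelled as an enum parsed from an optional query parameter. Everything here is an assumption made for illustration: the `DetailLevel` enum, its `SUMMARY` and `FULL` values, and the `parse` helper do not come from the PIP.

```java
import java.util.Locale;

/** Hypothetical sketch of a detail-level query parameter; names are assumptions. */
final class DetailLevelSketch {

    /** SUMMARY keeps the lightweight default; FULL would add the aggregate metrics. */
    enum DetailLevel { SUMMARY, FULL }

    private DetailLevelSketch() {}

    /** Maps the raw query parameter value to a level, defaulting to SUMMARY when absent. */
    static DetailLevel parse(String raw) {
        if (raw == null || raw.isBlank()) {
            return DetailLevel.SUMMARY;
        }
        try {
            return DetailLevel.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // Reject unknown values explicitly rather than silently falling back.
            throw new IllegalArgumentException("Unsupported detail level: " + raw);
        }
    }
}
```

Defaulting to the lightweight level means existing callers would keep the current summary behavior and only opt in to the heavier diagnostic view.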

I think this is a great suggestion. @shibd

shibd · 2026-03-12T12:31:18Z

pip/pip-459.md

+The endpoint accepts two optional query parameters:
+
+- `limit`: maximum number of functions to return; must be greater than `0` when present
+- `continuationToken`: exclusive cursor based on function name in lexicographical order


Hi @onceMisery,
#25299 (comment)

This design makes sense to me. It can reduce the number of requests needed to fetch function stats. The only trade-off is that each worker needs to first list the functions under the namespace and sort them lexicographically.

Should we rename this parameter to something more explicit, such as startAfterFunctionName?

We should also clearly document the pagination behavior in the API, similar to how you described it in your comment.

onceMisery · 2026-03-12T13:28:04Z

@shibd @lhotari
I fully agree with these suggestions:

- Rename continuationToken to startAfterFunctionName.
- Add a query parameter to control the level of detail returned by the REST API. That would let us preserve the current "summary" use case while still supporting a more diagnostic view when needed.

fagao added 2 commits March 9, 2026 10:40

[pip]-Batch Status Summary and Filtered Listing for Pulsar Functions

7774cf0

[pip]-rename file to pip-459.md

48f9c80

github-actions bot added the PIP and doc-not-needed labels Mar 9, 2026

onceMisery changed the title ~~PIP:Batch Status Summary and Filtered Listing for Pulsar Functions~~ [pip]:Batch Status Summary and Filtered Listing for Pulsar Functions Mar 9, 2026

lhotari reviewed Mar 9, 2026


lhotari changed the title ~~[pip]:Batch Status Summary and Filtered Listing for Pulsar Functions~~ [pip] PIP-459: Batch Status Summary and Filtered Listing for Pulsar Functions Mar 9, 2026

lhotari reviewed Mar 9, 2026


Update pip-459.md

8d42183

onceMisery requested a review from lhotari March 11, 2026 08:45

shibd reviewed Mar 12, 2026


onceMisery wants to merge 3 commits into apache:master from onceMisery:pip/issues/20235-pulsar-admin


3 participants
