Skip to content

/search-plan may over-fetch split metadata from PostgreSQL metastore on wide queries #6430

@earlbread

Description

@earlbread

Summary

We are routing through a proxy for guardrails, and before executing a /search query, we call /search-plan to determine whether the query is too heavy.

We observed that a small number of wide /<index>/search-plan requests can put noticeable CPU and read I/O pressure on the PostgreSQL metastore.

This may be related to how /search-plan currently uses the general-purpose list_splits metastore path. For PostgreSQL, that path appears to fetch full split rows, including split_metadata_json, even though the /search-plan response only needs split identifiers for searched_splits.

Environment

  • Quickwit: e86f4fb
  • Metastore backend: PostgreSQL (AWS Aurora PostgreSQL)
  • Workload: low request rate, but some wide /<index>/search-plan calls covering broad time ranges

Observed behavior

During the execution of /search-plan, the PostgreSQL Read Replica showed read-heavy load that was disproportionate to the request rate:

  • DB CPU reached roughly 85–90%.
  • Statement calls stayed in the single-digit calls/sec range.
  • Active sessions were not saturated.
  • Connection pool usage was low.
  • Read throughput peaked around a few hundred MiB/s.
  • Read IOPS peaked around tens of thousands per second.
  • Returned/fetched tuples spiked sharply.
  • Locks were mostly AccessShareLock; no conflicts or deadlocks were observed.

This suggests the load is not caused by connection saturation, lock contention, or writes, but by a small number of read queries scanning and/or returning many rows.

As a mitigation we tried setting statement_timeout on the metastore connection, but it did not help: the heavy queries do not exceed the timeout, they simply do a lot of per-call work within it.

Image Image Image

Suspected cause

/<index>/search-plan goes through the normal split planning path:

  • quickwit-search/src/root.rssearch_plan
  • quickwit-search/src/lib.rslist_relevant_splits
  • PostgreSQL metastore — list_splits

For PostgreSQL, list_splits builds a general-purpose split listing query and returns full split rows, including the large split_metadata_json TEXT column for every matched split. Each row is then JSON-deserialized into SplitMetadata on the Quickwit side.

The /search-plan response, however, only appears to use the split identifier from each split:

let split_ids = split_metadatas
    .into_iter()
    .map(|split| format!("{}/{}", split.index_uid.index_id, split.split_id))
    .collect();

So for N matched splits, each /search-plan call appears to read and JSON-deserialize N full SplitMetadata values only to keep index_id and split_id from each. With wide queries (broad time range or * patterns) matching tens of thousands of splits, this is consistent with the read-heavy load we observed.

/<index>/search shares the same list_splits path. The downstream search code only uses a small number of scalar fields per split (index_uid, split_id, footer_offsets, time_range, num_docs), so a similar over-fetching shape likely applies there as well.

Possible improvement

Would it make sense to add a lightweight split listing path for planning-only use cases?

For /search-plan, the PostgreSQL metastore could avoid the general-purpose list_splits projection and fetch only the required identifiers, for example:

SELECT index_uid, split_id
FROM splits
WHERE ...

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions