Summary
We route search traffic through a proxy for guardrails, and before executing a /search query we call /search-plan to determine whether the query is too heavy.
We observed that a small number of wide /<index>/search-plan requests can put noticeable CPU and read I/O pressure on the PostgreSQL metastore.
This may be related to how /search-plan currently uses the general-purpose list_splits metastore path. For PostgreSQL, that path appears to fetch full split rows, including split_metadata_json, even though the /search-plan response only needs split identifiers for searched_splits.
Environment
- Quickwit: e86f4fb
- Metastore backend: PostgreSQL (AWS Aurora PostgreSQL)
- Workload: low request rate, but some wide /<index>/search-plan calls covering broad time ranges
Observed behavior
During the execution of /search-plan, the PostgreSQL Read Replica showed read-heavy load that was disproportionate to the request rate:
- DB CPU reached roughly 85–90%.
- Statement calls stayed in the single-digit calls/sec range.
- Active sessions were not saturated.
- Connection pool usage was low.
- Read throughput peaked at a few hundred MiB/s.
- Read IOPS peaked in the tens of thousands per second.
- Returned/fetched tuples spiked sharply.
- Locks were mostly AccessShareLock; no conflicts or deadlocks were observed.
This suggests the load is not caused by connection saturation, lock contention, or writes, but by a small number of read queries scanning and/or returning many rows.
As a mitigation we tried setting statement_timeout on the metastore connection, but it did not help: the heavy queries never exceed the timeout; they simply do a lot of per-call work within it.
Suspected cause
/<index>/search-plan goes through the normal split planning path:
- quickwit-search/src/root.rs — search_plan
- quickwit-search/src/lib.rs — list_relevant_splits
- PostgreSQL metastore — list_splits
For PostgreSQL, list_splits builds a general-purpose split listing query and returns full split rows, including the large split_metadata_json TEXT column for every matched split. Each row is then JSON-deserialized into SplitMetadata on the Quickwit side.
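For context, a minimal sketch of what this implies per call (the struct and function below are stand-ins for illustration, not the actual Quickwit code):

use serde::Deserialize;

// Illustrative stand-in only; the real SplitMetadata in quickwit-metastore
// carries many more fields (tags, footer offsets, time range, etc.).
#[derive(Deserialize)]
struct SplitMetadata {
    split_id: String,
    num_docs: usize,
}

// For every matched split, the full split_metadata_json TEXT column is
// shipped from PostgreSQL and parsed into a complete SplitMetadata,
// even if the caller only keeps the identifiers afterwards.
fn deserialize_split_rows(
    split_metadata_json_rows: &[String],
) -> serde_json::Result<Vec<SplitMetadata>> {
    split_metadata_json_rows
        .iter()
        .map(|json| serde_json::from_str::<SplitMetadata>(json))
        .collect()
}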
The /search-plan response, however, only appears to use the split identifier from each split:
let split_ids = split_metadatas
    .into_iter()
    .map(|split| format!("{}/{}", split.index_uid.index_id, split.split_id))
    .collect();
So for N matched splits, each /search-plan call appears to read and JSON-deserialize N full SplitMetadata values only to keep index_id and split_id from each. With wide queries (broad time range or * patterns) matching tens of thousands of splits, this is consistent with the read-heavy load we observed: assuming, for illustration, a few kilobytes of split_metadata_json per split, a call matching tens of thousands of splits fetches and parses on the order of 100 MB of JSON.
/<index>/search shares the same list_splits path. The downstream search code only uses a small number of scalar fields per split (index_uid, split_id, footer_offsets, time_range, num_docs), so a similar over-fetching pattern likely applies there as well.
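For illustration, the per-split data the search path actually consumes is small and flat. The struct below is hypothetical and only mirrors the fields listed above (in Quickwit this roughly corresponds to what ends up in SplitIdAndFooterOffsets):

// Hypothetical illustration of the per-split fields the /search path needs;
// these names are made up for this sketch and do not exist in Quickwit.
struct PlannedSplit {
    index_uid: String,
    split_id: String,
    // Footer byte range needed to open the split.
    footer_start: u64,
    footer_end: u64,
    // Optional time range used for pruning, plus the document count.
    timestamp_start: Option<i64>,
    timestamp_end: Option<i64>,
    num_docs: u64,
}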
Possible improvement
Would it make sense to add a lightweight split listing path for planning-only use cases?
For /search-plan, the PostgreSQL metastore could avoid the general-purpose list_splits projection and fetch only the required identifiers, for example:
SELECT index_uid, split_id
FROM splits
WHERE ...
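At the API level, this could take the form of a planning-only listing method next to list_splits. The sketch below is hypothetical (the trait, method, and row type are invented for illustration and do not exist in Quickwit today), just to show the intended shape:

// Hypothetical sketch only; list_split_ids and SplitIdRow are invented names.
// The idea is to reuse the same filters as list_splits but project only
// identifiers on the PostgreSQL side.
pub struct SplitIdRow {
    pub index_uid: String,
    pub split_id: String,
}

pub trait PlanningSplitListing {
    type Error;

    // Would translate to a `SELECT index_uid, split_id FROM splits WHERE ...`
    // query for the PostgreSQL metastore, never touching split_metadata_json.
    fn list_split_ids(
        &self,
        // same filtering inputs as list_splits (index, time range, tags, state)
    ) -> Result<Vec<SplitIdRow>, Self::Error>;
}

The /search-plan handler could then build searched_splits directly from these rows without deserializing any SplitMetadata.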