fix: Optimize TaskQueueDB matching with composite indices and EXISTS#8462
fix: Optimize TaskQueueDB matching with composite indices and EXISTS#8462chrisburr wants to merge 2 commits intoDIRACGrid:integrationfrom
Conversation
The TaskIndex on multi-value tables (tq_TQTo*) previously only covered TQId, requiring a separate lookup for the Value column. Adding Value to the composite index makes it a covering index for all subqueries that filter on both columns, improving query performance significantly.
Replace COUNT-based subqueries with EXISTS/NOT EXISTS patterns in __generateTQMatchSQL, __generateTagSQLSubCond, and __generateTQFindSQL. EXISTS short-circuits on the first matching row instead of scanning all rows, which combined with the composite (TQId, Value) indices reduces matching query time from ~30ms to ~3ms on production.
495abee to
fe366e0
Compare
|
@fstagni At the dops it's probably worth recommending people do this: Even on lbwms this ran in a few hundred milliseconds so it's pretty safe to do even in a production system. |
aldbr
left a comment
There was a problem hiding this comment.
Not an expert of that part of the code but, given the description of the PR, the changes look good to me
| sql2 = sql1 + f" AND {tableName}.Value={tagMatchList}" | ||
| sql = "( " + sql1 + " ) = (" + sql2 + " )" | ||
| valuesStr = tagMatchList | ||
| sql = f"NOT EXISTS ( SELECT 1 FROM {tableName} WHERE {tableName}.TQId=tq.TQId AND {tableName}.Value NOT IN ( {valuesStr} ) )" |
There was a problem hiding this comment.
My favorite llm is telling me that:
When
tagMatchListis empty, the new code accepts ANY task (with zero values), but the old code required tasks to ONLY contain empty string values.Old logic:
(SELECT COUNT(all) WHERE id=tq.TQId) = (SELECT COUNT(...) AND value='')New logic:
NOT EXISTS (... WHERE value != '')
→ Accepts tasks with 0 values OR tasks where all values = ''This changes semantics - might incorrectly accept tasks that should be rejected.
I would say this is OK. Any other ideas?
There was a problem hiding this comment.
I think you need a smarter favorite LLM, I think that comment is claiming that 0 != 0
I will do, but just for the sake of organization, I would:
Like this:
|
| "Fields": {"TQId": "INTEGER(11) UNSIGNED NOT NULL", "Value": "VARCHAR(64) NOT NULL"}, | ||
| "PrimaryKey": ["TQId", "Value"], | ||
| "Indexes": {"TaskIndex": ["TQId"], f"{multiField}Index": ["Value"]}, | ||
| "Indexes": {"TaskIndex": ["TQId", "Value"], f"{multiField}Index": ["Value"]}, |
There was a problem hiding this comment.
The TQId,Value pair is the PRIMARY KEY in this table. So, the corresponding index is anyway created. Why do you want to add it once again explicitly ?
It is probably just better to drop the "TaskIndex": ["TQId"]
There was a problem hiding this comment.
It appears the defintion of the table here is inconsistent with what is actually present for LHCb so maybe this is an LHCb specific problem?
We should probably run:
ALTER TABLE tq_TQToBannedSites ADD PRIMARY KEY (TQId, Value), DROP INDEX idx_tqid_value;
ALTER TABLE tq_TQToGridCEs ADD PRIMARY KEY (TQId, Value), DROP INDEX idx_tqid_value;
ALTER TABLE tq_TQToGridMiddlewares ADD PRIMARY KEY (TQId, Value), DROP INDEX idx_tqid_value;
ALTER TABLE tq_TQToJobTypes ADD PRIMARY KEY (TQId, Value), DROP INDEX idx_tqid_value;
ALTER TABLE tq_TQToLHCbPlatforms ADD PRIMARY KEY (TQId, Value), DROP INDEX idx_tqid_value;
ALTER TABLE tq_TQToPilotTypes ADD PRIMARY KEY (TQId, Value), DROP INDEX idx_tqid_value;
ALTER TABLE tq_TQToPlatforms ADD PRIMARY KEY (TQId, Value), DROP INDEX idx_tqid_value;
ALTER TABLE tq_TQToSites ADD PRIMARY KEY (TQId, Value), DROP INDEX idx_tqid_value;
ALTER TABLE tq_TQToTags ADD PRIMARY KEY (TQId, Value), DROP INDEX idx_tqid_value;
The
__generateTQMatchSQLmatching query usesCOUNTsubqueries to check whether multi-value tables (tq_TQTo*) have rows for a given TQId. This is inefficient becauseCOUNTscans all matching rows even when we only need to know if any row exists, and the single-columnTaskIndexonTQIdrequires a separate lookup to fetch theValuecolumn.This PR makes two changes:
Composite
(TQId, Value)indices on the multi-value tables, replacing the single-columnTaskIndexonTQId. This makes the index a covering index for all subqueries that filter on both columns. The existing per-field{Field}IndexonValuealone is kept for reverse lookups (e.g. BannedSites antijoin).EXISTS/NOT EXISTSinstead ofCOUNTin__generateTQMatchSQL,__generateTagSQLSubCond, and__generateTQFindSQL.EXISTSshort-circuits on the first matching row rather than counting all of them.Together these drop
__generateTQMatchSQLquery time from ~30 ms to ~3 ms on a production-sized DB (confirmed viaEXPLAIN ANALYZE).Note: For existing deployments the schema change only takes effect on newly created tables (
__initializeDBskips tables that already exist). Production DBs need a one-offALTER TABLEto add the composite index.BEGINRELEASENOTES
*WorkloadManagement
FIX: Optimize TaskQueueDB matching queries by adding composite (TQId, Value) indices and replacing COUNT subqueries with EXISTS/NOT EXISTS
*Deployment
CHANGE: ALTER TABLE
tq_TQToSitesADD INDEXidx_tqid_value(TQId,Value), DROP INDEX TaskIndex;CHANGE: ALTER TABLE
tq_TQToPlatformsADD INDEXidx_tqid_value(TQId,Value), DROP INDEX TaskIndex;CHANGE: ALTER TABLE
tq_TQToJobTypesADD INDEXidx_tqid_value(TQId,Value), DROP INDEX TaskIndex;CHANGE: ALTER TABLE
tq_TQToBannedSitesADD INDEXidx_tqid_value(TQId,Value), DROP INDEX TaskIndex;CHANGE: ALTER TABLE
tq_TQToGridCEsADD INDEXidx_tqid_value(TQId,Value), DROP INDEX TaskIndex;CHANGE: ALTER TABLE
tq_TQToTagsADD INDEXidx_tqid_value(TQId,Value), DROP INDEX TaskIndex;ENDRELEASENOTES