Palo Alto Cortex XDR CCP 3.0.5 & Cortex XDR 3.0.2 - fix PaloAltoCortexXDR_Alerts_CL offset-pagination loop #14384
Open
v-krishnachi wants to merge 1 commit into
… loop Fixes runaway offset-pagination loop on PaloAltoCortexXDR_Alerts_CL that produced HTTP 429 throttling from the Cortex XDR API. Root cause: the CortexXdrAlerts RestApiPoller declared pagingType=Offset with JSONPaths $.request_data.search_from / $.request_data.search_to, but queryParametersTemplate did not contain those keys, so the CCP runtime's per-page offset writes silently no-op'd and every page request was identical. Combined with sort.keyword=desc on creation_time (non-deterministic offset paging under load) the runtime entered an unbounded paging loop, never advanced _QueryWindowStartTime, and re-emitted the same fixed creation_time on every request. Fix: add 'search_from':0, 'search_to':100 to the Alerts queryParametersTemplate (matches PAN's documented defaults and 100-row page-size cap) and switch sort.keyword to 'asc' so offset pagination is deterministic. Scope is the CortexXdrAlerts stream only; the other four streams (Agent / Endpoints / Incidents / Management) are byte-for-byte unchanged. Same fix applied to the legacy Cortex XDR solution (3.0.1 -> 3.0.2). Solution version strings bumped in mainTemplate.json (_solutionVersion and contentPackages.properties.version), Data/Solution_CortexXDR.json, and ReleaseNotes.md; package zips regenerated.
Palo Alto Cortex XDR CCP 3.0.5 & Cortex XDR 3.0.2: fix PaloAltoCortexXDR_Alerts_CL offset-pagination loop

Solutions affected: Palo Alto Cortex XDR CCP (3.0.4 → 3.0.5) and the legacy Cortex XDR (3.0.1 → 3.0.2). Both ship the same CortexXdrAlerts RestApiPoller resource and therefore both carry the same defect.
ICM (internal): 2605070040011703
Type: Bug fix (request-body correction + sort-order correction)
Risk: Low. The change is scoped to the CortexXdrAlerts RestApiPoller resource only; the other four streams (Agent / Endpoints / Incidents / Management) are byte-for-byte unchanged.

Symptom
A customer running the Palo Alto Cortex XDR CCP solution stopped ingesting any rows into PaloAltoCortexXDR_Alerts_CL, while the other four Cortex XDR streams (Agent / Endpoints / Incidents / Management) continued ingesting normally. Palo Alto's API-gateway telemetry showed the customer's API key emitting ~120–200 requests per minute against POST /public_api/v1/alerts/get_alerts, all using the same fixed creation_time start value (~2025-05-05), sustained across many minutes. Palo Alto rate-limited the key with HTTP 429 and asked Microsoft to reduce call volume.

Disabling and re-enabling the connector through the Sentinel portal twice did not help: the freshly recreated connector immediately re-entered the same loop against the still-throttled key.
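Root cause
The CortexXdrAlerts rule in PollingConfig.json declares an offset-paging block (pagingType Offset, with offsetParaName and offsetEndParaName set to the JSONPaths $.request_data.search_from and $.request_data.search_to), but the request body those paths point at did not contain those two keys.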
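The mismatch described above can be sketched as a quick check. This is a minimal illustration, not the connector's actual configuration: the dictionaries are reconstructed from the description in this PR, the filter layout in the body is an assumption, and resolve_path / unresolved_offsets are hypothetical helpers that handle only simple dotted JSONPaths such as $.a.b.

```python
from typing import Any

# Paging block as described in this PR.
paging = {
    "pagingType": "Offset",
    "offsetParaName": "$.request_data.search_from",
    "offsetEndParaName": "$.request_data.search_to",
    "pageSize": 100,
}

# Simplified pre-fix request body: there is no search_from / search_to.
# The exact filter layout is an assumption made for illustration.
body_before = {
    "request_data": {
        "filters": [
            {"field": "creation_time", "operator": "gte", "value": "{_QueryWindowStartTime}"},
            {"field": "creation_time", "operator": "lte", "value": "{_QueryWindowEndTime}"},
        ],
        "sort": {"field": "creation_time", "keyword": "desc"},
    }
}

_MISSING = object()  # sentinel so that a stored None is not confused with "absent"


def resolve_path(doc: Any, path: str) -> Any:
    """Resolve a simple dotted JSONPath like '$.a.b'; return _MISSING if absent."""
    node = doc
    for key in path.removeprefix("$.").split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def unresolved_offsets(paging_cfg: dict, body: dict) -> list[str]:
    """Return the offset JSONPaths that do not exist in the request body."""
    paths = [paging_cfg["offsetParaName"], paging_cfg["offsetEndParaName"]]
    return [p for p in paths if resolve_path(body, p) is _MISSING]


print(unresolved_offsets(paging, body_before))
# ['$.request_data.search_from', '$.request_data.search_to']
```

A check of this kind, run against each poller's template before packaging, would flag any paging block whose offset paths have nothing to write to.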
How CCP RestApiPoller offset paging actually works in the runtime:
1. The runtime resolves {_QueryWindowStartTime} / {_QueryWindowEndTime} from its internal cursor state, then issues page 1.
2. It counts the rows returned under eventsJsonPaths ($.reply.alerts). If the page is "full" (count ≥ pageSize), it writes the next offsets to the JSONPaths configured in offsetParaName / offsetEndParaName and issues the next page.
3. Paging stops when a page returns fewer than pageSize rows under the configured JSONPath.

Because $.request_data.search_from and $.request_data.search_to were not present in the body, the runtime's per-page writes silently became no-ops. Every page request therefore went out identical to the previous one. Combined with sort.keyword = "desc" on creation_time (which makes XDR's offset pagination non-deterministic under load, because new alerts arriving mid-poll push existing alerts down by one and break offset semantics), the API kept returning full pages indefinitely, the runtime kept asking for more, and a single 5-minute poll cycle entered an unbounded loop.

Because the loop never produced a "Success" cycle, _QueryWindowStartTime never advanced, so every subsequent poll re-attempted the same frozen window. That is why Palo Alto's SRE team saw a fixed creation_time date repeated on every request.

Why only Alerts was affected: Endpoints has no paging block, and Agent / Incidents / Management use the same paging block but their data volume per 5-minute window is virtually always below pageSize = 100, so the runtime exits on the very first page and the bug never triggers. The Alerts stream is the only high-volume event-style stream in this connector.

Reference documentation
- Palo Alto Cortex XDR REST API, Get all Alerts (POST /public_api/v1/alerts/get_alerts): defines request_data.search_from (start offset, default 0) and request_data.search_to (end offset, default 100, max page size 100). Documents sort as { field, keyword } with default creation_time, desc; both asc and desc are valid. https://docs-cortex.paloaltonetworks.com/r/Cortex-XDR-REST-API/Get-all-Alerts
- Microsoft Sentinel codeless connector documentation: with pagingType: Offset, the offsetParaName / offsetEndParaName JSONPaths must resolve to properties that already exist in queryParametersTemplate, otherwise the runtime cannot advance pages. https://learn.microsoft.com/azure/sentinel/create-codeless-connector
- REST API reference for RestApiPollerDataConnector.properties.paging: https://learn.microsoft.com/rest/api/securityinsights/data-connectors/create-or-update

Fix
Two surgical changes, applied only to the CortexXdrAlerts RestApiPoller resource in both solutions:
1. Add 'search_from': 0, 'search_to': 100 to the Alerts queryParametersTemplate request body. The literal values are the seed values the runtime uses for page 1. They match PAN's documented defaults: search_from = 0 is zero-based, and search_to = 100 matches PAN's hard 100-row page-size cap. On subsequent pages the runtime now has real JSON nodes to overwrite via offsetParaName / offsetEndParaName, so pagination advances correctly and exits on a short final page.
2. Switch sort.keyword for creation_time from desc → asc. PAN explicitly allows the override, and offset pagination is only deterministic with ascending sort on the same field used in filters. New alerts created mid-poll then append at the end of the result set instead of shifting every existing offset down by one.

Expected steady-state call rate for the Alerts stream drops from ~120–200 req/min (loop) to approximately ceil(N / 100) POSTs per 5-minute window, where N is the number of alerts in the window. That is typically 1–3 POSTs per cycle, i.e. ≈ 0.2–0.6 req/min, well inside PAN's documented limits.

Alignment with the Palo Alto contract
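The contract points in this section can be expressed as a small self-check. This is a sketch under stated assumptions rather than the connector's code: build_alerts_body, contract_violations, and posts_per_window are hypothetical helper names, the filter layout is assumed, and the rate estimate is only the ceil(N / 100) approximation quoted in this PR.

```python
import math

PAGE_CAP = 100  # PAN's maximum page size, as cited in this PR


def build_alerts_body(start_ms: int, end_ms: int) -> dict:
    """Page-1 request body with seed offsets and ascending sort (illustrative)."""
    return {
        "request_data": {
            "filters": [
                {"field": "creation_time", "operator": "gte", "value": start_ms},
                {"field": "creation_time", "operator": "lte", "value": end_ms},
            ],
            "search_from": 0,
            "search_to": 100,
            "sort": {"field": "creation_time", "keyword": "asc"},
        }
    }


def contract_violations(body: dict) -> list[str]:
    """List the ways a body departs from the contract points in this PR."""
    rd = body.get("request_data", {})
    problems = []
    for key in ("search_from", "search_to"):
        if not isinstance(rd.get(key), int):
            problems.append(f"{key} missing or not an integer")
    if not problems and not (0 < rd["search_to"] - rd["search_from"] <= PAGE_CAP):
        problems.append("page span must be between 1 and 100 rows")
    if rd.get("sort", {}).get("keyword") not in ("asc", "desc"):
        problems.append("sort.keyword must be 'asc' or 'desc'")
    return problems


def posts_per_window(alert_count: int, page_size: int = PAGE_CAP) -> int:
    """Approximate POSTs per poll window: ceil(N / page_size), at least one."""
    return max(1, math.ceil(alert_count / page_size))


body = build_alerts_body(1_700_000_000_000, 1_700_000_300_000)
print(contract_violations(body))    # []
print(posts_per_window(250))        # 3
print(posts_per_window(250) / 5)    # 0.6 requests per minute over a 5-minute window
```

With 250 alerts in a window the estimate is 3 POSTs, or 0.6 req/min, which is the upper end of the range quoted in this PR.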
| Palo Alto contract | Connector value |
|---|---|
| POST /public_api/v1/alerts/get_alerts, body root request_data | |
| request_data.filters (AND-joined) | creation_time gte AND creation_time lte |
| request_data.search_from: integer, zero-based, default 0 | 0 |
| request_data.search_to: integer, default 100, maximum result set size 100 | 100 |
| (search_to − search_from) ≤ 100 | |
| request_data.sort (asc/desc both valid; default desc) | asc (documented capability) |
| value is Unix-millis for time fields | queryTimeFormat: UnixTimestampInMills |
| reply.alerts[] | eventsJsonPaths: ["$.reply.alerts"] |

Files changed (10)
Palo Alto Cortex XDR CCP solution (3.0.4 → 3.0.5)
- Solutions/Palo Alto Cortex XDR CCP/Data Connectors/CortexXDR_ccp/PollingConfig.json: Alerts request body + sort.
- Solutions/Palo Alto Cortex XDR CCP/Package/mainTemplate.json: same fix on the embedded CortexXdrAlerts resource; _solutionVersion (line 48) and contentPackages.properties.version (line 2675) bumped to 3.0.5.
- Solutions/Palo Alto Cortex XDR CCP/Data/Solution_CortexXDR.json: Version bumped to 3.0.5.
- Solutions/Palo Alto Cortex XDR CCP/ReleaseNotes.md: new 3.0.5 row.
- Solutions/Palo Alto Cortex XDR CCP/Package/3.0.5.zip: regenerated solution package.

Legacy Cortex XDR solution (3.0.1 → 3.0.2)
- Solutions/Cortex XDR/Data Connectors/CortexXDR_ccp/PollingConfig.json: Alerts request body + sort.
- Solutions/Cortex XDR/Package/mainTemplate.json: same fix on the embedded CortexXdrAlerts resource; _solutionVersion (line 48) and contentPackages.properties.version (line 3747) bumped to 3.0.2.
- Solutions/Cortex XDR/Data/Solution_CortexXDR.json: Version bumped to 3.0.2 (and aligned with mainTemplate; was a pre-existing 3.0.0 drift).
- Solutions/Cortex XDR/ReleaseNotes.md: new 3.0.2 row.
- Solutions/Cortex XDR/Package/3.0.2.zip: regenerated solution package.

Regression analysis
- The other four streams are left unchanged: their data volume per window is virtually always below pageSize, so they exit on the first page and the same bug never triggered for them. Touching them would only expand the regression surface for no benefit.
- TimeGenerated mapping for PaloAltoCortexXDR_Alerts_CL: unchanged. Switching sort from desc → asc changes only the order rows are paged in within a window, not which rows are ingested.
- The update is a PUT of the CortexXdrAlerts<GUID> resource. The server-managed nextQueryWindowStartTime cursor is not in the request body schema and is preserved, so existing customers resume ingestion from their current cursor (no data loss, no duplicate ingestion).

Validation
End-to-end resolution confirmed on the ICM customer's workspace after deploying the fix:
- PaloAltoCortexXDR_Alerts_CL ingesting normally.
- creation_time gte in successive XDR requests advancing every 5-minute cycle (the previously frozen 2025-05-05 value is unstuck).
- Request rate to /public_api/v1/alerts/get_alerts from the customer's key dropped from ~120–200 to single-digit req/min.

Notes for reviewers
- The change to sort.keyword is intentional and required: desc with offset paging on a continuously growing dataset is non-deterministic and is the underlying reason the loop never terminated even though search_from / search_to would not have been writable. Both fixes are needed together; either one alone would not fully resolve the symptom.
- 'search_from': 0, 'search_to': 100 are seed values for page 1. The CCP runtime overwrites both nodes for subsequent pages using the paging.offsetParaName / paging.offsetEndParaName JSONPaths, so no further per-page configuration is needed.
- Version strings are bumped in mainTemplate.json (variables._solutionVersion and contentPackages.properties.version), Data/Solution_CortexXDR.json (Version), and ReleaseNotes.md. The legacy Cortex XDR solution's Data/Solution_CortexXDR.json had a pre-existing Version: 3.0.0 while its mainTemplate was already at 3.0.1; this PR aligns both to 3.0.2.
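The seed-value behaviour described in these notes can be illustrated with a toy model. It is not the CCP runtime's implementation: fake_get_alerts, poll_window, and the max_pages guard are hypothetical names introduced for the demo, and the model ignores sort order and the time-window filters. It only shows what happens when per-page offset writes have no node to land on.

```python
def fake_get_alerts(body: dict, alerts: list[int]) -> list[int]:
    """Toy stand-in for the alerts endpoint: slice by search_from/search_to.

    Missing offsets fall back to 0 and 100, the defaults cited in this PR.
    """
    rd = body["request_data"]
    return alerts[rd.get("search_from", 0):rd.get("search_to", 100)]


def poll_window(template: dict, alerts: list[int],
                page_size: int = 100, max_pages: int = 50) -> tuple[int, bool]:
    """Return (pages_requested, finished). max_pages is a demo-only safety guard."""
    offset = 0
    for page in range(1, max_pages + 1):
        body = {"request_data": dict(template["request_data"])}
        rd = body["request_data"]
        # Mimic the described behaviour: offsets are written only to nodes that
        # already exist in the template; otherwise the write is a no-op.
        if "search_from" in rd and "search_to" in rd:
            rd["search_from"] = offset
            rd["search_to"] = offset + page_size
        rows = fake_get_alerts(body, alerts)
        if len(rows) < page_size:
            return page, True  # short page: paging ends normally
        offset += page_size
    return max_pages, False  # guard tripped: paging never terminated


alerts = list(range(250))  # 250 alerts in one poll window
broken = {"request_data": {"sort": {"field": "creation_time", "keyword": "desc"}}}
fixed = {"request_data": {"search_from": 0, "search_to": 100,
                          "sort": {"field": "creation_time", "keyword": "asc"}}}

print(poll_window(broken, alerts))            # (50, False)
print(poll_window(fixed, alerts))             # (3, True)
print(poll_window(broken, list(range(40))))   # (1, True)
```

The last call mirrors the low-volume streams: with fewer than pageSize rows in the window, even the broken template exits on the first page, which is consistent with only Alerts being affected.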