Consistently retrying 449 client-side across Gateway modes #49332

Merged
FabianMeiswinkel merged 35 commits into Azure:main from FabianMeiswinkel:users/fabianm/449Retry
Jun 3, 2026

Conversation

@FabianMeiswinkel (Member) commented Jun 1, 2026

Description

Summary

Unifies 449 (Retry With) handling across Gateway V1 and Gateway V2/Thin Client so retries are orchestrated client-side.

  • Adds a shared RetryWithRetryPolicy for 449 retry behavior.
  • Adds GatewayRetryWithRetryPolicy to combine gateway metadata retry handling with client-side 449 retries.
  • Sends x-ms-noretry-449 for Gateway V1 requests so the service does not drive the retry loop.
  • Leaves Gateway V2/Thin Client without the Gateway V1-only opt-out header.
  • Applies remaining retry budget and client retry metadata on gateway 449 retry attempts.
  • Treats 449 as an expected retryable response to avoid noisy error-level logs.
  • Reuses the shared 449 policy from direct connectivity’s GoneAndRetryWithRetryPolicy.
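
As a rough illustration of what client-side orchestration of 449 retries with a retry budget can look like, here is a minimal sketch. It is not the SDK's RetryWithRetryPolicy: the class name, the method names, the exponential backoff shape, and all durations are assumptions made for this example only.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical sketch only; this is not the SDK's RetryWithRetryPolicy. */
final class RetryWithBudgetSketch {
    static final int RETRY_WITH = 449;

    private final Duration budget;      // total time the caller allows for the operation (assumed)
    private final Duration maxBackoff;  // cap for the exponential backoff (assumed)
    private Duration backoff;           // delay to use before the next retry

    RetryWithBudgetSketch(Duration budget, Duration initialBackoff, Duration maxBackoff) {
        this.budget = budget;
        this.backoff = initialBackoff;
        this.maxBackoff = maxBackoff;
    }

    /**
     * Returns the delay to wait before retrying, or empty when the response
     * is not a 449 or when waiting would use up the remaining budget.
     */
    Optional<Duration> nextDelay(int statusCode, Duration elapsed) {
        if (statusCode != RETRY_WITH) {
            return Optional.empty();
        }
        Duration remaining = budget.minus(elapsed);
        if (remaining.compareTo(backoff) <= 0) {
            return Optional.empty();
        }
        Duration delay = backoff;
        // Double the backoff for the next attempt, capped at maxBackoff.
        Duration doubled = backoff.multipliedBy(2);
        backoff = doubled.compareTo(maxBackoff) > 0 ? maxBackoff : doubled;
        return Optional.of(delay);
    }

    /** Budget left for the retry attempt itself once the backoff delay has been spent. */
    Duration remainingBudget(Duration elapsed, Duration delay) {
        return budget.minus(elapsed).minus(delay);
    }
}
```

In this sketch the caller tracks elapsed time and passes the value from remainingBudget to the next attempt as its timeout, which mirrors the idea of applying the remaining retry budget on each 449 retry.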

Testing

  • Added unit coverage for gateway 449 retry policy behavior.
  • Added Gateway V1 header coverage for x-ms-noretry-449.
  • Updated Gateway V1 and Gateway V2 fault-injection coverage so injected 449 responses are expected to retry client-side and validate the retry attempt uses the client-side retry budget.
  • Updated direct connectivity retry policy tests to use a real RetryWithException.
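
The header behavior that the new coverage checks can be pictured with a small sketch. The header name x-ms-noretry-449 and the value true come from this PR; the class, the enum, and the method below are hypothetical and do not reflect how RxGatewayStoreModel or ThinClientStoreModel are actually structured.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; not the SDK's gateway store model code. */
final class NoRetry449HeaderSketch {
    // Header name as stated in this PR.
    static final String NO_RETRY_449 = "x-ms-noretry-449";

    // Illustrative stand-in for the two gateway modes discussed here.
    enum GatewayMode { GATEWAY_V1, THIN_CLIENT }

    /** Returns a copy of the headers, adding the opt-out header for Gateway V1 only. */
    static Map<String, String> withRetryHeaders(Map<String, String> headers, GatewayMode mode) {
        Map<String, String> result = new HashMap<>(headers);
        if (mode == GatewayMode.GATEWAY_V1) {
            // Ask the service not to drive the 449 retry loop itself.
            result.put(NO_RETRY_449, "true");
        }
        return result;
    }
}
```

A unit test along these lines would assert that the header is present with the value true for Gateway V1 and absent for Thin Client.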

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Copilot AI review requested due to automatic review settings June 1, 2026 10:18
@FabianMeiswinkel requested review from a team and kirankumarkolli as code owners June 1, 2026 10:18
@github-actions (bot) added the Cosmos label Jun 1, 2026
@FabianMeiswinkel
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the Cosmos Gateway request path to handle HTTP 449 (RetryWith) consistently via client-side retry, and adds a gateway header to disable the legacy Gateway V1 server-side 449 retry loop where applicable.

Changes:

  • Add a reusable RetryWithRetryPolicy and a new GatewayRetryWithRetryPolicy that combines 449 RetryWith retry + existing gateway metadata/network retry behavior.
  • Ensure gateway requests send x-ms-noretry-449: true so 449 is retried client-side instead of via Gateway V1 server-side looping (with a ThinClient override).
  • Update/add unit and fault-injection tests to validate 449 retry behavior and diagnostics.
Summary per file

| File | Description |
| --- | --- |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ThinClientStoreModel.java | Overrides gateway 449 header behavior for ThinClient mode. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxGatewayStoreModel.java | Adds x-ms-noretry-449 header and switches gateway invoke path to new composite retry policy w/ timeout propagation. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RetryWithRetryPolicy.java | New standalone retry policy implementing backoff + timeout budgeting for 449. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/HttpConstants.java | Introduces NO_RETRY_449 header constant. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/GatewayRetryWithRetryPolicy.java | New composite retry policy for gateway (449 RetryWith + existing metadata/network retries). |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/Exceptions.java | Treats 449 as “commonly expected” to reduce noisy logs. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/GoneAndRetryWithRetryPolicy.java | Refactors to use the new shared RetryWithRetryPolicy implementation. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/RxGatewayStoreModelTest.java | Adds unit tests asserting x-ms-noretry-449 header behavior. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/GatewayRetryWithRetryPolicyTest.java | New unit tests for the composite gateway retry policy. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/directconnectivity/GoneAndRetryWithRetryPolicyTest.java | Adjusts RetryWith test setup to match refactor. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/faultinjection/FaultInjectionServerErrorRuleOnGatewayV2Tests.java | Updates expectation: 449 is retried; validates retry diagnostics. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/faultinjection/FaultInjectionServerErrorRuleOnGatewayTests.java | Updates expectation: 449 is retried; validates retry diagnostics. |
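
The composite policy described for GatewayRetryWithRetryPolicy.java can be pictured roughly as below. This is a sketch under stated assumptions, not the code in this PR: the Policy interface, the compose method, and the choice to route purely by status code are all hypothetical simplifications.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical sketch only; not the SDK's GatewayRetryWithRetryPolicy. */
final class CompositeRetrySketch {
    /** Illustrative policy shape: a delay means "retry after this long", empty means "do not retry". */
    interface Policy {
        Optional<Duration> shouldRetry(int statusCode);
    }

    /**
     * Combines a 449 (RetryWith) policy with an existing gateway policy.
     * Assumption: 449 responses go to the RetryWith policy and everything
     * else falls through to the existing metadata/network policy.
     */
    static Policy compose(Policy retryWith, Policy gatewayMetadata) {
        return statusCode -> statusCode == 449
            ? retryWith.shouldRetry(statusCode)
            : gatewayMetadata.shouldRetry(statusCode);
    }
}
```

The point of the composition is that callers on the gateway path deal with a single policy object, while the 449 logic stays shareable with the direct connectivity path.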

Copilot's findings

  • Files reviewed: 13/13 changed files
  • Comments generated: 2

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@FabianMeiswinkel
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Member Author

/azp run java - cosmos - tests

@FabianMeiswinkel
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Member Author

/azp run java - cosmos - kafka

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Member Author

/azp run java - cosmos - kafka

@FabianMeiswinkel
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Member Author

/azp run java - cosmos - kafka

@FabianMeiswinkel
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel enabled auto-merge (squash) June 3, 2026 02:00
@FabianMeiswinkel merged commit 85fadfc into Azure:main Jun 3, 2026
139 checks passed

6 participants