Adding end-to-end Timeout Functionality#2268

Open
O-sura wants to merge 8 commits into
wso2:mainfrom
O-sura:apip-resiliency

Conversation

@O-sura O-sura commented Jun 25, 2026

Contributor

Purpose

This PR introduces end-to-end timeout configuration across the gateway data plane. It adds a new resilience block to RestApi resources, allowing API- and operation-level configuration of backend request and idle timeouts, which are translated into Envoy route timeouts and can be disabled with "0s". It also enhances support for upstream connect timeouts and exposes Envoy HTTP Connection Manager (HCM) downstream timeouts through config.toml, enabling protection against slow or stalled clients and backends. The implementation includes CRD and schema updates, validation, deployment transforms, xDS translation, documentation updates, and comprehensive integration and unit test coverage to verify timeout behavior across the stack.
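
To make the behaviour described above concrete, here is a minimal sketch of how an operation-level resilience block might be merged over an API-level one, with "0s" treated as "timeout disabled". The `Resilience` struct, the `resolve` and `parseTimeout` helpers, and the per-field precedence rule (operation wins over API) are assumptions made for illustration and are not the PR's actual types or functions. Only the `timeout` and `idleTimeout` field names come from the review discussion.

```go
package main

import (
	"fmt"
	"time"
)

// Resilience is a hypothetical stand-in for the PR's resilience block.
// An empty string means "not set at this level".
type Resilience struct {
	Timeout     string
	IdleTimeout string
}

// resolve merges an operation-level block over an API-level block,
// field by field. The precedence shown here is an assumption.
func resolve(api, op *Resilience) Resilience {
	var out Resilience
	if api != nil {
		out = *api
	}
	if op != nil {
		if op.Timeout != "" {
			out.Timeout = op.Timeout
		}
		if op.IdleTimeout != "" {
			out.IdleTimeout = op.IdleTimeout
		}
	}
	return out
}

// parseTimeout returns the parsed duration and whether a value was set.
// A set value of zero means the timeout is explicitly disabled.
func parseTimeout(s string) (d time.Duration, set bool, err error) {
	if s == "" {
		return 0, false, nil
	}
	d, err = time.ParseDuration(s)
	if err != nil {
		return 0, false, err
	}
	if d < 0 {
		return 0, false, fmt.Errorf("negative timeout %q", s)
	}
	return d, true, nil
}

func main() {
	api := &Resilience{Timeout: "30s", IdleTimeout: "5m"}
	op := &Resilience{Timeout: "0s"} // disable the request timeout for one operation

	r := resolve(api, op)
	d, set, err := parseTimeout(r.Timeout)
	fmt.Println("timeout:", d, "set:", set, "err:", err)
	idle, set, err := parseTimeout(r.IdleTimeout)
	fmt.Println("idleTimeout:", idle, "set:", set, "err:", err)
}
```

Keeping "not set" distinct from "set to zero" matters here: an unset field should fall back to a default, while an explicit "0s" should reach Envoy as a disabled timeout.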

@coderabbitai summary

coderabbitai Bot commented Jun 25, 2026

Contributor

📝 Walkthrough

The PR adds a resilience timeout schema to REST API, LLM provider, and LLM proxy configurations, updates generated management and operator models, and wires the new fields through validation, transformation, and xDS route translation. It also adds HTTP listener timeout settings and validation for downstream connection manager timeouts, including new defaults in gateway config. Integration scenarios cover backend, LLM, connect, and request-header timeout behavior. gateway/build-manifest.yaml adds a backend-jwt policy entry.

Sequence Diagram(s)

sequenceDiagram
  participant AV as api_validator.go
  participant LV as llm_validator.go
  participant RT as transform/restapi.go
  participant LT as llm_transformer.go
  participant XDS as translator.go
  participant Envoy as RouteAction
  AV->>RT: validated resilience fields
  LV->>LT: validated resilience fields
  RT->>XDS: route timeout data
  LT->>XDS: route resilience on generated operations
  XDS->>Envoy: Timeout / IdleTimeout
sequenceDiagram
  participant CFG as config.go
  participant VAL as validateTimeoutConfig
  participant CL as createListener
  participant HCM as HttpConnectionManager
  CFG->>VAL: router.http_listener.timeouts
  VAL->>CL: accepted timeout values
  CL->>HCM: RequestTimeout / RequestHeadersTimeout / StreamIdleTimeout / IdleTimeout

Suggested reviewers

  • RakhithaRR
  • Tharsanan1
  • VirajSalaka
  • tharindu1st
  • malinthaprasan
  • AnuGayan
  • HeshanSudarshana
  • chamilaadhi
  • Arshardh
  • dushaniw
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 54.76% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | Only the Purpose section is provided; required sections like Goals, Approach, tests, security, samples, and test environment are missing. | Add the missing template sections with concise details for goals, approach, user stories, documentation, unit/integration tests, security checks, samples, related PRs, and test environment. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title is concise and matches the main change: adding end-to-end timeout functionality across the gateway. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@O-sura O-sura force-pushed the apip-resiliency branch from a0081b1 to 464fdee Compare June 25, 2026 06:28
@O-sura O-sura requested a review from PasanT9 as a code owner June 26, 2026 03:53

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 3

🧹 Nitpick comments (2)
gateway/gateway-controller/api/management-openapi.yaml (1)

5821-5827: 📐 Maintainability & Code Quality | 🔵 Trivial

Preserve the LLM-specific resilience scope constraint in generated models.

The resilience field for LLMProviderConfigData and LLMProxyConfigData currently lists a generic description in the generated Go code that implies operation-level overrides are supported ("Can be set at the API level ... and/or the operation level"). However, the OpenAPI spec explicitly restricts this to the API level only ("Supported at the API level only").

Because the description is written as a sibling of the $ref, the code generator is likely ignoring it and falling back to the referenced Resilience schema's generic description. To preserve the "API-level only" constraint in the generated documentation, wrap the reference in an allOf block or create a dedicated schema for these fields:

resilience:
  allOf:
    - $ref: '#/components/schemas/Resilience'
  description: >
    API-level backend/route timeout configuration. Applies to all routes generated
    for this LLM Provider (the routes that forward traffic upstream). Supported at the
    API level only - LLM routes are synthesized by the gateway, so there is no
    operation-level override.

Apply this pattern to both LLMProviderConfigData (lines 5821-5827) and LLMProxyConfigData (lines 6118-6124).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@gateway/gateway-controller/api/management-openapi.yaml` around lines 5821 -
5827, The generated models are dropping the LLM-specific API-level-only
constraint for the resilience field because the description is attached beside a
$ref and may be overridden by the shared Resilience schema. Update the
resilience definitions in LLMProviderConfigData and LLMProxyConfigData in the
OpenAPI spec to wrap the reference in an allOf block or use a dedicated schema
so the field-level description is preserved. Make sure the generated Go docs
retain the “Supported at the API level only” wording and do not imply
operation-level overrides.
kubernetes/gateway-operator/config/crd/bases/gateway.api-platform.wso2.com_restapis.yaml (1)

101-115: 🗄️ Data Integrity & Integration | 🔵 Trivial

Duration pattern is stricter than runtime parser

The CRD regex ^\d+(\.\d+)?(ms|s|m|h)$ enforces single-unit durations (e.g., "30s"), while the controller logic in gateway/gateway-controller/pkg/config/api_validator.go (line 471) uses Go's time.ParseDuration, which natively supports compound forms (e.g., "1h30m"). This mismatch causes the API server to reject valid durations that the runtime would accept.

If the goal is parity with the runtime, update the CRD pattern to match Go's accepted formats. If single units are intentional, document the restriction in the field description.

Also applies to lines 150-164.
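
The mismatch described above can be reproduced with a few lines of Go. The sketch below checks sample strings against the regex quoted in this comment and against `time.ParseDuration`. The `compare` helper and the sample inputs are illustrative and do not come from the PR.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// crdPattern is the single-unit regex quoted in the review comment.
var crdPattern = regexp.MustCompile(`^\d+(\.\d+)?(ms|s|m|h)$`)

// compare reports whether the CRD pattern and time.ParseDuration each accept s.
func compare(s string) (crdOK, goOK bool) {
	_, err := time.ParseDuration(s)
	return crdPattern.MatchString(s), err == nil
}

func main() {
	samples := []string{"30s", "1.5h", "1h30m", "0", "-5s", "10us", "30"}
	for _, s := range samples {
		crdOK, goOK := compare(s)
		fmt.Printf("%q\tcrd=%t\tgo=%t\n", s, crdOK, goOK)
	}
}
```

Inputs such as "1h30m", a bare "0", "-5s", and "10us" are accepted by `time.ParseDuration` but rejected by the pattern, so the two layers disagree in one direction only: the pattern is a strict subset of what the parser accepts for these forms.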

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@kubernetes/gateway-operator/config/crd/bases/gateway.api-platform.wso2.com_restapis.yaml`
around lines 101 - 115, The duration validation for the resilience fields is
stricter in the CRD than in the controller’s runtime parsing, so valid values
accepted by `time.ParseDuration` in `api_validator.go` can be rejected by the
API server. Update the `resilience.idleTimeout` and `resilience.timeout` schema
patterns in the CRD to match Go duration syntax if parity is intended, or
otherwise revise the field descriptions to explicitly state that only
single-unit durations are allowed. Apply the same change to the duplicate
resilience schema block referenced in the other section.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@gateway/configs/config-template.toml`:
- Around line 215-225: Clarify the HCM timeout contract by aligning the template
and validation behavior around idle_timeout: update the comments in
config-template.toml to match what validateTimeoutConfig() and
TestConfig_ValidateHCMTimeouts actually accept, or adjust those validation paths
if idle_timeout should not allow 0s. Use the router.http_listener.timeouts
section and the validateTimeoutConfig/TestConfig_ValidateHCMTimeouts symbols to
keep the template text and runtime rules consistent.

In `@gateway/gateway-controller/pkg/config/api_validator.go`:
- Around line 457-490: The timeout validation in validateResilienceTimeouts is
more permissive than the CRD pattern, so align it with the admission rule or
document the intentional mismatch. Update validateResilienceTimeouts in
api_validator.go to enforce the same duration format as the CRD regex used for
Resilience.Timeout and Resilience.IdleTimeout, and make sure the downstream
ResolveResilience parsing behavior in translator.go matches that contract so
values like bare 0, compound durations, and negatives are handled consistently.

In `@gateway/it/features/upstream-connect-timeout.feature`:
- Line 50: The upstream connect timeout scenario uses an unreliable target and
the surrounding comment is stale. Update the comment in the feature scenario to
match the actual target used by the test, and adjust the test input in the
connect_timeout case to use a documented blackhole address such as 192.0.2.1 in
the scenario or otherwise guarantee that 10.255.255.1 is silently dropped in the
test environment. Use the scenario text and the connect_timeout setup in
upstream-connect-timeout.feature to keep the timeout assertion stable and avoid
immediate unreachable failures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7996541b-429f-42d2-8d91-7c080025f8ab

📥 Commits

Reviewing files that changed from the base of the PR and between 13810b0 and 7ff9ca0.

📒 Files selected for processing (36)
  • gateway/build-manifest.yaml
  • gateway/configs/config-template.toml
  • gateway/gateway-controller/api/management-openapi.yaml
  • gateway/gateway-controller/pkg/api/management/generated.go
  • gateway/gateway-controller/pkg/config/api_validator.go
  • gateway/gateway-controller/pkg/config/api_validator_test.go
  • gateway/gateway-controller/pkg/config/config.go
  • gateway/gateway-controller/pkg/config/config_test.go
  • gateway/gateway-controller/pkg/config/llm_validator.go
  • gateway/gateway-controller/pkg/config/llm_validator_resilience_test.go
  • gateway/gateway-controller/pkg/constants/constants.go
  • gateway/gateway-controller/pkg/models/runtime_deploy_config.go
  • gateway/gateway-controller/pkg/transform/restapi.go
  • gateway/gateway-controller/pkg/transform/restapi_test.go
  • gateway/gateway-controller/pkg/utils/llm_resilience_test.go
  • gateway/gateway-controller/pkg/utils/llm_transformer.go
  • gateway/gateway-controller/pkg/xds/translator.go
  • gateway/gateway-controller/pkg/xds/translator_test.go
  • gateway/it/features/backend-timeout.feature
  • gateway/it/features/llm-backend-timeout.feature
  • gateway/it/features/upstream-connect-timeout.feature
  • gateway/it/steps_backend_timeout.go
  • gateway/it/steps_timeouts.go
  • gateway/it/suite_test.go
  • gateway/it/test-config.toml
  • kubernetes/gateway-operator/api/v1alpha1/llmprovider_types.go
  • kubernetes/gateway-operator/api/v1alpha1/llmproxy_types.go
  • kubernetes/gateway-operator/api/v1alpha1/restapi_types.go
  • kubernetes/gateway-operator/api/v1alpha1/zz_generated.deepcopy.go
  • kubernetes/gateway-operator/config/crd/bases/gateway.api-platform.wso2.com_llmproviders.yaml
  • kubernetes/gateway-operator/config/crd/bases/gateway.api-platform.wso2.com_llmproxies.yaml
  • kubernetes/gateway-operator/config/crd/bases/gateway.api-platform.wso2.com_restapis.yaml
  • kubernetes/gateway-operator/config/samples/api_v1_restapi.yaml
  • kubernetes/helm/operator-helm-chart/crds/gateway.api-platform.wso2.com_llmproviders.yaml
  • kubernetes/helm/operator-helm-chart/crds/gateway.api-platform.wso2.com_llmproxies.yaml
  • kubernetes/helm/operator-helm-chart/crds/gateway.api-platform.wso2.com_restapis.yaml
💤 Files with no reviewable changes (1)
  • gateway/it/steps_backend_timeout.go

Comment thread gateway/configs/config-template.toml
Comment thread gateway/gateway-controller/pkg/config/api_validator.go
Comment thread gateway/it/features/upstream-connect-timeout.feature Outdated