Adding end-to-end Timeout Functionality#2268
Conversation
📝 WalkthroughWalkthroughThe PR adds a Sequence Diagram(s)sequenceDiagram
participant AV as api_validator.go
participant LV as llm_validator.go
participant RT as transform/restapi.go
participant LT as llm_transformer.go
participant XDS as translator.go
participant Envoy as RouteAction
AV->>RT: validated resilience fields
LV->>LT: validated resilience fields
RT->>XDS: route timeout data
LT->>XDS: route resilience on generated operations
XDS->>Envoy: Timeout / IdleTimeout
sequenceDiagram
participant CFG as config.go
participant VAL as validateTimeoutConfig
participant CL as createListener
participant HCM as HttpConnectionManager
CFG->>VAL: router.http_listener.timeouts
VAL->>CL: accepted timeout values
CL->>HCM: RequestTimeout / RequestHeadersTimeout / StreamIdleTimeout / IdleTimeout
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
gateway/gateway-controller/api/management-openapi.yaml (1)
5821-5827: 📐 Maintainability & Code Quality | 🔵 TrivialPreserve the LLM-specific
resiliencescope constraint in generated models.The
resiliencefield forLLMProviderConfigDataandLLMProxyConfigDatacurrently lists a generic description in the generated Go code that implies operation-level overrides are supported ("Can be set at the API level ... and/or the operation level"). However, the OpenAPI spec explicitly restricts this to the API level only ("Supported at the API level only").Because the description is written as a sibling to the
$ref, the code generator is likely ignoring it or defaulting to the referencedResilienceschema's generic description. To ensure the specific "API-level only" constraint is preserved in the generated documentation:Wrap the reference in an
allOfblock or create a dedicated schema for these fields:resilience: allOf: - $ref: '`#/components/schemas/Resilience`' description: > API-level backend/route timeout configuration. Applies to all routes generated for this LLM Provider (the routes that forward traffic upstream). Supported at the API level only - LLM routes are synthesized by the gateway, so there is no operation-level override.Apply this pattern to both
LLMProviderConfigData(lines 5821-5827) andLLMProxyConfigData(lines 6118-6124).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@gateway/gateway-controller/api/management-openapi.yaml` around lines 5821 - 5827, The generated models are dropping the LLM-specific API-level-only constraint for the resilience field because the description is attached beside a $ref and may be overridden by the shared Resilience schema. Update the resilience definitions in LLMProviderConfigData and LLMProxyConfigData in the OpenAPI spec to wrap the reference in an allOf block or use a dedicated schema so the field-level description is preserved. Make sure the generated Go docs retain the “Supported at the API level only” wording and do not imply operation-level overrides.kubernetes/gateway-operator/config/crd/bases/gateway.api-platform.wso2.com_restapis.yaml (1)
101-115: 🗄️ Data Integrity & Integration | 🔵 TrivialDuration pattern is stricter than runtime parser
The CRD regex
^\d+(\.\d+)?(ms|s|m|h)$enforces single-unit durations (e.g., "30s"), while the controller logic ingateway/gateway-controller/pkg/config/api_validator.go(line 471) uses Go'stime.ParseDuration, which natively supports compound forms (e.g., "1h30m"). This mismatch causes the API server to reject valid durations that the runtime would accept.If the goal is parity with the runtime, update the CRD pattern to match Go's accepted formats. If single units are intentional, document the restriction in the field description.
Also applies to lines 150-164.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@kubernetes/gateway-operator/config/crd/bases/gateway.api-platform.wso2.com_restapis.yaml` around lines 101 - 115, The duration validation for the resilience fields is stricter in the CRD than in the controller’s runtime parsing, so valid values accepted by `time.ParseDuration` in `api_validator.go` can be rejected by the API server. Update the `resilience.idleTimeout` and `resilience.timeout` schema patterns in the CRD to match Go duration syntax if parity is intended, or otherwise revise the field descriptions to explicitly state that only single-unit durations are allowed. Apply the same change to the duplicate resilience schema block referenced in the other section.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@gateway/configs/config-template.toml`:
- Around line 215-225: Clarify the HCM timeout contract by aligning the template
and validation behavior around idle_timeout: update the comments in
config-template.toml to match what validateTimeoutConfig() and
TestConfig_ValidateHCMTimeouts actually accept, or adjust those validation paths
if idle_timeout should not allow 0s. Use the router.http_listener.timeouts
section and the validateTimeoutConfig/TestConfig_ValidateHCMTimeouts symbols to
keep the template text and runtime rules consistent.
In `@gateway/gateway-controller/pkg/config/api_validator.go`:
- Around line 457-490: The timeout validation in validateResilienceTimeouts is
more permissive than the CRD pattern, so align it with the admission rule or
document the intentional mismatch. Update validateResilienceTimeouts in
api_validator.go to enforce the same duration format as the CRD regex used for
Resilience.Timeout and Resilience.IdleTimeout, and make sure the downstream
ResolveResilience parsing behavior in translator.go matches that contract so
values like bare 0, compound durations, and negatives are handled consistently.
In `@gateway/it/features/upstream-connect-timeout.feature`:
- Line 50: The upstream connect timeout scenario uses an unreliable target and
the surrounding comment is stale. Update the comment in the feature scenario to
match the actual target used by the test, and adjust the test input in the
connect_timeout case to use a documented blackhole address such as 192.0.2.1 in
the scenario or otherwise guarantee that 10.255.255.1 is silently dropped in the
test environment. Use the scenario text and the connect_timeout setup in
upstream-connect-timeout.feature to keep the timeout assertion stable and avoid
immediate unreachable failures.
---
Nitpick comments:
In `@gateway/gateway-controller/api/management-openapi.yaml`:
- Around line 5821-5827: The generated models are dropping the LLM-specific
API-level-only constraint for the resilience field because the description is
attached beside a $ref and may be overridden by the shared Resilience schema.
Update the resilience definitions in LLMProviderConfigData and
LLMProxyConfigData in the OpenAPI spec to wrap the reference in an allOf block
or use a dedicated schema so the field-level description is preserved. Make sure
the generated Go docs retain the “Supported at the API level only” wording and
do not imply operation-level overrides.
In
`@kubernetes/gateway-operator/config/crd/bases/gateway.api-platform.wso2.com_restapis.yaml`:
- Around line 101-115: The duration validation for the resilience fields is
stricter in the CRD than in the controller’s runtime parsing, so valid values
accepted by `time.ParseDuration` in `api_validator.go` can be rejected by the
API server. Update the `resilience.idleTimeout` and `resilience.timeout` schema
patterns in the CRD to match Go duration syntax if parity is intended, or
otherwise revise the field descriptions to explicitly state that only
single-unit durations are allowed. Apply the same change to the duplicate
resilience schema block referenced in the other section.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7996541b-429f-42d2-8d91-7c080025f8ab
📒 Files selected for processing (36)
gateway/build-manifest.yamlgateway/configs/config-template.tomlgateway/gateway-controller/api/management-openapi.yamlgateway/gateway-controller/pkg/api/management/generated.gogateway/gateway-controller/pkg/config/api_validator.gogateway/gateway-controller/pkg/config/api_validator_test.gogateway/gateway-controller/pkg/config/config.gogateway/gateway-controller/pkg/config/config_test.gogateway/gateway-controller/pkg/config/llm_validator.gogateway/gateway-controller/pkg/config/llm_validator_resilience_test.gogateway/gateway-controller/pkg/constants/constants.gogateway/gateway-controller/pkg/models/runtime_deploy_config.gogateway/gateway-controller/pkg/transform/restapi.gogateway/gateway-controller/pkg/transform/restapi_test.gogateway/gateway-controller/pkg/utils/llm_resilience_test.gogateway/gateway-controller/pkg/utils/llm_transformer.gogateway/gateway-controller/pkg/xds/translator.gogateway/gateway-controller/pkg/xds/translator_test.gogateway/it/features/backend-timeout.featuregateway/it/features/llm-backend-timeout.featuregateway/it/features/upstream-connect-timeout.featuregateway/it/steps_backend_timeout.gogateway/it/steps_timeouts.gogateway/it/suite_test.gogateway/it/test-config.tomlkubernetes/gateway-operator/api/v1alpha1/llmprovider_types.gokubernetes/gateway-operator/api/v1alpha1/llmproxy_types.gokubernetes/gateway-operator/api/v1alpha1/restapi_types.gokubernetes/gateway-operator/api/v1alpha1/zz_generated.deepcopy.gokubernetes/gateway-operator/config/crd/bases/gateway.api-platform.wso2.com_llmproviders.yamlkubernetes/gateway-operator/config/crd/bases/gateway.api-platform.wso2.com_llmproxies.yamlkubernetes/gateway-operator/config/crd/bases/gateway.api-platform.wso2.com_restapis.yamlkubernetes/gateway-operator/config/samples/api_v1_restapi.yamlkubernetes/helm/operator-helm-chart/crds/gateway.api-platform.wso2.com_llmproviders.yamlkubernetes/helm/operator-helm-chart/crds/gateway.api-platform.wso2.com_llmproxies.yamlkubernetes/helm/operator-helm-chart/crds/gateway.api-platform.wso2.com_restapis.yaml
💤 Files with no reviewable changes (1)
- gateway/it/steps_backend_timeout.go
Purpose
This PR introduces end-to-end timeout configuration across the gateway data plane. It adds a new resilience block to RestApi resources, allowing API- and operation-level configuration of backend request and idle timeouts, which are translated into Envoy route timeouts and can be disabled with "0s". It also enhances support for upstream connect timeouts and exposes Envoy HTTP Connection Manager (HCM) downstream timeouts through config.toml, enabling protection against slow or stalled clients and backends. The implementation includes CRD and schema updates, validation, deployment transforms, xDS translation, documentation updates, and comprehensive integration and unit test coverage to verify timeout behavior across the stack.
@coderabbitai summary