fix(opentelemetry): validate x-request-id before using it as trace_id#12990
fix(opentelemetry): validate x-request-id before using it as trace_id#12990prasunsrivastav123-lang wants to merge 14 commits intoapache:masterfrom
Conversation
|
@indrekj @markokocic @huacnlee pls review this pr |
|
Hi @prasunsrivastav123-lang, please add a test case for this fix. |
|
hey @Baoyuantop i added a test case pls review this pr thank you ! |
|
Hi @prasunsrivastav123-lang, please fix failed CI |
|
Hi @Baoyuantop I’ve added the requested test case to cover the invalid X-Request-Id scenario and ensured the route is properly recreated to avoid upstream issues.if any adjustment need pls tell . |
There was a problem hiding this comment.
Pull request overview
This PR fixes a crash in the OpenTelemetry plugin when trace_id_source is set to x-request-id and the request ID is not a valid 32-character hexadecimal trace ID (such as a UUID with hyphens). The plugin now validates the X-Request-Id header and gracefully falls back to the default trace ID generator instead of passing invalid values to the OpenTelemetry encoder, which was causing a Lua runtime error.
Changes:
- Added
is_valid_trace_id()function to validate that trace IDs are exactly 32 hexadecimal characters - Modified
create_tracer_obj()to validate the X-Request-Id header before using it as a trace_id, with fallback to the original generator for invalid values - Added test case (TEST 21) to verify that invalid X-Request-Id headers (UUID format) don't cause crashes
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| apisix/plugins/opentelemetry.lua | Added validation function and modified trace ID generation logic to validate and fallback for invalid request IDs |
| t/plugin/opentelemetry.t | Added test case to verify invalid X-Request-Id doesn't crash the plugin |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| local function is_valid_trace_id(trace_id) | ||
| if not trace_id or #trace_id ~= 32 then | ||
| return false | ||
| end |
There was a problem hiding this comment.
The validation function should reject trace IDs that are all zeros (00000000000000000000000000000000) according to the W3C Trace Context specification. Currently, the function only checks length and hex format, but an all-zeros trace_id is invalid and should be rejected to ensure proper fallback to the default generator.
| end | |
| end | |
| -- W3C Trace Context: trace-id of all zeros is invalid | |
| if trace_id == "00000000000000000000000000000000" then | |
| return false | |
| end |
| if plugin_info.trace_id_source == "x-request-id" then | ||
| local original_new_ids = id_generator.new_ids | ||
|
|
||
| id_generator.new_ids = function() | ||
| local header_trace_id = core.request.headers()["x-request-id"] | ||
| or ngx_var.request_id | ||
|
|
||
| if is_valid_trace_id(header_trace_id) then | ||
| return header_trace_id, id_generator.new_span_id() | ||
| end | ||
|
|
||
| -- fallback to default generator for invalid values (e.g. UUID) | ||
| return original_new_ids() | ||
| end |
There was a problem hiding this comment.
The original_new_ids variable is saved inside the if block, which means it's saved every time create_tracer_obj is called when trace_id_source is 'x-request-id'. After the first call, id_generator.new_ids has already been overridden, so subsequent calls (e.g., after cache expiration after 24 hours) would save the already-overridden function, creating nested function wrappers. To fix this, save the original id_generator.new_ids once at module load time (outside the function), and check if it's already been overridden before overriding again, or restore it when trace_id_source changes.
| if plugin_info.trace_id_source == "x-request-id" then | ||
| local original_new_ids = id_generator.new_ids | ||
|
|
||
| id_generator.new_ids = function() | ||
| local header_trace_id = core.request.headers()["x-request-id"] | ||
| or ngx_var.request_id | ||
|
|
||
| if is_valid_trace_id(header_trace_id) then | ||
| return header_trace_id, id_generator.new_span_id() | ||
| end | ||
|
|
||
| -- fallback to default generator for invalid values (e.g. UUID) | ||
| return original_new_ids() | ||
| end | ||
| end |
There was a problem hiding this comment.
Inconsistent indentation in the if statement and local variable declaration. The if statement uses 3 spaces while the local statement uses 2 spaces. According to the .editorconfig, the codebase uses space indentation consistently. Both lines should be indented with 4 spaces to match the function body indentation.
| if plugin_info.trace_id_source == "x-request-id" then | |
| local original_new_ids = id_generator.new_ids | |
| id_generator.new_ids = function() | |
| local header_trace_id = core.request.headers()["x-request-id"] | |
| or ngx_var.request_id | |
| if is_valid_trace_id(header_trace_id) then | |
| return header_trace_id, id_generator.new_span_id() | |
| end | |
| -- fallback to default generator for invalid values (e.g. UUID) | |
| return original_new_ids() | |
| end | |
| end | |
| if plugin_info.trace_id_source == "x-request-id" then | |
| local original_new_ids = id_generator.new_ids | |
| id_generator.new_ids = function() | |
| local header_trace_id = core.request.headers()["x-request-id"] | |
| or ngx_var.request_id | |
| if is_valid_trace_id(header_trace_id) then | |
| return header_trace_id, id_generator.new_span_id() | |
| end | |
| -- fallback to default generator for invalid values (e.g. UUID) | |
| return original_new_ids() | |
| end | |
| end |
| --- response_body | ||
| opentracing | ||
| --- no_error_log | ||
| [error] |
There was a problem hiding this comment.
The test verifies that an invalid X-Request-Id (UUID format) doesn't cause a crash, but it doesn't verify that the plugin correctly falls back to generating a valid trace ID. Consider adding a check in a follow-up test to verify that the trace appears in the OpenTelemetry collector logs with a valid trace_id, similar to TEST 16 which checks the log output.
| [error] | |
| [error] | |
| === TEST 22: invalid x-request-id should still generate a trace_id | |
| --- request | |
| GET /opentracing | |
| --- more_headers | |
| X-Request-Id: 550e8400-e29b-41d4-a716-446655440000 | |
| --- wait: 2 | |
| --- grep_error_log eval | |
| qr/trace_id/ | |
| --- grep_error_log_out | |
| trace_id |
- Add is_valid_trace_id() to validate 32-char hex trace IDs - Reject all-zero trace IDs per W3C spec - Prevent multiple id_generator overrides - Add tests for invalid X-Request-Id fallback behavior Signed-off-by: Prasun Srivastav <prasunsrivastav123@gmail.com>
|
@Baoyuantop Addressed the review feedback: |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if plugin_info.trace_id_source == "x-request-id" | ||
| and not id_generator_overridden then | ||
| id_generator.new_ids = function() | ||
|
|
||
| local header_trace_id = core.request.headers()["x-request-id"] | ||
| or ngx_var.request_id | ||
|
|
||
| if is_valid_trace_id(header_trace_id) then | ||
| return header_trace_id, id_generator.new_span_id() | ||
| end | ||
|
|
||
| -- fallback to default generator for invalid values (e.g. UUID) | ||
| return original_new_ids() | ||
| end | ||
| id_generator_overridden = true | ||
| end |
There was a problem hiding this comment.
The indentation is inconsistent in this block. Lines 253-254 have leading spaces instead of consistent indentation, and line 256 appears to have extra trailing spaces. This should be corrected to match the surrounding code style.
| if plugin_info.trace_id_source == "x-request-id" | ||
| and not id_generator_overridden then | ||
| id_generator.new_ids = function() | ||
|
|
||
| local header_trace_id = core.request.headers()["x-request-id"] | ||
| or ngx_var.request_id | ||
|
|
||
| if is_valid_trace_id(header_trace_id) then | ||
| return header_trace_id, id_generator.new_span_id() | ||
| end | ||
|
|
||
| -- fallback to default generator for invalid values (e.g. UUID) | ||
| return original_new_ids() | ||
| end | ||
| id_generator_overridden = true | ||
| end |
There was a problem hiding this comment.
The id_generator override is applied globally and permanently once trace_id_source is set to "x-request-id". If the plugin metadata is later updated to set trace_id_source to "random", the override persists and all requests will still go through the x-request-id validation logic before falling back to the original generator. This is inefficient and not the intended behavior. Consider either: 1) checking the current trace_id_source value inside the overridden function and calling original_new_ids directly when it's "random", or 2) implementing a mechanism to restore the original id_generator.new_ids when the metadata changes back to "random".
|
@Baoyuantop i just Removed the global flag and simplified the id_generator override as suggested. |
|
@Baoyuantop can you review this |
|
Hi @prasunsrivastav123-lang, please fix failed CI |
|
Hi @Baoyuantop , I’ve fixed the CI issue by reindexing the tests and pushed the update. Could you please take another look? Thanks! |
moonming
left a comment
There was a problem hiding this comment.
Hi @prasunsrivastav123-lang, thank you for fixing the x-request-id validation in OpenTelemetry (#12976)!
Validating the x-request-id format before using it as a trace_id is the right approach — invalid formats could cause issues with tracing backends.
The implementation with a fallback to the default generator looks clean. Could you confirm that the tests cover:
- Valid 32-character hex trace IDs
- Invalid formats (too short, non-hex characters, empty string)
- Missing x-request-id header (should use default generator)
If all tests pass, this should be ready. Thank you!
|
@moonming @Baoyuantop Thanks! Yes, I’ve added tests covering valid 32-character hex IDs, multiple invalid formats (UUID, all-zero, malformed), and the missing header fallback case. All tests are passing now. Please take another look. |
|
@moonming if there need any other changes? |
| local _original_new_ids = id_generator.new_ids | ||
|
|
||
| id_generator.new_ids = function() | ||
| local trace_id = core.request.headers()["x-request-id"] or ngx_var.request_id | ||
| return trace_id, id_generator.new_span_id() | ||
| local trace_id = core.request.headers()["x-request-id"] | ||
| or ngx_var.request_id | ||
|
|
||
| if is_valid_trace_id(trace_id) then | ||
| return trace_id, id_generator.new_span_id() | ||
| end | ||
|
|
||
| return _original_new_ids() |
There was a problem hiding this comment.
The lrucache has a 24-hour TTL After expiry, create_tracer_obj is called again. At that point id_generator.new_ids is already the closure from the previous call, so _original_new_ids captures the already-overridden function. Each 24h expiry adds another wrapper layer.
The pre-PR code was idempotent (it overwrote the same closure). This PR makes it strictly worse.
There was a problem hiding this comment.
You're right — the current implementation wraps id_generator.new_ids multiple times when the tracer is recreated, which breaks idempotency and can lead to stacked wrappers over time.
My intention was to validate the x-request-id and fallback safely, but I see that the current approach needs adjustment to avoid repeated wrapping.
I’ll update the implementation to ensure the override happens only once (idempotent behavior) and push a fix shortly
There was a problem hiding this comment.
Hi @prasunsrivastav123-lang, has this comment been resolved?
|
Hi @prasunsrivastav123-lang, there's a conflict in a code file. Could you help me resolve it? |
|
sure 😃 |
|
@Baoyuantop is it ok now ? |
|
Hi @prasunsrivastav123-lang, please fix the failed CI |
|
@Baoyuantop Updated based on feedback. Added full validation and test coverage. |
- Fix mixed indentation (0/3/4/8 spaces) in is_valid_trace_id() and create_tracer_obj() to use consistent 4-space indentation - Restore master's original TEST 20-22 (additional_attributes with numeric nginx variables) that were corrupted during merge - Renumber PR's new test cases as TEST 23-31 to avoid duplicate test numbers Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Hi @prasunsrivastav123-lang, there were some formatting issues when the code was merged; I’ve corrected them. |
Replace direct access to Lua global 'string' with a local alias 'string_lower' to satisfy APISIX lint rules. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@Baoyuantop Thanks for fixing the formatting! I’ve updated the implementation to ensure trace_id validation is handled cleanly:
Test cases have been added to cover all these scenarios. Please let me know if any further changes are needed. |
Description
This PR fixes a crash in the OpenTelemetry plugin when trace_id_source is set to x-request-id and the request ID is not a valid 32-hex trace ID (for example, a UUID).
The plugin now validates the header and gracefully falls back to the default trace ID generator instead of passing invalid values to the OpenTelemetry encoder.
Fixes #12976
Fixes #12976
Checklist