feat(data-pipeline): export client-computed span stats as OTLP trace metrics #2067
Draft
mabdinur wants to merge 11 commits into
…metrics

Add an OTLP HTTP/JSON trace-metrics export path so client-computed span stats can be shipped as the `dd.trace.span.duration` histogram to an OTLP `/v1/metrics` endpoint instead of the Datadog agent `/v0.6/stats` endpoint.

- libdd-ddsketch: reconstruct a DDSketch from its protobuf (`from_pb` / `from_encoded`) so histogram buckets can be rebuilt from the per-group summaries (Approach A: count + explicit bounds/bucket counts from the sketch bins, sum approximated from bin value*weight).
- libdd-data-pipeline: add `OtlpMetricsConfig`, `send_otlp_metrics_http`, the `map_stats_to_otlp_metrics` encoder, and an `OtlpStatsExporter` worker that periodically flushes the concentrator and exports metrics.
- TraceExporterBuilder: add `set_otlp_metrics_endpoint`/`set_otlp_metrics_headers`. When set together with `enable_stats`, the span concentrator is started unconditionally (bypassing the agent gate) and `check_agent_info` no longer toggles stats in this mode.

Co-authored-by: Cursor <cursoragent@cursor.com>
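The "Approach A" summary above can be sketched roughly as follows. This is an illustration only: `Bin`, `Summary`, and `summarize` are hypothetical names, not the types or functions exposed by `libdd-ddsketch`, and the bins are assumed to be available as plain (value, weight) pairs.

```rust
/// One sketch bin: a representative value and the weight (count) it carries.
/// Hypothetical stand-in for the bins a reconstructed DDSketch would expose.
#[derive(Debug, Clone, Copy)]
struct Bin {
    value: f64,
    weight: f64,
}

/// Summary scalars recovered from the bins alone.
#[derive(Debug, PartialEq)]
struct Summary {
    count: f64,
    sum: f64,
}

/// The count is the total weight. The sum is only an approximation,
/// sum(value * weight), because each bin remembers a single representative
/// value rather than the original samples.
fn summarize(bins: &[Bin]) -> Summary {
    let count: f64 = bins.iter().map(|b| b.weight).sum();
    let sum: f64 = bins.iter().map(|b| b.value * b.weight).sum();
    Summary { count, sum }
}
```

For example, two samples in a bin valued 0.5 and one sample in a bin valued 2.0 give a count of 3 and an approximated sum of 3.0, whatever the exact original durations were.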
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2067 +/- ##
==========================================
+ Coverage 73.42% 73.46% +0.04%
==========================================
Files 465 466 +1
Lines 77949 78552 +603
==========================================
+ Hits 57231 57706 +475
- Misses 20718 20846 +128
…ndtrip test under Miri

Apply rustfmt formatting to the OTLP trace-metrics module and reorder a re-export to fix the failing Lint (rustfmt) CI job.

In test_sketch_pb_roundtrip, compare bin values with a relative tolerance instead of exact equality. Bin values are derived via exp() in LogMapping::value, and Miri intentionally perturbs the last ULPs of transcendental float ops, which made the exact assert_eq! fail under Miri. Bin counts are still compared exactly.

Co-authored-by: Cursor <cursoragent@cursor.com>
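The relative-tolerance comparison described in this commit could look roughly like the sketch below. The helper names `approx_eq_rel` and `bins_match` are hypothetical, and the bins are assumed to be (value, count) pairs; the actual test in `libdd-ddsketch` may be structured differently.

```rust
/// Returns true when `a` and `b` differ by at most `rel_tol`, measured
/// relative to the larger of the two magnitudes.
fn approx_eq_rel(a: f64, b: f64, rel_tol: f64) -> bool {
    if a == b {
        return true; // exact match, including both values being zero
    }
    let scale = a.abs().max(b.abs());
    (a - b).abs() <= rel_tol * scale
}

/// Compares two bin lists: values within the relative tolerance,
/// counts with exact equality.
fn bins_match(left: &[(f64, u64)], right: &[(f64, u64)], rel_tol: f64) -> bool {
    left.len() == right.len()
        && left
            .iter()
            .zip(right)
            .all(|(l, r)| approx_eq_rel(l.0, r.0, rel_tol) && l.1 == r.1)
}
```

Keeping the counts on exact equality means the tolerance only absorbs last-ULP noise in the `exp()`-derived values and cannot hide a lost or duplicated sample.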
Co-authored-by: Cursor <cursoragent@cursor.com>
# Conflicts:
# libdd-data-pipeline/src/otlp/exporter.rs

Co-authored-by: Cursor <cursoragent@cursor.com>
# Conflicts:
# libdd-data-pipeline/src/trace_exporter/builder.rs
# libdd-data-pipeline/src/trace_exporter/mod.rs
Apply rustfmt formatting to test code introduced during the origin/main merge so the Lint (rustfmt) CI job passes. Co-authored-by: Cursor <cursoragent@cursor.com>
Rename the SDK-computed span metric to traces.span.sdk.metrics.duration (unit s), emit a single-source status.code for errors, min/max on each data point, and the dd.* attribute family (dd.operation.name, dd.span.type, dd.span.top_level, dd.origin). Add an enable_otel_trace_semantics() builder toggle that, when set, emits only OpenTelemetry attributes. Add host.name and dd.<key> process-tag resource attributes, plus a grpc_method stats field mapped to rpc.method. Co-authored-by: Cursor <cursoragent@cursor.com>
Move service.name, service.version and deployment.environment.name from the resource to a per-service InstrumentationScope so one payload can carry multiple services. Rename the runtime-id resource attribute to dd.runtime_id (default mode only). service.name no longer appears as a data-point attribute. Co-authored-by: Cursor <cursoragent@cursor.com>
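A minimal sketch of the per-service partitioning this commit describes, assuming hypothetical `StatsGroup`, `ScopePoint`, and `ScopeMetrics` types rather than the crate's actual OTLP serde types:

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stats group: only the fields needed for grouping.
#[derive(Debug, Clone)]
struct StatsGroup {
    service: String,
    operation: String,
    hits: u64,
}

/// Hypothetical data point. It carries no service field, because the
/// service identity lives on the enclosing scope.
#[derive(Debug, PartialEq)]
struct ScopePoint {
    operation: String,
    hits: u64,
}

/// Hypothetical per-service scope: service identity plus its data points.
#[derive(Debug, PartialEq)]
struct ScopeMetrics {
    service_name: String,
    points: Vec<ScopePoint>,
}

/// Groups stats by service so a single payload can carry several services.
/// A BTreeMap keeps the scope order deterministic (sorted by service name).
fn partition_by_service(groups: Vec<StatsGroup>) -> Vec<ScopeMetrics> {
    let mut by_service: BTreeMap<String, Vec<ScopePoint>> = BTreeMap::new();
    for g in groups {
        by_service.entry(g.service).or_default().push(ScopePoint {
            operation: g.operation,
            hits: g.hits,
        });
    }
    by_service
        .into_iter()
        .map(|(service_name, points)| ScopeMetrics { service_name, points })
        .collect()
}
```

The deterministic ordering is a choice made for this sketch so that payloads are easy to compare in tests; nothing in the PR text says the real encoder sorts scopes.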
Map the per-group DDSketch into a fixed set of explicit bucket boundaries (the OpenTelemetry spanmetrics connector defaults, in seconds) instead of data-dependent per-bin bounds, so the exported histogram is comparable across tracers and backends. count is exact; sum/min/max stay sketch-approximated. Co-authored-by: Cursor <cursoragent@cursor.com>
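The mapping from sketch bins to fixed buckets can be sketched as below. The bounds shown are illustrative placeholders, not the actual `EXPLICIT_BOUNDS_SECONDS` values, and `bucketize` is a hypothetical helper that assumes the bins arrive as (value in seconds, count) pairs.

```rust
/// Illustrative bounds in seconds. These are NOT the crate's actual
/// EXPLICIT_BOUNDS_SECONDS; they only keep the example small.
const EXAMPLE_BOUNDS_SECONDS: [f64; 4] = [0.01, 0.1, 1.0, 10.0];

/// Distributes (value, count) sketch bins into explicit-bounds buckets.
/// Bucket i holds values in (bounds[i-1], bounds[i]]; the final bucket holds
/// everything above the last bound, so the result has bounds.len() + 1 entries.
/// `bounds` must be sorted in ascending order.
fn bucketize(bins: &[(f64, u64)], bounds: &[f64]) -> Vec<u64> {
    let mut buckets = vec![0u64; bounds.len() + 1];
    for &(value, count) in bins {
        // Index of the first bound >= value; bounds.len() means overflow.
        let idx = bounds.partition_point(|&b| b < value);
        buckets[idx] += count;
    }
    buckets
}
```

Because every bin's count lands in exactly one bucket, the bucket counts always add up to the sketch's total count, which is why `count` stays exact even though each value is only known to bin precision.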
What does this PR do?
Adds native OTLP trace-metrics export to `libdd-data-pipeline`. When an OTLP metrics endpoint is configured on the `TraceExporter`, the span concentrator's stats are mapped into a `traces.span.sdk.metrics.duration` OTLP histogram and POSTed to the configured `/v1/metrics` endpoint over HTTP/JSON.

- `otlp/metrics.rs`: OTLP metrics serde types, `map_stats_to_otlp_metrics` (DDSketch -> explicit bucket histogram, delta temporality), and the `OtlpStatsExporter` background worker. Data points are partitioned by service into per-service `InstrumentationScope`s so one payload can carry multiple services.
- Service identity (`service.name`, `service.version`, `deployment.environment.name`) is reported as `InstrumentationScope` attributes; the resource carries `dd.runtime_id`.
- `otlp/config.rs`: `OtlpMetricsConfig`.
- `otlp/exporter.rs`: shared `send_otlp_http` helper + `send_otlp_metrics_http`.
- `TraceExporterBuilder`: `set_otlp_metrics_endpoint` / `set_otlp_metrics_headers` and `enable_otel_trace_semantics` (emit only OpenTelemetry attributes, omit `dd.*`); when an endpoint is set, the span concentrator is started and the OTLP stats worker is spawned.
- `libdd-ddsketch`: `DDSketch::from_pb` / `from_encoded` to rebuild a sketch from its protobuf.

Motivation
Provide OTLP trace-metrics export as a reusable `libdatadog` capability so tracers only supply configuration, building on the existing OTLP export path. Reporting service identity per scope keeps payloads cross-tracer equivalent and lets a single export carry multiple services.
Additional Notes
For reviewers:
- Service identity is reported on the `InstrumentationScope`, not the resource, so multiple services aggregate into one payload. The resource carries only `dd.runtime_id`.
- The histogram uses a fixed set of explicit bucket boundaries (`EXPLICIT_BOUNDS_SECONDS`: the OpenTelemetry spanmetrics-connector default bounds, in seconds) so the payload is comparable across tracers and backends. Each per-group ok/error DDSketch is bucketed into those bounds (one bucket per boundary plus a trailing overflow bucket); `count` is exact, while `sum`/`min`/`max` are reconstructed from the sketch bins (the encoded DDSketch carries no exact scalars). No new per-cell accumulators. Residual follow-ups: exact per-cell `sum`/`min`/`max` would require accumulators shared with the `/v0.6/stats` path (out of scope per the "don't refactor client-side stats internals" constraint); `dd.span.top_level` is a per-group heuristic (`top_level_hits == hits`), correct for homogeneous RED groups.
- `check_agent_info` no longer (de)activates stats based on agent info (`otlp_stats_enabled` flag), so export works without agent stats support.
- … `/v1/metrics`); invalid headers are skipped with a warning.
- `run()` export works as expected. The force-flush in the worker's `shutdown()` issues its HTTP request from within the exporter's bounded `block_on(timeout, ...)` shutdown path, where the spawned hyper connection task is not reliably driven, so final-bucket data may be dropped on shutdown. Draining the last bucket on shutdown is a follow-up.
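The `dd.span.top_level` heuristic mentioned in the notes above amounts to a single comparison. The sketch below assumes a hypothetical `GroupCounts` struct holding only the two counters involved; the real stats group has more fields.

```rust
/// Hypothetical subset of a stats group's counters.
struct GroupCounts {
    hits: u64,
    top_level_hits: u64,
}

/// A group is reported as top-level only when every hit in it was top-level.
/// A mixed group is reported as not top-level, which is why the heuristic is
/// described as correct for homogeneous groups only.
fn is_top_level(group: &GroupCounts) -> bool {
    group.top_level_hits == group.hits
}
```

A group with 10 hits of which 4 were top-level is therefore exported with the attribute set to false, even though some of its spans were top-level.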
Draft: consumed by the dd-trace-py integration/testing PR (DataDog/dd-trace-py#18354).
How to test the change?
- `cargo test -p libdd-data-pipeline --lib otlp::metrics` and the `libdd-ddsketch` protobuf roundtrip test.
- `cargo clippy -p libdd-data-pipeline -p libdd-ddsketch --all-features` is clean.
- … `traces.span.sdk.metrics.duration` OTLP histogram (service identity on the `InstrumentationScope`) to a mock `/v1/metrics` endpoint.