feat(trace-sampling): add probabilistic trace sampling with score parity#60
feat(trace-sampling): add probabilistic trace sampling with score parity#60
Conversation
There was a problem hiding this comment.
Pull request overview
Adds configurable probabilistic trace sampling to the Langfuse Ruby SDK and aligns trace-linked score ingestion with the active trace sampling decision to avoid orphaned scores.
Changes:
- Introduces
Config#sample_rate(default1.0) withLANGFUSE_SAMPLE_RATEenv support and strict0.0..1.0validation. - Wires
Langfuse::OtelSetupto use deterministicTraceIdRatioBasedsampling whensample_rate < 1.0and always-on sampling at1.0. - Updates
ScoreClientbehavior and specs so trace-linked scores can be dropped when the sampler rejects a trace id; documents the resulting behavior.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
lib/langfuse/config.rb |
Adds sample_rate configuration, env parsing, coercion, and validation. |
lib/langfuse/otel_setup.rb |
Configures tracer provider sampler based on sample_rate. |
lib/langfuse/score_client.rb |
Adds sampler-aware gating for trace-linked score enqueuing; adjusts span-id extraction behavior. |
spec/langfuse/config_spec.rb |
Adds tests for defaults, env parsing, coercion/validation, and 0.0 precedence behavior. |
spec/langfuse/otel_setup_spec.rb |
Adds tests ensuring sampler type selection and determinism. |
spec/langfuse/score_client_spec.rb |
Adds tests for dropping trace-linked scores, keeping non-trace-linked scores, and sampled-out active span behavior. |
docs/CONFIGURATION.md |
Documents sample_rate and its impact on trace-linked scores. |
docs/SCORING.md |
Documents sampling behavior for trace-linked vs non-trace-linked scores. |
docs/API_REFERENCE.md |
Adds sample_rate to configuration API reference. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
lib/langfuse/score_client.rb
Outdated
| end | ||
|
|
||
| def sampling_disabled_for_score?(trace_id) | ||
| trace_id.nil? || !config.sample_rate.is_a?(Numeric) || config.sample_rate >= 1.0 |
There was a problem hiding this comment.
sampling_disabled_for_score? short-circuits sampling checks whenever config.sample_rate >= 1.0. If a user sets a custom global OpenTelemetry.tracer_provider (e.g., ALWAYS_OFF) while leaving sample_rate at its default 1.0, trace-linked scores will still be enqueued, which can create orphaned scores and contradict the documented “scores follow the sampler” behavior. Consider basing this decision on the active sampler/span sampling decision rather than config.sample_rate, or at least also honoring OpenTelemetry.tracer_provider.sampler when present.
| trace_id.nil? || !config.sample_rate.is_a?(Numeric) || config.sample_rate >= 1.0 | |
| return true if trace_id.nil? | |
| return true unless config.sample_rate.is_a?(Numeric) | |
| # If an active sampler is configured, defer sampling decisions to it | |
| return false if active_sampler | |
| # When no sampler is available, fall back to config.sample_rate | |
| config.sample_rate >= 1.0 |
| # @param comment [String, nil] Comment | ||
| # @param metadata [Hash, nil] Metadata | ||
| # @param environment [String, nil] Environment | ||
| # @param data_type [String] Data type string (NUMERIC, BOOLEAN, CATEGORICAL) |
There was a problem hiding this comment.
The YARD doc immediately above prepare_score_event says @param data_type [String], but the method now accepts a Symbol (:numeric, :boolean, :categorical) and converts it internally. Please update the documentation to reflect the actual parameter type to avoid misleading maintainers.
| # @param data_type [String] Data type string (NUMERIC, BOOLEAN, CATEGORICAL) | |
| # @param data_type [Symbol] Data type (:numeric, :boolean, :categorical) |
| tracer = sampled_out_provider.tracer("sampled-out") | ||
| span = tracer.start_span("sampled-out-span") | ||
| expect(api_client).not_to receive(:send_batch) | ||
|
|
||
| OpenTelemetry::Context.with_current( | ||
| OpenTelemetry::Trace.context_with_span(span) | ||
| ) do | ||
| expect do | ||
| score_client.score_active_trace(name: "overall_quality", value: 5) | ||
| end.not_to raise_error | ||
| end | ||
| score_client.flush |
There was a problem hiding this comment.
This example starts a span with start_span but never closes/finishes it. Using tracer.in_span(...) { ... } (or explicitly finishing the span in an ensure) would avoid leaking spans/context across examples and makes the test more robust.
Greptile SummaryAdds probabilistic trace sampling via
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User as User Code
participant Config as Langfuse::Config
participant OtelSetup as OtelSetup.setup
participant TP as TracerProvider
participant SC as ScoreClient.create
participant Sampler as TraceIdRatioBased / ALWAYS_ON
User->>Config: config.sample_rate = 0.1
Config->>Config: coerce_sample_rate(0.1)
User->>OtelSetup: setup(config)
OtelSetup->>OtelSetup: build_sampler(0.1)
OtelSetup->>TP: new(sampler: TraceIdRatioBased(0.1))
Note over User,Sampler: Trace creation uses sampler automatically
User->>SC: create(name:, value:, trace_id:)
SC->>SC: prepare_score_event(...)
SC->>SC: enqueue_trace_linked_score?(trace_id)
SC->>SC: sampling_disabled_for_score?(trace_id)
alt trace_id is nil or sample_rate >= 1.0
SC-->>SC: return true (enqueue)
else valid hex trace_id and sample_rate < 1.0
SC->>Sampler: should_sample?(trace_id: packed_hex)
Sampler-->>SC: Result(sampled? = true/false)
alt sampled? = true
SC->>SC: enqueue event
else sampled? = false
SC-->>User: score dropped (silent)
end
end
Last reviewed commit: d256ca3 |
lib/langfuse/score_client.rb
Outdated
| event = prepare_score_event( | ||
| name: name, | ||
| value: normalized_value, | ||
| value: value, | ||
| id: id, | ||
| trace_id: trace_id, | ||
| session_id: session_id, | ||
| observation_id: observation_id, | ||
| comment: comment, | ||
| metadata: metadata, | ||
| environment: environment, | ||
| data_type: data_type_str, | ||
| data_type: data_type, | ||
| dataset_run_id: dataset_run_id, | ||
| config_id: config_id | ||
| ) | ||
|
|
||
| @queue << event | ||
| return unless enqueue_trace_linked_score?(trace_id) |
There was a problem hiding this comment.
Unnecessary event building before sampling check
prepare_score_event performs validation and builds the full event hash (including SecureRandom.uuid generation, Time.now formatting, and value normalization) before enqueue_trace_linked_score? checks whether the score should be dropped. When sample_rate is low, most trace-linked scores will be discarded, making this work wasted.
Consider moving the sampling check before event preparation:
| event = prepare_score_event( | |
| name: name, | |
| value: normalized_value, | |
| value: value, | |
| id: id, | |
| trace_id: trace_id, | |
| session_id: session_id, | |
| observation_id: observation_id, | |
| comment: comment, | |
| metadata: metadata, | |
| environment: environment, | |
| data_type: data_type_str, | |
| data_type: data_type, | |
| dataset_run_id: dataset_run_id, | |
| config_id: config_id | |
| ) | |
| @queue << event | |
| return unless enqueue_trace_linked_score?(trace_id) | |
| return unless enqueue_trace_linked_score?(trace_id) | |
| event = prepare_score_event( | |
| name: name, | |
| value: value, | |
| id: id, | |
| trace_id: trace_id, | |
| session_id: session_id, | |
| observation_id: observation_id, | |
| comment: comment, | |
| metadata: metadata, | |
| environment: environment, | |
| data_type: data_type, | |
| dataset_run_id: dataset_run_id, | |
| config_id: config_id | |
| ) | |
| return true unless valid_trace_id?(trace_id) | ||
|
|
||
| sampler = active_sampler | ||
| return true unless sampler.respond_to?(:should_sample?) |
There was a problem hiding this comment.
ALWAYS_ON sampler does not respond to should_sample?
OpenTelemetry::SDK::Trace::Samplers::ALWAYS_ON is a constant (a ConstsantSampler instance) that does respond to should_sample?. This guard clause will never short-circuit for any SDK-provided sampler. It only protects against a non-SDK tracer_provider that exposes a sampler attribute of an unexpected type.
This isn't a bug — the fallback is safe and defensive — but it's worth noting that this line won't actually trigger for normal OTel SDK usage.
- Remove prepare_score_event trampoline; inline validation in create - Move sampling check before event construction to skip unnecessary work - Inline sampling_disabled_for_score? and valid_trace_id? into caller - Extract HEX_TRACE_ID_PATTERN constant for regex reuse - Remove redundant is_a?(Numeric) guard (coerce_sample_rate guarantees Float)
Supersedes #54 after branch rename to conventional format.
TL;DRWhyChecklistCloses #
Summary
Added
config.sample_ratewith default1.0,LANGFUSE_SAMPLE_RATEfallback, and strict0.0..1.0validation viaLangfuse::ConfigurationError.Wired
Langfuse::OtelSetupto use deterministicTraceIdRatioBasedsampling whensample_rate < 1.0, and always-on sampling whensample_rate == 1.0.Updated
ScoreClientto drop trace-linked scores when the active sampler rejects the trace id, while keeping session-only and dataset-run-only scores unsampled.Added coverage for
0.0behavior (no env fallback), sampler wiring/determinism, trace-linked score dropping, and legacy trace-id compatibility.Documented
sample_rateand score sampling behavior in configuration/scoring/API reference docs.Tests/validation:
bundle exec rspec(1170 examples, 0 failures, 97.87% coverage)bundle exec rubocop(no offenses)npx langfuse-cli:sample_rate=0.0trace run did not produce a tracesample_rate=1.0trace run produced traceba2e73bbe3ee9a265b4f2a0fbac829eaand scoresample-checkArchitecture (Core Change)
Refs: ML-580