Skip to content

feat(trace-sampling): add probabilistic trace sampling with score parity#60

Open
kxzk wants to merge 2 commits intomainfrom
feature/add-probabilistic-trace-sampling-sample-rate
Open

feat(trace-sampling): add probabilistic trace sampling with score parity#60
kxzk wants to merge 2 commits intomainfrom
feature/add-probabilistic-trace-sampling-sample-rate

Conversation

@kxzk
Copy link
Collaborator

@kxzk kxzk commented Feb 23, 2026

Supersedes #54 after branch rename to conventional format.

TL;DR

Why

Checklist

  • Has label
  • Has linked issue
  • Tests added for new behavior
  • Docs updated (if user-facing)

Closes #

Summary

  • Added config.sample_rate with default 1.0, LANGFUSE_SAMPLE_RATE fallback, and strict 0.0..1.0 validation via Langfuse::ConfigurationError.

  • Wired Langfuse::OtelSetup to use deterministic TraceIdRatioBased sampling when sample_rate < 1.0, and always-on sampling when sample_rate == 1.0.

  • Updated ScoreClient to drop trace-linked scores when the active sampler rejects the trace id, while keeping session-only and dataset-run-only scores unsampled.

  • Added coverage for 0.0 behavior (no env fallback), sampler wiring/determinism, trace-linked score dropping, and legacy trace-id compatibility.

  • Documented sample_rate and score sampling behavior in configuration/scoring/API reference docs.

  • Tests/validation:

    • bundle exec rspec (1170 examples, 0 failures, 97.87% coverage)
    • bundle exec rubocop (no offenses)
    • Langfuse API smoke checks via npx langfuse-cli:
      • Trace list call succeeded (authenticated API validation)
      • sample_rate=0.0 trace run did not produce a trace
      • sample_rate=1.0 trace run produced trace ba2e73bbe3ee9a265b4f2a0fbac829ea and score sample-check

Architecture (Core Change)

flowchart TD
  A["Langfuse::Config.sample_rate"] --> B["OtelSetup.build_sampler"]
  B --> C["TracerProvider sampler (TraceIdRatioBased/ALWAYS_ON)"]
  C --> D["Span export decision"]
  C --> E["ScoreClient sampler check for trace-linked scores"]
  E --> F["Enqueue score event"]
  E --> G["Drop score event"]
Loading

Refs: ML-580

Copilot AI review requested due to automatic review settings February 23, 2026 20:17
@kxzk kxzk added the enhancement New feature or request label Feb 23, 2026
@linear
Copy link

linear bot commented Feb 23, 2026

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds configurable probabilistic trace sampling to the Langfuse Ruby SDK and aligns trace-linked score ingestion with the active trace sampling decision to avoid orphaned scores.

Changes:

  • Introduces Config#sample_rate (default 1.0) with LANGFUSE_SAMPLE_RATE env support and strict 0.0..1.0 validation.
  • Wires Langfuse::OtelSetup to use deterministic TraceIdRatioBased sampling when sample_rate < 1.0 and always-on sampling at 1.0.
  • Updates ScoreClient behavior and specs so trace-linked scores can be dropped when the sampler rejects a trace id; documents the resulting behavior.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
lib/langfuse/config.rb Adds sample_rate configuration, env parsing, coercion, and validation.
lib/langfuse/otel_setup.rb Configures tracer provider sampler based on sample_rate.
lib/langfuse/score_client.rb Adds sampler-aware gating for trace-linked score enqueuing; adjusts span-id extraction behavior.
spec/langfuse/config_spec.rb Adds tests for defaults, env parsing, coercion/validation, and 0.0 precedence behavior.
spec/langfuse/otel_setup_spec.rb Adds tests ensuring sampler type selection and determinism.
spec/langfuse/score_client_spec.rb Adds tests for dropping trace-linked scores, keeping non-trace-linked scores, and sampled-out active span behavior.
docs/CONFIGURATION.md Documents sample_rate and its impact on trace-linked scores.
docs/SCORING.md Documents sampling behavior for trace-linked vs non-trace-linked scores.
docs/API_REFERENCE.md Adds sample_rate to configuration API reference.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

end

def sampling_disabled_for_score?(trace_id)
trace_id.nil? || !config.sample_rate.is_a?(Numeric) || config.sample_rate >= 1.0
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sampling_disabled_for_score? short-circuits sampling checks whenever config.sample_rate >= 1.0. If a user sets a custom global OpenTelemetry.tracer_provider (e.g., ALWAYS_OFF) while leaving sample_rate at its default 1.0, trace-linked scores will still be enqueued, which can create orphaned scores and contradict the documented “scores follow the sampler” behavior. Consider basing this decision on the active sampler/span sampling decision rather than config.sample_rate, or at least also honoring OpenTelemetry.tracer_provider.sampler when present.

Suggested change
trace_id.nil? || !config.sample_rate.is_a?(Numeric) || config.sample_rate >= 1.0
return true if trace_id.nil?
return true unless config.sample_rate.is_a?(Numeric)
# If an active sampler is configured, defer sampling decisions to it
return false if active_sampler
# When no sampler is available, fall back to config.sample_rate
config.sample_rate >= 1.0

Copilot uses AI. Check for mistakes.
# @param comment [String, nil] Comment
# @param metadata [Hash, nil] Metadata
# @param environment [String, nil] Environment
# @param data_type [String] Data type string (NUMERIC, BOOLEAN, CATEGORICAL)
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The YARD doc immediately above prepare_score_event says @param data_type [String], but the method now accepts a Symbol (:numeric, :boolean, :categorical) and converts it internally. Please update the documentation to reflect the actual parameter type to avoid misleading maintainers.

Suggested change
# @param data_type [String] Data type string (NUMERIC, BOOLEAN, CATEGORICAL)
# @param data_type [Symbol] Data type (:numeric, :boolean, :categorical)

Copilot uses AI. Check for mistakes.
Comment on lines +444 to +455
tracer = sampled_out_provider.tracer("sampled-out")
span = tracer.start_span("sampled-out-span")
expect(api_client).not_to receive(:send_batch)

OpenTelemetry::Context.with_current(
OpenTelemetry::Trace.context_with_span(span)
) do
expect do
score_client.score_active_trace(name: "overall_quality", value: 5)
end.not_to raise_error
end
score_client.flush
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example starts a span with start_span but never closes/finishes it. Using tracer.in_span(...) { ... } (or explicitly finishing the span in an ensure) would avoid leaking spans/context across examples and makes the test more robust.

Copilot uses AI. Check for mistakes.
@greptile-apps
Copy link

greptile-apps bot commented Feb 23, 2026

Greptile Summary

Adds probabilistic trace sampling via config.sample_rate (0.0..1.0) with LANGFUSE_SAMPLE_RATE env var fallback. Wires the OTel TraceIdRatioBased sampler into the TracerProvider and extends ScoreClient to drop trace-linked scores for sampled-out traces while preserving session-only, dataset-run-only, and legacy trace ID scores.

  • Config: New sample_rate attribute with custom setter for validation/coercion, env var fallback, and validate! integration
  • OtelSetup: build_sampler selects TraceIdRatioBased or ALWAYS_ON based on rate
  • ScoreClient: Refactored create to separate event preparation from enqueueing; enqueue_trace_linked_score? replays the sampler decision against the trace ID; extract_ids_from_active_span changed from recording? to valid? check so sampled-out spans still yield trace IDs for the sampling gate
  • Tests: Comprehensive coverage for config validation, sampler wiring/determinism, score dropping, passthrough cases, and active-span sampling
  • Docs: sample_rate documented in CONFIGURATION.md, SCORING.md, and API_REFERENCE.md

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk — the sampling feature is well-tested and defaults to 1.0 (no behavior change for existing users).
  • Score of 4 reflects solid implementation with thorough test coverage (1170 examples, 97.87% coverage), clean config validation, correct OTel SDK integration, and safe defaults. Minor deduction for the event-building-before-sampling-check inefficiency in ScoreClient#create, which could matter at low sample rates under high throughput. No functional bugs found.
  • lib/langfuse/score_client.rb — the ordering of event preparation vs. sampling check in create is worth reviewing for efficiency at low sample rates.

Important Files Changed

Filename Overview
lib/langfuse/config.rb Adds sample_rate config attribute with attr_reader, custom setter with validation/coercion, env var fallback, and validate! integration. Clean implementation with proper boundary checks.
lib/langfuse/otel_setup.rb Adds build_sampler private method that wires TraceIdRatioBased for rates below 1.0 and ALWAYS_ON otherwise. Cleanly integrated into TracerProvider initialization.
lib/langfuse/score_client.rb Core sampling logic: refactored create to call prepare_score_event then enqueue_trace_linked_score?. Changed extract_ids_from_active_span from recording? to valid? check. Minor inefficiency: builds full event before sampling check, but no bugs found.
spec/langfuse/config_spec.rb Thorough test coverage for sample_rate: env var fallback, invalid values, boundary conditions (0.0, 1.0), and explicit override preventing env fallback.
spec/langfuse/otel_setup_spec.rb Tests sampler wiring for sub-1.0 and 1.0 rates, plus determinism verification. Good coverage of the sampling setup logic.
spec/langfuse/score_client_spec.rb Tests score dropping for sampled-out traces, passthrough for session-only/dataset-run-only scores, legacy trace ID compatibility, and sampled-out active span scenario. Comprehensive coverage.
docs/CONFIGURATION.md Documents sample_rate config option with type, default, description, and env var fallback. Clear explanation of score sampling behavior.
docs/SCORING.md Adds "Sampling Behavior" section documenting trace-linked score dropping, session-only/dataset-run-only passthrough, and legacy trace ID compatibility.
docs/API_REFERENCE.md Adds sample_rate row to configuration properties table.

Sequence Diagram

sequenceDiagram
    participant User as User Code
    participant Config as Langfuse::Config
    participant OtelSetup as OtelSetup.setup
    participant TP as TracerProvider
    participant SC as ScoreClient.create
    participant Sampler as TraceIdRatioBased / ALWAYS_ON

    User->>Config: config.sample_rate = 0.1
    Config->>Config: coerce_sample_rate(0.1)
    User->>OtelSetup: setup(config)
    OtelSetup->>OtelSetup: build_sampler(0.1)
    OtelSetup->>TP: new(sampler: TraceIdRatioBased(0.1))

    Note over User,Sampler: Trace creation uses sampler automatically

    User->>SC: create(name:, value:, trace_id:)
    SC->>SC: prepare_score_event(...)
    SC->>SC: enqueue_trace_linked_score?(trace_id)
    SC->>SC: sampling_disabled_for_score?(trace_id)
    alt trace_id is nil or sample_rate >= 1.0
        SC-->>SC: return true (enqueue)
    else valid hex trace_id and sample_rate < 1.0
        SC->>Sampler: should_sample?(trace_id: packed_hex)
        Sampler-->>SC: Result(sampled? = true/false)
        alt sampled? = true
            SC->>SC: enqueue event
        else sampled? = false
            SC-->>User: score dropped (silent)
        end
    end
Loading

Last reviewed commit: d256ca3

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

9 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 78 to 93
event = prepare_score_event(
name: name,
value: normalized_value,
value: value,
id: id,
trace_id: trace_id,
session_id: session_id,
observation_id: observation_id,
comment: comment,
metadata: metadata,
environment: environment,
data_type: data_type_str,
data_type: data_type,
dataset_run_id: dataset_run_id,
config_id: config_id
)

@queue << event
return unless enqueue_trace_linked_score?(trace_id)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary event building before sampling check

prepare_score_event performs validation and builds the full event hash (including SecureRandom.uuid generation, Time.now formatting, and value normalization) before enqueue_trace_linked_score? checks whether the score should be dropped. When sample_rate is low, most trace-linked scores will be discarded, making this work wasted.

Consider moving the sampling check before event preparation:

Suggested change
event = prepare_score_event(
name: name,
value: normalized_value,
value: value,
id: id,
trace_id: trace_id,
session_id: session_id,
observation_id: observation_id,
comment: comment,
metadata: metadata,
environment: environment,
data_type: data_type_str,
data_type: data_type,
dataset_run_id: dataset_run_id,
config_id: config_id
)
@queue << event
return unless enqueue_trace_linked_score?(trace_id)
return unless enqueue_trace_linked_score?(trace_id)
event = prepare_score_event(
name: name,
value: value,
id: id,
trace_id: trace_id,
session_id: session_id,
observation_id: observation_id,
comment: comment,
metadata: metadata,
environment: environment,
data_type: data_type,
dataset_run_id: dataset_run_id,
config_id: config_id
)

return true unless valid_trace_id?(trace_id)

sampler = active_sampler
return true unless sampler.respond_to?(:should_sample?)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALWAYS_ON sampler does not respond to should_sample?

OpenTelemetry::SDK::Trace::Samplers::ALWAYS_ON is a constant (a ConstsantSampler instance) that does respond to should_sample?. This guard clause will never short-circuit for any SDK-provided sampler. It only protects against a non-SDK tracer_provider that exposes a sampler attribute of an unexpected type.

This isn't a bug — the fallback is safe and defensive — but it's worth noting that this line won't actually trigger for normal OTel SDK usage.

- Remove prepare_score_event trampoline; inline validation in create
- Move sampling check before event construction to skip unnecessary work
- Inline sampling_disabled_for_score? and valid_trace_id? into caller
- Extract HEX_TRACE_ID_PATTERN constant for regex reuse
- Remove redundant is_a?(Numeric) guard (coerce_sample_rate guarantees Float)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants