Hyperfleet-542: Add OpenTelemetry instrumentation to Sentinel #51
tirthct wants to merge 11 commits into openshift-hyperfleet:main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing a comment.
Walkthrough

Adds OpenTelemetry tracing across the service: new pkg/telemetry with tracer-provider initialization (OTLP or stdout), sampler configuration via env vars, span helpers, CloudEvent trace injection, and shutdown; cmd/sentinel conditionally initializes and shuts down the tracer provider based on TRACING_ENABLED and OTEL_SERVICE_NAME; pkg/logger exposes OTel config; the HTTP client transport is wrapped with otelhttp; internal/sentinel creates annotated spans for poll/evaluate/publish and attaches trace context to published CloudEvents; extensive unit, integration, and instrumentation tests are added; config examples include messaging_system entries; go.mod is updated with OTLP-related dependencies.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Main as Service Startup
    participant Config as Config/Logger
    participant TP as TracerProvider
    participant Sentinel as Sentinel Loop
    participant Client as HTTP Client
    participant Publisher as Event Publisher
    participant Exporter as Trace Exporter
    Main->>Config: Load OTel config (TRACING_ENABLED, OTEL_SERVICE_NAME, sampling_rate)
    alt tracing enabled
        Main->>TP: InitTraceProvider(serviceName, version, sampling_rate)
        TP->>Exporter: Initialize exporter (OTLP HTTP or stdout)
        Exporter-->>TP: Exporter ready
        TP-->>Main: Tracer provider initialized
    else tracing disabled
        Main-->>Config: Tracing disabled
    end
    Main->>Sentinel: Start polling loop
    activate Sentinel
    Sentinel->>Sentinel: Start "sentinel.poll" span
    loop per resource
        Sentinel->>Sentinel: Start "sentinel.evaluate" span (resource attrs)
        Sentinel->>Client: HTTP request (otelhttp-wrapped)
        Client->>Client: Create HTTP span(s)
        Client-->>Sentinel: Response
        Sentinel->>Publisher: Publish event (SetTraceContext from current span)
        Publisher->>Publisher: Start publish span (messaging attrs)
        Publisher->>Exporter: Export trace
        Publisher-->>Sentinel: Publish result
        Publisher->>Publisher: End publish span
        Sentinel->>Sentinel: End "sentinel.evaluate" span
    end
    Sentinel->>Sentinel: End "sentinel.poll" span
    deactivate Sentinel
    alt tracing enabled
        Main->>TP: Shutdown tracer provider
        TP->>Exporter: Flush & close
        Exporter-->>TP: Shutdown complete
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/sentinel/sentinel.go (1)
113-187: ⚠️ Potential issue | 🔴 Critical

Span leak: `evalSpan.End()` and `publishSpan.End()` are skipped on `continue` paths.

When `event.SetData` fails (Line 141) or `Publish` fails (Line 164), the `continue` statements skip `evalSpan.End()` at Line 187 and `publishSpan.End()` at Line 167 respectively. Leaked spans will accumulate in memory and produce incomplete traces. The safest fix is to end spans via `defer` or explicit `End()` calls before each `continue`.

Proposed fix
```diff
 // span: sentinel.evaluate
 evalCtx, evalSpan := telemetry.StartSpan(ctx, "sentinel.evaluate",
     attribute.String("hyperfleet.resource_type", resource.Kind),
     attribute.String("hyperfleet.resource_id", resource.ID),
 )

 decision := s.decisionEngine.Evaluate(resource, now)
 if decision.ShouldPublish {
     pending++

     // Add decision reason to context for structured logging
     eventCtx := logger.WithDecisionReason(evalCtx, decision.Reason)

     // Create CloudEvent
     event := cloudevents.NewEvent()
     event.SetSpecVersion(cloudevents.VersionV1)
     event.SetType(fmt.Sprintf("com.redhat.hyperfleet.%s.reconcile", resource.Kind))
     event.SetSource("hyperfleet-sentinel")
     event.SetID(uuid.New().String())

     if err := event.SetData(cloudevents.ApplicationJSON, map[string]interface{}{
         "kind":       resource.Kind,
         "id":         resource.ID,
         "generation": resource.Generation,
         "href":       resource.Href,
         "reason":     decision.Reason,
     }); err != nil {
         s.logger.Errorf(eventCtx, "Failed to set event data resource_id=%s error=%v", resource.ID, err)
+        evalSpan.End()
         continue
     }

     // span: publish (child of sentinel.evaluate)
     publishCtx, publishSpan := telemetry.StartSpan(eventCtx, fmt.Sprintf("%s publish", topic),
         attribute.String("hyperfleet.resource_type", resource.Kind),
         attribute.String("hyperfleet.resource_id", resource.ID),
         attribute.String("hyperfleet.decision_reason", decision.Reason),
         attribute.String("messaging.system", "gcp_pubsub"),
         attribute.String("messaging.operation.type", "publish"),
         attribute.String("messaging.destination.name", topic),
         attribute.String("messaging.message.id", event.ID()),
     )

     if publishSpan.SpanContext().IsValid() {
         telemetry.SetTraceContext(&event, publishSpan)
     }

     // Publish to broker using configured topic
     if err := s.publisher.Publish(publishCtx, topic, &event); err != nil {
         // Record broker error
         metrics.UpdateBrokerErrorsMetric(resourceType, resourceSelector, "publish_error")
         s.logger.Errorf(eventCtx, "Failed to publish event resource_id=%s error=%v", resource.ID, err)
+        publishSpan.End()
+        evalSpan.End()
         continue
     }
     publishSpan.End()
```

Alternatively, a cleaner approach would be to use a closure or restructure to use defer-like patterns within the loop body.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/sentinel/sentinel.go` around lines 113 - 187, The spans started by telemetry.StartSpan (evalSpan and publishSpan) are leaked on early continue paths (failures in event.SetData and s.publisher.Publish); fix by ensuring each span is ended on all control paths—either call defer evalSpan.End() immediately after creating evalSpan and defer publishSpan.End() right after creating publishSpan (so they run even on continue), or explicitly call publishSpan.End() and evalSpan.End() just before every continue in the error branches (references: telemetry.StartSpan, evalSpan, publishSpan, event.SetData, s.publisher.Publish).
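The defer-based alternative mentioned above can be sketched with a per-iteration closure, so that every exit path (including what used to be `continue`) ends the spans. This is a minimal stand-alone illustration: the `span` type is a stub, not the project's `telemetry` package, and the real code creates `publishSpan` mid-iteration rather than up front.

```go
package main

import "fmt"

// span is a hypothetical stand-in for an OTel trace.Span; only End matters here.
type span struct {
	name  string
	ended bool
}

func (s *span) End() { s.ended = true }

// processResource wraps one loop iteration in a closure so that deferred
// End() calls run on every exit path; an early return inside the closure
// plays the role of the original `continue`.
func processResource(id string, failPublish bool) (evalSpan, publishSpan *span) {
	evalSpan = &span{name: "sentinel.evaluate"}
	publishSpan = &span{name: "publish"}
	func() {
		defer evalSpan.End() // always runs, even on early return
		// ... evaluate, build CloudEvent ...
		defer publishSpan.End()
		if failPublish {
			return // was `continue`; both deferred Ends still fire
		}
		// ... publish to broker ...
	}()
	return evalSpan, publishSpan
}

func main() {
	e, p := processResource("res-1", true)
	fmt.Println(e.ended, p.ended)
}
```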
🧹 Nitpick comments (4)
internal/sentinel/sentinel_test.go (1)
444-448: Global tracer provider is set but never restored after test.
`otel.SetTracerProvider(tp)` modifies global state. While tests in the same package run sequentially by default, this could affect subsequent tests that rely on the global provider state (or lack thereof). Consider saving and restoring the previous provider.

Suggested pattern

```diff
 otel.SetTracerProvider(tp)
+// Optionally restore previous provider on cleanup
+// prevTP := otel.GetTracerProvider()
+// defer otel.SetTracerProvider(prevTP)
```

This is optional since tests currently run sequentially and the deferred `tp.Shutdown` will clean up the provider.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/sentinel/sentinel_test.go` around lines 444 - 448, The test sets a global provider via trace.NewTracerProvider(...) and otel.SetTracerProvider(tp) but doesn't restore the previous global provider; capture the current provider with otel.GetTracerProvider() before calling otel.SetTracerProvider(tp), and defer otel.SetTracerProvider(previousProvider) so global state is restored; keep the existing deferred tp.Shutdown() to shut down the new provider.

internal/client/client_test.go (1)
796-797: Ignored `w.Write` error return value.

Line 797 ignores the error from `w.Write(...)`, unlike the rest of the test file, which consistently handles write errors.

Suggested fix

```diff
-w.Write([]byte(`{"error": "internal server error"}`))
+if _, err := w.Write([]byte(`{"error": "internal server error"}`)); err != nil {
+    t.Logf("Error writing response: %v", err)
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/client/client_test.go` around lines 796 - 797, The test HTTP handler ignores the error returned from w.Write(...) at the end of the handler; update the handler to check the write error and fail the test if it occurs. Specifically, replace the bare w.Write(...) call following w.WriteHeader(http.StatusInternalServerError) with a checked write (capture the return error from w.Write and call t.Fatalf or t.Errorf with the error), using the same testing.T variable used elsewhere in the file so failures are reported consistently.

test/integration/integration_test.go (1)
459-464: `InitTraceProvider` with the stdout exporter will write span output to stdout during test runs.

Since no `OTEL_EXPORTER_OTLP_ENDPOINT` env var is set in the test, `InitTraceProvider` falls back to the stdout exporter, which will print formatted span data to stdout during test execution. This is noisy but functional. If this is intentional for debugging, it's fine; otherwise consider using an in-memory exporter as done in `TestIntegration_EndToEndSpanHierarchy`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/integration/integration_test.go` around lines 459 - 464, The test currently calls telemetry.InitTraceProvider(ctx, "sentinel", "test", 1.0) which falls back to the stdout exporter and prints spans to stdout; change the test to use the in-memory exporter instead of stdout (or set OTEL_EXPORTER_OTLP_ENDPOINT) by invoking the InitTraceProvider variant or option the project uses for in-memory tracing (match the pattern used in TestIntegration_EndToEndSpanHierarchy), and keep the deferred telemetry.Shutdown(tp) call; update the call site in integration_test.go to use that in-memory configuration so spans are not emitted to stdout during test runs.

pkg/logger/logger.go (1)
46-50: Consider validating the `SamplingRate` range.

`SamplingRate` is a `float64` with no validation at the config level. While `InitTraceProvider` in `pkg/telemetry/otel.go` handles edge cases (≥1.0 → AlwaysSample, ≤0.0 → NeverSample), a negative or >1.0 value could be confusing to operators. Consider documenting the valid range or adding a validation method on `OTelConfig`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/logger/logger.go` around lines 46 - 50, Add explicit validation for OTelConfig.SamplingRate by implementing a validation method (e.g., func (c *OTelConfig) Validate() error) that ensures SamplingRate is within the inclusive range 0.0–1.0 and returns a clear error if out of range; reference the OTelConfig struct and call this Validate method where configuration is loaded or before InitTraceProvider is invoked so invalid values error early (include a helpful message indicating the valid range).
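A minimal sketch of such a validation method. The struct shape and field names (`Enabled`, `SamplingRate`) are taken from the review discussion; the error wording is illustrative.

```go
package main

import "fmt"

// OTelConfig mirrors the reviewed struct shape; field names here follow the review.
type OTelConfig struct {
	Enabled      bool
	SamplingRate float64
}

// Validate rejects sampling rates outside the inclusive [0.0, 1.0] range so
// misconfigurations fail at load time instead of being silently clamped later.
func (c *OTelConfig) Validate() error {
	if c.SamplingRate < 0.0 || c.SamplingRate > 1.0 {
		return fmt.Errorf("otel: sampling_rate must be in [0.0, 1.0], got %g", c.SamplingRate)
	}
	return nil
}

func main() {
	fmt.Println((&OTelConfig{SamplingRate: 0.5}).Validate() == nil)
	fmt.Println((&OTelConfig{SamplingRate: 1.5}).Validate() != nil)
}
```

Calling `Validate` where the config is loaded surfaces a bad rate before `InitTraceProvider` ever runs.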
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/sentinel/main.go`:
- Around line 257-262: The telemetry shutdown is using the same shutdownCtx as
metricsServer.Shutdown and the log call uses the already-cancelled ctx; create a
separate context with its own timeout for telemetry (e.g., teleCtx via
context.WithTimeout) and call telemetry.Shutdown(teleCtx, tp) so it has its own
budget independent of metricsServer.Shutdown, ensure you cancel the teleCtx
after use, and change the error log invocation (log.Extra(...).Error) to use the
telemetry context (teleCtx) or the appropriate non-cancelled shutdown context so
the log is not emitted with a canceled ctx; reference telemetry.Shutdown,
metricsServer.Shutdown, shutdownCtx, tp, and log.Extra to locate and update the
code.
- Around line 153-169: The OTel settings never get set because initLogging uses
DefaultConfig() which leaves LogConfig.OTel.Enabled false; update initLogging to
populate logCfg.OTel.Enabled and logCfg.OTel.SamplingRate from configuration
sources (either the loaded SentinelConfig/YAML or environment variables) so the
check in main.go that inspects logCfg.OTel.Enabled can actually enable tracing;
specifically, within initLogging (and where config.LoadConfig/SentinelConfig is
processed) read an OTEL_ENABLED (boolean) env var or map the YAML fields into
LogConfig.OTel (including SamplingRate) and assign them to logCfg before main.go
calls telemetry.InitTraceProvider (referencing symbols: initLogging, LogConfig,
DefaultConfig, logCfg.OTel.Enabled, logCfg.OTel.SamplingRate,
config.LoadConfig).
In `@internal/sentinel/sentinel.go`:
- Around line 149-150: The attribute.String("messaging.system", "gcp_pubsub") is
hardcoded but should come from configuration; change the code that builds the
span/attributes (the attribute.String calls around
messaging.operation.type/publish in sentinel.go) to read the messaging system
from a configuration value or environment variable (e.g., cfg.MessagingSystem or
os.Getenv("MESSAGING_SYSTEM")) and use that value when constructing
attribute.String("messaging.system", <configuredValue>), with a sensible default
(e.g., "rabbitmq") if unset.
In `@pkg/telemetry/otel_test.go`:
- Around line 308-326: TestSetTraceContext_InvalidSpan currently ends a real
span which doesn't make SpanContext invalid; replace creation of the span with a
no-op/non-recording span (e.g., obtain span via
trace.SpanFromContext(context.Background())) so span.SpanContext().IsValid() is
false and the !IsValid() branch in SetTraceContext is exercised, then call
SetTraceContext(&event, span) and assert that no "traceparent" extension was
added to event.Extensions(); update imports accordingly to reference
go.opentelemetry.io/otel/trace if needed.
- Around line 42-77: TestInitTraceProvider_OTLPExporter currently points
OTEL_EXPORTER_OTLP_ENDPOINT at a non-existent collector and then calls Shutdown
which blocks/fails; fix by making the test use a local httptest server that
responds 200 to POST /v1/traces and set OTEL_EXPORTER_OTLP_ENDPOINT to that
server's URL (server.URL + "/v1/traces"), so InitTraceProvider, tracer use, and
the deferred Shutdown call complete reliably; update the test to start
httptest.NewServer before setting the env var and close the server at the end,
referencing TestInitTraceProvider_OTLPExporter, InitTraceProvider, and Shutdown
in your changes.
- Around line 65-70: Deferred cleanup closures calling Shutdown currently use
t.Fatal which is unsafe inside defers; change those deferred anonymous functions
(the ones invoking Shutdown(ctx, tp) and similar cleanup calls) to call t.Errorf
with the error when Shutdown returns non-nil instead of t.Fatal/t.Fatalf, so the
test reports the cleanup failure without invoking Fatal after the test body has
returned. Locate the three other deferred closures in this file that call
Shutdown or similar cleanup (same pattern as the closure around Shutdown at
lines noted in the review) and replace their t.Fatal/t.Fatalf calls with
t.Errorf while keeping the rest of the defer logic unchanged.
In `@pkg/telemetry/otel.go`:
- Line 117: Fix the typo in the doc comment for StartSpan: change "conext" to
"context" in the comment that reads "// StartSpan starts a span and enriches
conext with trace/span IDs for logging" so it correctly reads "// StartSpan
starts a span and enriches context with trace/span IDs for logging"; locate the
comment above the StartSpan function in otel.go and update the word only.
- Around line 22-29: The listed package-level string literals
(SAMPLER_ALWAYS_ON, SAMPLER_ALWAYS_OFF, SAMPLER_TRACE_ID_RATIO,
ENV_OTEL_TRACES_SAMPLER, ENV_OTEL_TRACES_SAMPLER_ARG,
ENV_OTEL_EXPORTER_OTLP_ENDPOINT) should be declared as constants instead of
variables; replace the var (...) block in pkg/telemetry/otel.go with a const
(...) block so these identifiers are immutable and cannot be reassigned
elsewhere in the package.
- Around line 139-145: The traceparent being set in SetTraceContext incorrectly
hardcodes the trace-flags byte to "01"; update SetTraceContext to read the
actual flags from span.SpanContext().TraceFlags() and format that byte as two
hex characters when building traceParent (keep the existing TraceID and SpanID
usage and the IsValid() guard). Use the span context's TraceFlags value
(converted to a uint8 and formatted with "%02x") instead of the fixed "-01" so
the traceparent accurately reflects whether the span was sampled.
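The flag-aware formatting can be sketched in isolation. In the real code the three inputs would come from `span.SpanContext()` (`TraceID().String()`, `SpanID().String()`, and `byte(TraceFlags())`); here they are plain values, and `buildTraceparent` is a hypothetical helper.

```go
package main

import "fmt"

// buildTraceparent renders a W3C traceparent header
// (version-traceid-spanid-flags). "%02x" encodes the actual sampled flag
// instead of a hardcoded "01".
func buildTraceparent(traceID, spanID string, flags byte) string {
	return fmt.Sprintf("00-%s-%s-%02x", traceID, spanID, flags)
}

func main() {
	// Sampled vs. unsampled only differ in the trailing flags byte.
	fmt.Println(buildTraceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", 0x01))
	fmt.Println(buildTraceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", 0x00))
}
```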
- Around line 41-46: The code incorrectly passes a full URL from
ENV_OTEL_EXPORTER_OTLP_ENDPOINT into otlptracehttp.WithEndpoint (which expects
host:port), causing mis-parsing; fix by either removing the manual env var
handling and simply calling otlptracehttp.New(ctx) so the SDK reads
OTEL_EXPORTER_OTLP_ENDPOINT itself, or if you must pass it explicitly, replace
otlptracehttp.WithEndpoint(otlpEndpoint) with
otlptracehttp.WithEndpointURL(parsedURL) (use the URL form) and ensure
otlpEndpoint is a properly parsed *url.URL; update error logging in the
otlptracehttp.New/creation branch (log.Errorf) accordingly and remove the
redundant env read of ENV_OTEL_EXPORTER_OTLP_ENDPOINT.
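To illustrate the mismatch: endpoint options of the `WithEndpoint` kind expect a bare host:port, so a full URL has to be parsed first (letting the SDK read the env var itself, or using the URL-form option, avoids this entirely). A stdlib-only sketch of that extraction; `endpointHost` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// endpointHost extracts the host:port that host-based endpoint options expect
// from a full URL such as one placed in OTEL_EXPORTER_OTLP_ENDPOINT.
func endpointHost(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Host == "" {
		return "", fmt.Errorf("no host in %q", raw)
	}
	return u.Host, nil
}

func main() {
	h, _ := endpointHost("http://collector:4318")
	fmt.Println(h) // collector:4318
}
```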
In `@test/integration/integration_test.go`:
- Around line 816-817: The loop using pollSpans[:2] can panic if fewer than 2
spans were captured; update the iteration to safely bound the slice by computing
n := len(pollSpans) and if n > 2 set n = 2, then iterate over pollSpans[:n] (or
otherwise iterate up to min(2, len(pollSpans))). Apply this change around the
existing pollSpans variable and the for _, pollSpan := range pollSpans[:2] loop
so the test never slices beyond the available elements.
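The bounded iteration can be sketched generically; `firstN` is a hypothetical helper.

```go
package main

import "fmt"

// firstN safely bounds a slice so iterating over "the first two poll spans"
// cannot panic when fewer were captured.
func firstN[T any](s []T, n int) []T {
	if len(s) < n {
		n = len(s)
	}
	return s[:n]
}

func main() {
	spans := []string{"poll-1"} // only one span captured
	for _, s := range firstN(spans, 2) { // no panic despite asking for two
		fmt.Println(s)
	}
}
```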
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
cmd/sentinel/main.go, go.mod, internal/client/client.go, internal/client/client_test.go, internal/sentinel/sentinel.go, internal/sentinel/sentinel_test.go, pkg/logger/logger.go, pkg/telemetry/otel.go, pkg/telemetry/otel_test.go, test/integration/integration_test.go
♻️ Duplicate comments (2)
cmd/sentinel/main.go (2)
160-176: ⚠️ Potential issue | 🔴 Critical

`logCfg.OTel.Enabled` is always `false`, so OTel initialization is unreachable dead code.

`initLogging` constructs `LogConfig` via `logger.DefaultConfig()` (which sets `OTel.Enabled: false`) and only updates `Level`, `Format`, and `Output`. Neither `OTEL_ENABLED` nor `OTEL_SAMPLING_RATE` is ever read and applied to `logCfg.OTel`. As a result, the branch at Line 166 is never taken and tracing is never activated.

This was flagged in a previous review; the fix (reading `OTEL_ENABLED`/`OTEL_SAMPLING_RATE` from env vars in `initLogging`) has not been applied yet.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sentinel/main.go` around lines 160 - 176, The OTel branch is never reached because initLogging uses logger.DefaultConfig() which leaves logCfg.OTel.Enabled false and never reads OTEL_ENABLED/OTEL_SAMPLING_RATE; update initLogging to read the OTEL_ENABLED and OTEL_SAMPLING_RATE environment variables and apply them to the returned LogConfig (i.e., set logCfg.OTel.Enabled based on OTEL_ENABLED parsing and set logCfg.OTel.SamplingRate from OTEL_SAMPLING_RATE parsing), ensuring any parsing errors are handled sensibly so the downstream check of logCfg.OTel.Enabled in main.go will correctly enable telemetry; reference initLogging, logger.DefaultConfig(), LogConfig and logCfg.OTel.Enabled/SamplingRate in your changes.
290-295: ⚠️ Potential issue | 🟠 Major

OTel shutdown reuses the shared `shutdownCtx`, and the error log uses an already-cancelled `ctx`.

Two pre-existing issues remain unaddressed:

- `telemetry.Shutdown` shares the same `shutdownCtx` (20 s) already partially consumed by `healthServer.Shutdown` + `metricsServer.Shutdown`. If those are slow, the OTel flush budget is eroded.
- Line 293 passes the root `ctx` (cancelled at Line 279) to `log.Extra(...).Error(...)`, so the log call is issued with a cancelled context.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sentinel/main.go` around lines 290 - 295, The OTel shutdown call reuses the shared shutdownCtx and logs using the already-cancelled root ctx; change telemetry.Shutdown(shutdownCtx, tp) to use its own fresh context and ensure the log call uses a live context: create a new context with timeout (e.g., otelCtx, otelCancel := context.WithTimeout(context.Background(), 20*time.Second)) and defer otelCancel(), call telemetry.Shutdown(otelCtx, tp), and pass a non-cancelled context (e.g., otelCtx or context.Background()) to log.Extra(...).Error(...) so the shutdown has its own budget and the error is logged with an active context.
🧹 Nitpick comments (1)
cmd/sentinel/main.go (1)
30-33: `serviceName` is a runtime-configurable value mixed into the build-info `var` block.

`version`, `commit`, and `date` are set by `ldflags` at build time and are effectively constants, while `serviceName` is mutated at runtime inside `runServe` (Line 162). Keeping them in the same block is misleading. Consider making `serviceName` a local variable in `runServe` or, if it needs to be the default value, a separate `const`.

♻️ Suggested refactor

```diff
 var (
     version = "dev"
     commit  = "none"
     date    = "unknown"
-    serviceName = "sentinel"
 )
```

Then in `runServe`:

```diff
+    serviceName := "sentinel"
     if envServiceName := os.Getenv("OTEL_SERVICE_NAME"); envServiceName != "" {
         serviceName = envServiceName
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sentinel/main.go` around lines 30 - 33, The serviceName variable should not live in the same build-info var block as the ldflags-populated constants (version, commit, date); move serviceName out of that var block and make it either a local variable inside runServe (referencing runServe) if it is mutated at runtime, or make it a separate const/var with a default value distinct from version/commit/date if you need a package-level default. Update any references to serviceName accordingly (e.g., where runServe mutates it) so you no longer mix runtime-mutable state with the ldflags-injected build constants.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (2)
cmd/sentinel/main.go, go.mod
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/client/client_test.go`:
- Line 804: The test currently ignores the error returned by
NewHyperFleetClient; update the test to capture both return values (client, err
:= NewHyperFleetClient(...)) and assert/fail if err != nil (e.g., t.Fatalf or
require.NoError) before using client so any constructor failures surface
immediately; ensure you also handle the client variable in subsequent
assertions.
- Around line 795-800: In the httptest server handler (the anonymous
http.HandlerFunc passed to httptest.NewServer) replace the unsafe t.Fatalf call
with t.Errorf and then return to avoid invoking FailNow from a handler
goroutine; specifically change the error handling after w.Write to call
t.Errorf("Failed to write response: %v", err) followed by return instead of
t.Fatalf.
- Around line 670-676: Capture the previous global tracer provider before
calling otel.SetTracerProvider(tp) in both
TestNewHyperFleetClient_HTTPInstrumentation and
TestNewHyperFleetClient_HTTPInstrumentation_ErrorCase (e.g., prevTp :=
otel.GetTracerProvider()), then in the defer restore it (e.g., defer
otel.SetTracerProvider(prevTp)) and still shutdown tp as before; ensure you
reference the local tp shutdown logic (tp.Shutdown(ctx)) but add restoring the
previous provider to avoid test-state leakage.
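The save/restore shape can be sketched with a plain package-level variable standing in for the global tracer provider that `otel.SetTracerProvider` mutates; `withProvider` is a hypothetical helper showing the defer-based restore.

```go
package main

import "fmt"

// globalProvider stands in for the process-wide tracer provider.
var globalProvider = "default"

// withProvider swaps in a test provider and restores the previous one when
// done, mirroring otel.GetTracerProvider / otel.SetTracerProvider save/restore.
func withProvider(p string, body func()) {
	prev := globalProvider // save (cf. otel.GetTracerProvider())
	globalProvider = p     // swap (cf. otel.SetTracerProvider(tp))
	defer func() { globalProvider = prev }()
	body()
}

func main() {
	withProvider("test-tp", func() { fmt.Println(globalProvider) })
	fmt.Println(globalProvider) // restored after the test body
}
```

In the real tests, the deferred `tp.Shutdown(ctx)` would remain alongside the restore.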
```go
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusInternalServerError)
	_, err := w.Write([]byte(`{"error": "internal server error"}`))
	if err != nil {
		t.Fatalf("Failed to write response: %v", err)
	}
```
🧩 Analysis chain
🏁 Script executed:
cat -n internal/client/client_test.go | sed -n '790,810p'
Repository: openshift-hyperfleet/hyperfleet-sentinel
Length of output: 868
🏁 Script executed:
# Also check if there are other similar patterns in the test file
rg 't\.Fatalf|t\.FailNow' internal/client/client_test.go -A 3 -B 3
Repository: openshift-hyperfleet/hyperfleet-sentinel
Length of output: 4362
🏁 Script executed:
# Check the full context around httptest usage in this file
rg 'httptest\.NewServer' internal/client/client_test.go -B 5 -A 20
Repository: openshift-hyperfleet/hyperfleet-sentinel
Length of output: 11976
Replace t.Fatalf with t.Errorf and return in httptest handler.
Calling t.Fatalf from an httptest handler goroutine is unsafe—Fatalf and FailNow can only be called from the test goroutine. The pattern throughout this file uses t.Errorf for errors in handlers.
Proposed fix
_, err := w.Write([]byte(`{"error": "internal server error"}`))
if err != nil {
- t.Fatalf("Failed to write response: %v", err)
+ t.Errorf("Failed to write response: %v", err)
+ return
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusInternalServerError)
	_, err := w.Write([]byte(`{"error": "internal server error"}`))
	if err != nil {
		t.Errorf("Failed to write response: %v", err)
		return
	}
```
Replaced fatals with errors
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the specific area where the issue was reported (around line 795-800)
cat -n internal/client/client_test.go | sed -n '792,802p'
# Also search for any remaining t.Fatalf in httptest handlers to ensure all instances were fixed
echo "---"
echo "Checking for any remaining t.Fatalf in httptest handlers:"
rg -n 't\.Fatalf' internal/client/client_test.go -A 2 -B 2 | head -40
Length of output: 1765
🏁 Script executed:
#!/bin/bash
# Check the TestNewHyperFleetClient_HTTPInstrumentation_ErrorCase function specifically
# This is where the issue was reported
cat -n internal/client/client_test.go | sed -n '778,824p'
Length of output: 1843
🏁 Script executed:
#!/bin/bash
# Check the other instrumentation test (TestNewHyperFleetClient_HTTPInstrumentation)
# to see if there are similar issues with t.Fatalf in handlers
cat -n internal/client/client_test.go | sed -n '662,700p'
Length of output: 1569
@tirthct, I checked the code and the t.Fatalf call is still present at line 809 in the TestNewHyperFleetClient_HTTPInstrumentation_ErrorCase function:
```go
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusInternalServerError)
	_, err := w.Write([]byte(`{"error": "internal server error"}`))
	if err != nil {
		t.Fatalf("Failed to write response: %v", err) // ← Still here at line 809
	}
}))
```

Could you please replace this with t.Errorf followed by return to match the pattern used elsewhere in the test file? The fix should be:
```go
if err != nil {
	t.Errorf("Failed to write response: %v", err)
	return
}
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the exact code around line 809 where the fix should be applied
cat -n internal/client/client_test.go | sed -n '805,812p'
Length of output: 361
@tirthct, the fix is almost complete, but the return statement is still missing. Currently at line 809, the code shows:
```go
if err != nil {
	t.Errorf("Failed to write response: %v", err)
}
```

It should be:
```go
if err != nil {
	t.Errorf("Failed to write response: %v", err)
	return
}
```

Without the return, the handler continues executing after the error, which could lead to unexpected behavior. Could you please add the return statement after t.Errorf?
```diff
 // Publish to broker using configured topic
-if err := s.publisher.Publish(eventCtx, topic, &event); err != nil {
+if err := s.publisher.Publish(publishCtx, topic, &event); err != nil {
```
Errors in the Publish path are logged but never recorded on the span. The tracing standard (https://github.com/openshift-hyperfleet/architecture/blob/main/hyperfleet/standards/tracing.md) requires span.RecordError() and span.SetStatus() so errors are visible in trace backends (Jaeger/Tempo). Without them, failed publish operations appear as successful spans.
Suggested:

```go
if err := s.publisher.Publish(publishCtx, topic, &event); err != nil {
	publishSpan.RecordError(err)
	publishSpan.SetStatus(codes.Error, "publish failed")
	metrics.UpdateBrokerErrorsMetric(resourceType, resourceSelector, "publish_error")
	s.logger.Errorf(eventCtx, "Failed to publish event resource_id=%s error=%v", resource.ID, err)
	publishSpan.End()
	evalSpan.End()
	continue
}
```
Ah, okay. Added RecordError() and SetStatus()
…OTel traceparent function and tests
Actionable comments posted: 5
♻️ Duplicate comments (3)
pkg/telemetry/otel.go (1)
93-93: ⚠️ Potential issue | 🟠 Major

Prefer parent-based ratio sampling to preserve incoming trace decisions.

Using `trace.TraceIDRatioBased` directly can resample child spans independently. Wrap it with `trace.ParentBased(...)` for distributed trace continuity.

💡 Proposed fix

```diff
-	sampler = trace.TraceIDRatioBased(samplingRate)
+	sampler = trace.ParentBased(trace.TraceIDRatioBased(samplingRate))
```

OpenTelemetry Go: For services receiving remote parent context, is ParentBased(TraceIDRatioBased(x)) recommended over TraceIDRatioBased(x) to preserve parent sampling decisions?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel.go` at line 93, replace the direct TraceIDRatioBased sampler assignment so child spans respect remote parent sampling: instead of assigning sampler = trace.TraceIDRatioBased(samplingRate), wrap it with trace.ParentBased(trace.TraceIDRatioBased(samplingRate)) (use the same samplingRate variable) so parent-based sampling is preserved for incoming trace contexts.

cmd/sentinel/main.go (2)
34-34: ⚠️ Potential issue | 🟡 Minor

Set the default OTel service name to `hyperfleet-sentinel`.

Keeping the default as `sentinel` can fragment service identity in tracing backends unless explicitly overridden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sentinel/main.go` at line 34, the default OTel serviceName value currently set as serviceName = "sentinel" should be changed to "hyperfleet-sentinel" to standardize tracing identity; update the assignment of the serviceName variable so traces are attributed consistently unless explicitly overridden.
301-305: ⚠️ Potential issue | 🟡 Minor

Use the telemetry shutdown context when logging shutdown errors.

The error log still uses `ctx` instead of `otelShutdownCtx`, which may already be canceled during shutdown.

💡 Proposed fix

```diff
 if err := telemetry.Shutdown(otelShutdownCtx, tp); err != nil {
-	log.Extra("error", err).Error(ctx, "Failed to shutdown OpenTelemetry")
+	log.Extra("error", err).Error(otelShutdownCtx, "Failed to shutdown OpenTelemetry")
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sentinel/main.go` around lines 301 - 305, the shutdown logging currently uses the outer context variable ctx, which may be canceled; change the log call to use the telemetry shutdown context instead: when telemetry.Shutdown(otelShutdownCtx, tp) returns an error, pass otelShutdownCtx into log.Extra(...).Error(...) so the failure is logged with the shutdown-specific context created by context.WithTimeout.
🧹 Nitpick comments (1)
test/integration/integration_test.go (1)
902-909: Trace continuity assertion is too broad and can become flaky.

This currently counts trace IDs from all captured spans. If unrelated root spans are emitted, the check can fail even when the poll/evaluate/publish hierarchy is correct. Prefer deriving continuity from poll traces (or poll descendants) only.

Suggested tightening

```diff
-	// Verify trace continuity - spans should form coherent traces
-	traceIDs := make(map[string]bool)
-	for _, span := range spans {
-		traceIDs[span.SpanContext.TraceID().String()] = true
-	}
-
-	if len(traceIDs) > len(pollSpans) {
-		t.Errorf("Expected spans to belong to %d traces (one per poll), but found %d trace IDs", len(pollSpans), len(traceIDs))
-	}
+	// Verify trace continuity for poll roots specifically (one trace per poll span)
+	pollTraceIDs := make(map[string]struct{})
+	for _, pollSpan := range pollSpans {
+		pollTraceIDs[pollSpan.SpanContext.TraceID().String()] = struct{}{}
+	}
+	if len(pollTraceIDs) != len(pollSpans) {
+		t.Errorf("Expected one unique trace per poll span, got %d unique traces for %d poll spans", len(pollTraceIDs), len(pollSpans))
+	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/integration/integration_test.go` around lines 902 - 909, The current check counts trace IDs from all captured spans which can include unrelated roots and become flaky; change it to derive continuity only from the poll traces by first building a set of expectedTraceIDs from pollSpans (use each pollSpan.SpanContext.TraceID().String()) and then only collect/count trace IDs from spans whose TraceID is in that expectedTraceIDs set (i.e., filter spans by membership in the poll trace set) before asserting the number of distinct poll-related traces matches len(pollSpans).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/config/config.go`:
- Around line 110-112: The code currently assigns MESSAGING_SYSTEM even when the
environment variable is present but empty; update the block that reads
os.LookupEnv("MESSAGING_SYSTEM") so it trims and checks the value is non-empty
before assigning to cfg.MessagingSystem (e.g., use strings.TrimSpace on the
looked-up value and only set cfg.MessagingSystem when the trimmed value != "");
this avoids clearing defaults or emitting empty messaging.system attributes.
In `@internal/sentinel/sentinel.go`:
- Around line 114-118: The telemetry spans created via telemetry.StartSpan
(e.g., evalSpan and publishSpan in the sentinel.evaluate flow) are not ended on
error paths that call continue; update each error branch that returns/continues
after calling resource.SetData or publisher.Publish to call the corresponding
span.End() (evalSpan.End() and, if publishSpan was created, publishSpan.End())
before continuing so all spans are closed; search for usages of SetData and
Publish within the sentinel evaluation loop (the branches around evalSpan and
publishSpan creation) and add the missing End() calls on every error path.
In `@pkg/telemetry/otel_test.go`:
- Around line 84-104: The test calls Shutdown twice (once in the deferred
anonymous func and once explicitly), causing flakiness; update
TestInitTraceProvider_OTLPExporter to call Shutdown only once by removing the
redundant call: either eliminate the explicit err = Shutdown(shutdownCtx, tp)
block (keeping the defer) or remove the defer and keep the explicit shutdown,
ensuring only a single call to Shutdown(ctx, tp) on the trace provider
(reference Shutdown and the deferred anonymous func that wraps Shutdown).
In `@test/integration/integration_test.go`:
- Around line 717-718: The test currently ignores the error returned by
client.NewHyperFleetClient; modify the setup to capture the error (e.g.,
hyperfleetClient, err := client.NewHyperFleetClient(...)) and fail fast if it is
non-nil by calling t.Fatalf or using the test assertion helper (e.g.,
require.NoError) so setup failures are explicit; update the code around
NewHyperFleetClient and any test setup that depends on hyperfleetClient to use
the checked variable.
- Around line 675-682: The test sets a new global OTEL tracer provider with
otel.SetTracerProvider(tp) but never restores the previous global provider,
leaking state to other tests; capture the existing provider via
otel.GetTracerProvider() before calling otel.SetTracerProvider(tp) and in the
deferred cleanup restore it by calling otel.SetTracerProvider(prevProvider) (and
then perform tp.Shutdown(cleanupCtx) as currently implemented), referencing
otel.SetTracerProvider, otel.GetTracerProvider, tp, and tp.Shutdown to locate
where to add the save/restore logic.
---
Duplicate comments:
In `@cmd/sentinel/main.go`:
- Line 34: The default OTel serviceName value currently set as serviceName =
"sentinel" should be changed to "hyperfleet-sentinel" to standardize tracing
identity; update the assignment of the serviceName variable in
cmd/sentinel/main.go (the serviceName symbol) to use "hyperfleet-sentinel" as
the default so traces are attributed consistently unless explicitly overridden.
- Around line 301-305: The shutdown logging currently uses the outer request
context variable ctx which may be canceled; change the log call to use the
telemetry shutdown context instead: when calling
telemetry.Shutdown(otelShutdownCtx, tp) and it returns an error, pass
otelShutdownCtx into log.Extra(...).Error(...) (keeping the existing
log.Extra("error", err) and message) so the failure is logged with the
shutdown-specific context created by context.WithTimeout
(otelShutdownCtx/otelShutdownCancel).
In `@pkg/telemetry/otel.go`:
- Line 93: Replace the direct TraceIDRatioBased sampler assignment so child
spans respect remote parent sampling: instead of assigning sampler =
trace.TraceIDRatioBased(samplingRate) wrap it with
trace.ParentBased(trace.TraceIDRatioBased(samplingRate)) (use the same
samplingRate variable) so parent-based sampling is preserved for incoming trace
contexts.
---
Nitpick comments:
In `@test/integration/integration_test.go`:
- Around line 902-909: The current check counts trace IDs from all captured
spans which can include unrelated roots and become flaky; change it to derive
continuity only from the poll traces by first building a set of expectedTraceIDs
from pollSpans (use each pollSpan.SpanContext.TraceID().String()) and then only
collect/count trace IDs from spans whose TraceID is in that expectedTraceIDs set
(i.e., filter spans by membership in the poll trace set) before asserting the
number of distinct poll-related traces matches len(pollSpans).
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- cmd/sentinel/main.go
- configs/dev-example.yaml
- configs/gcp-pubsub-example.yaml
- configs/rabbitmq-example.yaml
- internal/client/client_test.go
- internal/config/config.go
- internal/sentinel/sentinel.go
- pkg/telemetry/otel.go
- pkg/telemetry/otel_test.go
- test/integration/integration_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/client/client_test.go
… tracing standardization
Actionable comments posted: 2
♻️ Duplicate comments (2)
cmd/sentinel/main.go (1)
301-305: ⚠️ Potential issue | 🟡 Minor

Use `otelShutdownCtx` for the shutdown error log context.

Line 304 logs with `ctx` even though telemetry shutdown is executed under `otelShutdownCtx`.

💡 Proposed fix

```diff
 if tp != nil {
 	otelShutdownCtx, otelShutdownCancel := context.WithTimeout(context.Background(), 5*time.Second)
 	defer otelShutdownCancel()
 	if err := telemetry.Shutdown(otelShutdownCtx, tp); err != nil {
-		log.Extra("error", err).Error(ctx, "Failed to shutdown OpenTelemetry")
+		log.Extra("error", err).Error(otelShutdownCtx, "Failed to shutdown OpenTelemetry")
 	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sentinel/main.go` around lines 301 - 305, the shutdown error log is using the wrong context; change the log call to use otelShutdownCtx (created by context.WithTimeout) instead of ctx so the telemetry.Shutdown error is logged with the same context it runs under.

internal/sentinel/sentinel.go (1)
134-143: ⚠️ Potential issue | 🔴 Critical

End and mark `evalSpan` on `SetData` failure before `continue`.

On Line 142, `continue` skips Line 192, so `evalSpan` leaks on this error path and the span is not marked failed.

🐛 Proposed fix

```diff
 if err := event.SetData(cloudevents.ApplicationJSON, map[string]interface{}{
 	"kind":       resource.Kind,
 	"id":         resource.ID,
 	"generation": resource.Generation,
 	"href":       resource.Href,
 	"reason":     decision.Reason,
 }); err != nil {
+	evalSpan.RecordError(err)
+	evalSpan.SetStatus(codes.Error, "set event data failed")
 	s.logger.Errorf(eventCtx, "Failed to set event data resource_id=%s error=%v", resource.ID, err)
+	evalSpan.End()
 	continue
 }
```

Also applies to: 192-192
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/sentinel/sentinel.go` around lines 134 - 143, When event.SetData(...) fails in the block around event.SetData, ensure you mark and end the `evalSpan` before returning/continuing: after logging the error in the SetData error branch, record the error on `evalSpan` (e.g., RecordError / SetStatus to error with the err message) and call `evalSpan.End()` prior to the `continue` so the span is not leaked; apply the same fix to the other identical error path near the later SetData call (the occurrence referenced at line ~192).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/sentinel/main.go`:
- Around line 151-155: The code currently ignores parse errors from
strconv.ParseBool for TRACING_ENABLED; modify the block around
otelEnabled/strconv.ParseBool so that when err != nil you either log a clear
error and exit (fail fast) or at minimum emit a warning to stderr/logger
indicating the invalid value and that tracing will remain at its default, e.g.
using log.Printf or fmt.Fprintf(os.Stderr, ...), referencing the TRACING_ENABLED
env var, the strconv.ParseBool call, and cfg.OTel.Enabled to show
current/default state; ensure successful ParseBool still sets cfg.OTel.Enabled
as before.
In `@pkg/telemetry/otel.go`:
- Around line 77-80: The code silently ignores strconv.ParseFloat errors when
reading ENV_OTEL_TRACES_SAMPLER_ARG; update the parsing branch (the block
referencing ENV_OTEL_TRACES_SAMPLER_ARG and samplingRate) to detect a parse
error and handle it explicitly by logging a clear warning or returning an error
that includes the environment variable name and the invalid value, and only
assign samplingRate on successful parse; ensure the log/returned message
references ENV_OTEL_TRACES_SAMPLER_ARG and the raw arg so misconfiguration is
visible.
---
Duplicate comments:
In `@cmd/sentinel/main.go`:
- Around line 301-305: The shutdown error log is using the wrong context; change
the log call to use otelShutdownCtx (created by context.WithTimeout) instead of
ctx so the telemetry.Shutdown error is logged with the same context it runs
under: locate otelShutdownCtx and otelShutdownCancel and update the
log.Extra(...).Error(...) invocation that follows
telemetry.Shutdown(otelShutdownCtx, tp) to pass otelShutdownCtx as the context
argument.
In `@internal/sentinel/sentinel.go`:
- Around line 134-143: When event.SetData(...) fails in the block around
event.SetData, ensure you mark and end the `evalSpan` before
returning/continuing: after logging the error in the SetData error branch,
record the error on `evalSpan` (e.g., RecordError / SetStatus to error with the
err message) and call `evalSpan.End()` prior to the `continue` so the span is
not leaked; apply the same fix to the other identical error path near the later
SetData call (the occurrence referenced at line ~192).
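The env-var parsing fixes requested above (for TRACING_ENABLED and OTEL_TRACES_SAMPLER_ARG) share one shape: warn on malformed input and keep the default rather than silently ignoring the parse error. A minimal sketch follows; the helper names and the `warnf` callback are illustrative, not the PR's actual API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBoolEnv mirrors the suggested TRACING_ENABLED handling: empty keeps
// the default, malformed input warns and keeps the default.
func parseBoolEnv(name, raw string, def bool, warnf func(string, ...any)) bool {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return def
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		warnf("invalid %s=%q, keeping default %v", name, raw, def)
		return def
	}
	return v
}

// parseFloatEnv does the same for OTEL_TRACES_SAMPLER_ARG, so a bad
// sampling rate never silently becomes zero.
func parseFloatEnv(name, raw string, def float64, warnf func(string, ...any)) float64 {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return def
	}
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		warnf("invalid %s=%q, keeping default %g", name, raw, def)
		return def
	}
	return v
}

func main() {
	warnf := func(format string, args ...any) { fmt.Printf("WARN: "+format+"\n", args...) }
	fmt.Println(parseBoolEnv("TRACING_ENABLED", "yes?", false, warnf))        // warns, then false
	fmt.Println(parseFloatEnv("OTEL_TRACES_SAMPLER_ARG", "0.25", 1.0, warnf)) // 0.25
}
```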
/retest
Actionable comments posted: 3
🧹 Nitpick comments (2)
pkg/telemetry/otel.go (1)
87-96: `ParentBased` not applied for `AlwaysSample` and `NeverSample` cases.

When `samplingRate >= 1.0` or `<= 0.0`, the sampler is set directly to `AlwaysSample()` or `NeverSample()` without wrapping in `ParentBased()`. This means parent sampling decisions are ignored in these cases, potentially breaking trace continuity across service boundaries.

For consistency with the architecture standard that specifies `parentbased_traceidratio` as the default strategy, consider wrapping all samplers:

♻️ Proposed fix

```diff
 var sampler trace.Sampler
 switch {
 case samplingRate >= 1.0:
-	sampler = trace.AlwaysSample() // Sample all
+	sampler = trace.ParentBased(trace.AlwaysSample())
 case samplingRate <= 0.0:
-	sampler = trace.NeverSample() // Sample none
+	sampler = trace.ParentBased(trace.NeverSample())
 default:
 	sampler = trace.ParentBased(trace.TraceIDRatioBased(samplingRate))
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel.go` around lines 87 - 96, the sampler logic sets the sampler directly to trace.AlwaysSample() or trace.NeverSample() for samplingRate >= 1.0 or <= 0.0, which bypasses parent-based behavior; wrap those base samplers with trace.ParentBased(...) so all cases respect the parent sampling decision.

pkg/telemetry/otel_test.go (1)
181-185: Test assertion for `expectedSample=false` cases may be misleading.

For `always_off` (line 117-121), the test expects `expectedSample: false` but only checks `IsValid()` when `expectedSample` is true. The span context remains valid even when not sampled — `NeverSample` still produces a valid `SpanContext`, it just won't be exported.

To properly verify sampling behavior, check `span.SpanContext().IsSampled()` instead:

💡 Suggested enhancement

```diff
-	if tt.expectedSample {
-		if !span.SpanContext().IsValid() {
-			t.Error("Expected valid span context for sampling=true")
-		}
+	if span.SpanContext().IsSampled() != tt.expectedSample {
+		t.Errorf("Expected IsSampled()=%v, got %v", tt.expectedSample, span.SpanContext().IsSampled())
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel_test.go` around lines 181 - 185, the test currently only checks span.SpanContext().IsValid() when tt.expectedSample is true, which doesn't verify sampling; update the assertion to use span.SpanContext().IsSampled() and compare it against tt.expectedSample so cases like "always_off" correctly validate sampling behavior.
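The parent-based sampling behavior these two findings revolve around can be modeled in a few lines. This is a simplified conceptual model, not the SDK implementation: with `ParentBased(...)`, a span that has a parent inherits the parent's sampled flag, and the wrapped sampler is only consulted for root spans — which is also exactly what `IsSampled()` (as opposed to `IsValid()`) reports.

```go
package main

import "fmt"

// sampleDecision is a simplified model (not the real SDK code) of why the
// review asks for trace.ParentBased(...) around every base sampler: for
// spans with a parent, the parent's sampled flag wins; the wrapped base
// sampler only decides for root spans.
func sampleDecision(hasParent, parentSampled, rootDecision bool) bool {
	if hasParent {
		return parentSampled
	}
	return rootDecision
}

func main() {
	// A child of a sampled upstream span stays sampled even if the local
	// base sampler (a low ratio, or NeverSample) would have said no —
	// trace continuity across service boundaries holds.
	fmt.Println(sampleDecision(true, true, false)) // true
	// Root spans still follow the base sampler's decision.
	fmt.Println(sampleDecision(false, false, false)) // false
}
```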
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/sentinel/sentinel_test.go`:
- Around line 449-454: The deferred shutdown of the tracer provider uses
t.Fatalf which runs after the test body and can mislead test results; change the
deferred call in the shutdown closure (the anonymous func that calls
tp.Shutdown(ctx) on the trace.TracerProvider instance `tp` using `ctx`) to use
t.Errorf to report errors instead of t.Fatalf, so cleanup failures are reported
without aborting after test completion.
- Line 467: The test currently ignores the error returned by
NewHyperFleetClient; change the setup to capture the error (e.g.,
hyperfleetClient, err := client.NewHyperFleetClient(server.URL, 10*time.Second))
and fail fast on error (e.g., t.Fatalf or require.NoError(t, err)) so setup
failures are reported immediately instead of surfacing as secondary errors later
when hyperfleetClient is used.
In `@pkg/telemetry/otel.go`:
- Around line 77-83: The current logic checks ENV_OTEL_TRACES_SAMPLER_ARG and
logs "Invalid" in the else branch when the env var is empty, but it fails to
warn on strconv.ParseFloat errors; update the logic in the block that reads
ENV_OTEL_TRACES_SAMPLER_ARG so that: if arg == "" do nothing (keep default
samplingRate), if arg != "" attempt strconv.ParseFloat and on err != nil call
log.Warnf with a clear message including ENV_OTEL_TRACES_SAMPLER_ARG and the bad
arg value and that the default samplingRate will be used, and only set
samplingRate when ParseFloat succeeds; adjust references to
ENV_OTEL_TRACES_SAMPLER_ARG, arg, strconv.ParseFloat, samplingRate, and
log.Warnf accordingly.
---
Nitpick comments:
In `@pkg/telemetry/otel_test.go`:
- Around line 181-185: The test currently only checks
span.SpanContext().IsValid() when tt.expectedSample is true, which doesn't
verify sampling; update the assertion to use span.SpanContext().IsSampled() (and
assert true/false according to tt.expectedSample) so cases like the "always_off"
sampler correctly validate sampling behavior; locate the table-driven test and
replace the IsValid-based conditional with an assertion on
SpanContext().IsSampled() for the test case keyed by tt.expectedSample in the
test function in pkg/telemetry/otel_test.go.
In `@pkg/telemetry/otel.go`:
- Around line 87-96: The sampler logic sets sampler directly to
trace.AlwaysSample() or trace.NeverSample() for samplingRate >= 1.0 or <= 0.0,
which bypasses parent-based behavior; change those branches to wrap the base
samplers with trace.ParentBased(...) so all cases—including
trace.AlwaysSample(), trace.NeverSample() and
trace.TraceIDRatioBased(samplingRate)—are applied via trace.ParentBased, keeping
the existing switch and the sampler variable but replacing direct calls with
trace.ParentBased(trace.AlwaysSample()), trace.ParentBased(trace.NeverSample()),
and trace.ParentBased(trace.TraceIDRatioBased(samplingRate)).
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- cmd/sentinel/main.go
- internal/client/client_test.go
- internal/config/config.go
- internal/sentinel/sentinel.go
- internal/sentinel/sentinel_test.go
- pkg/telemetry/otel.go
- pkg/telemetry/otel_test.go
- test/integration/integration_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/client/client_test.go
- internal/config/config.go
```go
	attribute.String("hyperfleet.resource_id", resource.ID),
)

decision := s.decisionEngine.Evaluate(resource, now)
```
The sentinel.evaluate span is missing the hyperfleet.decision_reason attribute. The decision
is computed (s.decisionEngine.Evaluate(...)) but the result is never added to
evalSpan. Per the JIRA acceptance criteria and the architecture tracing standard,
hyperfleet.decision_reason is a required attribute on this span.
This is especially important for skipped resources — since no publish span is created in that
case, the decision reason would be completely absent from the trace.
Suggestion: add the attribute after the evaluation:
```diff
 decision := s.decisionEngine.Evaluate(resource, now)
+evalSpan.SetAttributes(attribute.String("hyperfleet.decision_reason", decision.Reason))
```
Added decision reason to evalSpan (though we only need to add it to the publish span)
…r check for new hyperfleet client
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/telemetry/otel.go`:
- Around line 142-149: Add nil/zero guards in SetTraceContext: return early if
event is nil or span is nil/invalid before calling span.SpanContext(); check
span != nil (or equivalent) and use span.SpanContext().IsValid() only after
those checks to avoid panics from dereferencing nil spans or nil events, and
keep the existing traceparent formatting logic (traceParent := fmt.Sprintf(...)
and event.SetExtension("traceparent", traceParent)) inside the guarded block so
they only run when both inputs are valid.
- Around line 41-43: The current code calls
otlptracehttp.WithEndpointURL(otlpEndpoint) (inside otlptracehttp.New), which
treats the value as a full URL and prevents the SDK from appending the
/v1/traces path; remove that call and either let the SDK read
OTEL_EXPORTER_OTLP_ENDPOINT itself (omit the WithEndpointURL option) or, if
specifying programmatically, replace WithEndpointURL with
otlptracehttp.WithEndpoint(...) using a host:port value (and keep the call site
in otlptracehttp.New).
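The `SetTraceContext` guards discussed above protect the point where the span context is serialized into the CloudEvents `traceparent` extension. The wire format being produced is the W3C trace-context header: `version-traceid-spanid-flags` in lowercase hex. The sketch below only illustrates that format — the helper name and signature are assumptions, not the PR's actual code.

```go
package main

import "fmt"

// formatTraceparent shows the W3C trace-context shape that a helper like
// SetTraceContext injects as the CloudEvents "traceparent" extension:
// version "00", then the 32-hex-digit trace ID, the 16-hex-digit span ID,
// and the trace flags ("01" when sampled).
func formatTraceparent(traceID, spanID string, sampled bool) string {
	flags := "00"
	if sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", traceID, spanID, flags)
}

func main() {
	// Example IDs taken from the W3C Trace Context specification.
	fmt.Println(formatTraceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true))
}
```

This is also why the nil/invalid guards matter: serializing an all-zero trace or span ID would produce a traceparent value that downstream consumers must reject.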
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- internal/sentinel/sentinel.go
- internal/sentinel/sentinel_test.go
- pkg/telemetry/otel.go
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/sentinel/sentinel_test.go
- internal/sentinel/sentinel.go
… check in SetTraceContext()
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/telemetry/otel.go`:
- Around line 41-42: The exporter selection currently only checks
ENV_OTEL_EXPORTER_OTLP_ENDPOINT; update the gating logic to also consider
ENV_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT and prefer the traces-specific variable
when present so otlptracehttp.New is attempted for traces; modify the check
around otlptracehttp.New (references: ENV_OTEL_EXPORTER_OTLP_ENDPOINT,
ENV_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, otlptracehttp.New, exporter, err) to
look for the traces endpoint first and fall back to the generic OTLP endpoint
before defaulting to stdout.
```go
if otlpEndpoint := os.Getenv(ENV_OTEL_EXPORTER_OTLP_ENDPOINT); otlpEndpoint != "" {
	exporter, err = otlptracehttp.New(ctx)
```
🧩 Analysis chain
🏁 Script executed:
```shell
cat -n pkg/telemetry/otel.go | head -80
```

Repository: openshift-hyperfleet/hyperfleet-sentinel
Length of output: 3206
🌐 Web query:
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT OpenTelemetry specification environment variable traces
💡 Result:
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is the OpenTelemetry OTLP exporter environment variable that sets the trace (spans) export endpoint URL only (it overrides the generic OTEL_EXPORTER_OTLP_ENDPOINT for traces). [1]
Key spec behaviors:
- Precedence: If
`OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is set, it takes precedence over `OTEL_EXPORTER_OTLP_ENDPOINT` for traces. [1]
- OTLP/HTTP path handling: The value of
`OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is used as-is (the exporter must not auto-append `/v1/traces`). Exception: if the URL has no path, `/` must be used. [1]
- In contrast, when only
`OTEL_EXPORTER_OTLP_ENDPOINT` is set (OTLP/HTTP), the exporter constructs the traces URL by appending `v1/traces` to the base endpoint. [1]
- Defaults (commonly documented):
- OTLP/gRPC:
http://localhost:4317 - OTLP/HTTP:
http://localhost:4318/v1/traces[2]
- OTLP/gRPC:
Sources:
[1] OpenTelemetry Specification – OTLP Exporter (Endpoint URLs and env vars) https://opentelemetry.io/docs/specs/otel/protocol/exporter/
[2] OpenTelemetry Docs – OTLP Exporter Configuration (defaults and examples) https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/
🌐 Web query:
OpenTelemetry Go otlptracehttp.New environment variables OTEL_EXPORTER_OTLP_ENDPOINT OTEL_EXPORTER_OTLP_TRACES_ENDPOINT
💡 Result:
For the Go OTLP/HTTP trace exporter (go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp.New), the two env vars behave like this:
-
`OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` (per-signal) takes precedence over `OTEL_EXPORTER_OTLP_ENDPOINT` (base). If `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is set, it's used for traces instead of the base endpoint. The Go exporter docs also note `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` overrides `OTEL_EXPORTER_OTLP_ENDPOINT`. [1][2][3]
Path handling differs:
- If you set
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, the URL is used as-is (no automatic/v1/tracesappend). So for OTLP/HTTP you typically include the full trace path yourself (e.g.http://collector:4318/v1/traces). [1][2] - If you set only
OTEL_EXPORTER_OTLP_ENDPOINT, it’s treated as a base URL, and the exporter constructs the trace URL by appendingv1/traces(e.g.http://collector:4318→http://collector:4318/v1/traces). [1][2][3]
- If you set
-
Code options can override env: in Go, options like
`WithEndpoint`, `WithEndpointURL`, `WithInsecure` can override env-based configuration. [3]
Examples
- Single base endpoint (auto path):
OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4318- traces →
http://collector:4318/v1/traces[1][2]
- Per-traces endpoint (no auto path):
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://collector:4318/v1/traces- traces → exactly that URL [1][2]
Sources
[1] OpenTelemetry OTLP/HTTP exporter spec (endpoint construction & precedence) (opentelemetry.io)
[2] OpenTelemetry “OTLP Exporter Configuration” (env var meanings/defaults) (opentelemetry.io)
[3] Go otlptracehttp package docs (env vars + override behavior) (pkg.go.dev)
Citations:
- 1: https://opentelemetry.io/docs/specs/otel/protocol/exporter/?utm_source=openai
- 2: https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/?utm_source=openai
- 3: https://pkg.go.dev/go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp?utm_source=openai
Check for the traces-specific OTLP endpoint when selecting the exporter.
The current gating logic at line 41 only checks `OTEL_EXPORTER_OTLP_ENDPOINT`. Per the OpenTelemetry specification, `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is a valid configuration that takes precedence for traces. If a user sets only this traces-specific endpoint (a valid configuration), the condition fails and the code silently falls back to stdout, never instantiating the OTLP exporter, even though `otlptracehttp.New()` would handle that endpoint correctly.
🔧 Proposed fix
const (
SAMPLER_ALWAYS_ON = "always_on"
SAMPLER_ALWAYS_OFF = "always_off"
SAMPLER_TRACE_ID_RATIO = "traceidratio"
ENV_OTEL_TRACES_SAMPLER = "OTEL_TRACES_SAMPLER"
ENV_OTEL_TRACES_SAMPLER_ARG = "OTEL_TRACES_SAMPLER_ARG"
ENV_OTEL_EXPORTER_OTLP_ENDPOINT = "OTEL_EXPORTER_OTLP_ENDPOINT"
+ ENV_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT = "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"
)
@@
- if otlpEndpoint := os.Getenv(ENV_OTEL_EXPORTER_OTLP_ENDPOINT); otlpEndpoint != "" {
+ if os.Getenv(ENV_OTEL_EXPORTER_OTLP_ENDPOINT) != "" ||
+ os.Getenv(ENV_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT) != "" {
          exporter, err = otlptracehttp.New(ctx)

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const (
	SAMPLER_ALWAYS_ON                      = "always_on"
	SAMPLER_ALWAYS_OFF                     = "always_off"
	SAMPLER_TRACE_ID_RATIO                 = "traceidratio"
	ENV_OTEL_TRACES_SAMPLER                = "OTEL_TRACES_SAMPLER"
	ENV_OTEL_TRACES_SAMPLER_ARG            = "OTEL_TRACES_SAMPLER_ARG"
	ENV_OTEL_EXPORTER_OTLP_ENDPOINT        = "OTEL_EXPORTER_OTLP_ENDPOINT"
	ENV_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT = "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"
)

	if os.Getenv(ENV_OTEL_EXPORTER_OTLP_ENDPOINT) != "" ||
		os.Getenv(ENV_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT) != "" {
		exporter, err = otlptracehttp.New(ctx)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/telemetry/otel.go` around lines 41-42: the exporter selection currently
only checks ENV_OTEL_EXPORTER_OTLP_ENDPOINT. Update the gating logic to also
consider ENV_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, preferring the traces-specific
variable when present, so that otlptracehttp.New is attempted for traces. Modify
the check around otlptracehttp.New (references: ENV_OTEL_EXPORTER_OTLP_ENDPOINT,
ENV_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, otlptracehttp.New, exporter, err) to
look for the traces endpoint first and fall back to the generic OTLP endpoint
before defaulting to stdout.
Summary
Implements comprehensive OpenTelemetry tracing instrumentation in Sentinel to enable distributed tracing across the HyperFleet orchestration pipeline.
Span hierarchy:
sentinel.poll (root)
Resolves
HYPERFLEET-542
Changed
Tested
Summary by CodeRabbit
New Features
Tests