[Flagging] Merge evaluations into develop #3183
gh-worker-dd-mergequeue-cf854d[bot] merged 98 commits into develop
Conversation
[Flags] Evaluations subfeature Co-authored-by: typotter <tyler.potter@datadoghq.com>
[FFL-1720] Evaluation Logging: Event Schema & Data Models Co-authored-by: typotter <tyler.potter@datadoghq.com>
[Flags] FlagEvaluation schema Co-authored-by: typotter <tyler.potter@datadoghq.com>
Implements the core aggregation logic for evaluation logging (EVALLOG). Aggregates flag evaluations by key before flushing to reduce network overhead.

Key components:
- EvaluationEventsProcessor: Aggregates evaluations with time/size-based flushing
  - Time-based: Configurable interval (default 10s, range 1-60s)
  - Size-based: Auto-flush at 1000 unique aggregations
  - Shutdown: Final flush on processor.stop()
- AggregationKey: Composite key for grouping evaluations
  - Groups by: flag, variant, allocation, targeting key, error code
  - EVALLOG.8: Omits variant/allocation for DEFAULT/ERROR reasons
- AggregationStats: Tracks aggregated statistics per key
  - Count, first/last timestamps, last error message
  - Thread-safe with @Volatile fields and synchronized blocks
- EvaluationEventWriter: Interface for persisting FlagEvaluation events
  - Abstraction allows testing without a storage implementation

Test infrastructure:
- FlagEvaluationAssert: Custom assertions for validation
- FlagEvaluationForgeryFactory: Test data generator
- EvaluationContextForgeryFactory: Context data generator

Uses BatchedFlagEvaluations.FlagEvaluation from the PR #1 schema. No runtime integration; isolated business logic only.

EVALLOG compliance: 2, 3, 4, 5, 8, 10, 11, 13
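The aggregation scheme described above can be sketched as follows. This is a minimal illustration, not the SDK's actual implementation: the field names, the `EvaluationAggregator` class, and its `track`/`flush` methods are assumptions based on the commit message (the real processor also handles time-based flushing, which is omitted here).

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical composite key; variant/allocation are left null for
// DEFAULT/ERROR reasons per EVALLOG.8.
data class AggregationKey(
    val flagKey: String,
    val variant: String?,
    val allocation: String?,
    val targetingKey: String?,
    val errorCode: String?
)

// Per-key statistics; @Volatile fields plus a synchronized update method.
class AggregationStats {
    @Volatile var count: Long = 0
    @Volatile var firstSeenMs: Long = 0
    @Volatile var lastSeenMs: Long = 0
    @Volatile var lastErrorMessage: String? = null

    @Synchronized
    fun record(timestampMs: Long, errorMessage: String?) {
        if (count == 0L) firstSeenMs = timestampMs
        lastSeenMs = timestampMs
        count++
        if (errorMessage != null) lastErrorMessage = errorMessage
    }
}

// Illustrative aggregator: groups evaluations by key and signals the
// caller to flush once the size threshold is reached.
class EvaluationAggregator(private val maxEntries: Int = 1000) {
    private val buckets = ConcurrentHashMap<AggregationKey, AggregationStats>()

    fun track(key: AggregationKey, timestampMs: Long, errorMessage: String? = null): Boolean {
        buckets.computeIfAbsent(key) { AggregationStats() }.record(timestampMs, errorMessage)
        return buckets.size >= maxEntries // true -> caller should flush now
    }

    fun flush(): Map<AggregationKey, AggregationStats> {
        val snapshot = HashMap(buckets)
        buckets.clear()
        return snapshot
    }
}
```

Grouping by a composite key like this means N evaluations of the same flag/variant collapse into one record with a count, which is what reduces the network overhead the commit message mentions.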
- Replace destructuring of 4 entries with individual assignments
- Use a safe cast for the nullable error message
- AggregationKeyTest: Tests aggregation key generation, equality, and grouping logic
- AggregationStatsTest: Tests statistics tracking, error message updates, and thread safety
- EvaluationEventsProcessorTest: Tests processor orchestration, flush triggers, and concurrency

These tests cover:
- Aggregation by error code with last-message preservation
- Thread-safe concurrent operations under high-contention scenarios
- All flush triggers (time, size, shutdown)
- Field validation per the evaluation logging spec
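The high-contention scenario those tests exercise can be sketched like this. The `Stats` class and `hammer` helper are illustrative stand-ins, not the SDK's actual test code: several threads record into one shared stats object, then we assert no updates were lost.

```kotlin
import kotlin.concurrent.thread

// Simplified stand-in for AggregationStats: a synchronized record method
// guards the count and last-error fields.
class Stats {
    var count = 0L
    var lastErrorMessage: String? = null

    @Synchronized
    fun record(errorMessage: String?) {
        count++
        if (errorMessage != null) lastErrorMessage = errorMessage
    }
}

// Spawns nThreads workers, each recording perThread times, then joins them;
// Thread.join() gives the reader thread a happens-before edge on the fields.
fun hammer(stats: Stats, nThreads: Int, perThread: Int) {
    val workers = (1..nThreads).map {
        thread { repeat(perThread) { stats.record(null) } }
    }
    workers.forEach { it.join() }
}
```

Without the `@Synchronized` on `record`, the unguarded `count++` would drop increments under contention, which is exactly what a test like this is designed to catch.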
[FFL-1720] Evaluation Logging: Aggregation Engine & Test Utilities Co-authored-by: typotter <tyler.potter@datadoghq.com> Co-authored-by: 0xnm <nikita.ogorodnikov@datadoghq.com>
…r3-storage-network
[FFL-1720] Evaluation Logging: Integration Wires the evaluation logging feature end-to-end by connecting the EvaluationsFeature to the flag evaluation flow. This is the final PR that enables evaluation logging in the Flags SDK.
[FFL-1720] Evaluation Logging: Storage & Network Infrastructure
/merge
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #3183 +/- ##
===========================================
+ Coverage 71.21% 71.28% +0.07%
===========================================
Files 922 929 +7
Lines 34173 34450 +277
Branches 5776 5817 +41
===========================================
+ Hits 24334 24557 +223
- Misses 8205 8255 +50
- Partials 1634 1638 +4
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a90ef9333f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
}

// Log evaluation events for errors
trackErrorResolution(resolution)
Log evaluation errors in resolve() path
trackErrorResolution is only invoked from resolveTracked, but callers using resolve(flagKey, defaultValue) take a different error branch that returns createErrorResolution without emitting an evaluation event. As a result, missing/type-mismatched flags are excluded from evaluation telemetry whenever apps use the detailed resolve() API, which underreports error rates and skews aggregated evaluation metrics.
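A minimal model of the suggested fix, with all names hypothetical and based only on this review comment: the `resolve()` error branch calls the same tracking hook that `resolveTracked` uses, so missing flags are counted in evaluation telemetry on both paths.

```kotlin
enum class ErrorCode { FLAG_NOT_FOUND, TYPE_MISMATCH }

data class Resolution<T>(val flagKey: String, val value: T, val errorCode: ErrorCode?)

// Illustrative client; `trackedErrors` stands in for the
// EvaluationEventsProcessor pipeline the PR wires up.
class FlagClient(private val flags: Map<String, Any>) {
    val trackedErrors = mutableListOf<Resolution<*>>()

    private fun <T> createErrorResolution(key: String, default: T, code: ErrorCode) =
        Resolution(key, default, code)

    private fun trackErrorResolution(resolution: Resolution<*>) {
        trackedErrors += resolution
    }

    fun <T : Any> resolve(flagKey: String, defaultValue: T): Resolution<T> {
        val raw = flags[flagKey]
            ?: return createErrorResolution(flagKey, defaultValue, ErrorCode.FLAG_NOT_FOUND)
                .also { trackErrorResolution(it) } // fix: emit the evaluation event here too
        @Suppress("UNCHECKED_CAST")
        return Resolution(flagKey, raw as T, null)
    }
}
```

The point of routing both APIs through one hook is that error rates in the aggregated metrics no longer depend on which resolve variant the app happens to call.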
ddContext = DDContext(
    service = service,
    rumApplicationId = context[RUM_APPLICATION_ID] as? String,
    rumViewName = context[RUM_VIEW_NAME] as? String
)
Read RUM view URL instead of view name
This code captures view_name from RUM context and later serializes it into context.dd.rum.view.url, so evaluation events send the human-readable view name in the URL field. In apps where view names differ from URLs (templated routes, custom names), backend grouping/filtering by page URL will be inaccurate; this should use the RUM view_url context key.
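A sketch of the suggested change: populate the URL field from the RUM "view_url" context key rather than "view_name". The key strings and the `buildDdContext` helper here are assumptions for illustration, not the SDK's actual code.

```kotlin
// Hypothetical RUM context keys, per the review comment above.
val RUM_APPLICATION_ID = "application_id"
val RUM_VIEW_URL = "view_url" // previously the code read "view_name"

data class DDContext(
    val service: String,
    val rumApplicationId: String?,
    val rumViewUrl: String? // serialized into context.dd.rum.view.url
)

// Safe casts keep the build resilient to missing or mistyped context entries.
fun buildDdContext(service: String, context: Map<String, Any?>): DDContext =
    DDContext(
        service = service,
        rumApplicationId = context[RUM_APPLICATION_ID] as? String,
        rumViewUrl = context[RUM_VIEW_URL] as? String
    )
```

Reading the URL key directly means templated routes and custom view names no longer leak into backend grouping by page URL.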
What does this PR do?
#3145
Motivation
We have implemented Evaluation Logging in the Flagging module. This provides comprehensive visibility into all feature flag evaluations, including defaults, errors, and successful matches. This goes beyond exposure logging by capturing aggregated metrics about evaluation frequency, error rates, and runtime default usage across all flags.
What inspired you to submit this pull request?
Merge the feature branch into develop.

Additional Notes
Thank you to all the reviewers along the way!
Anything else we should know when reviewing?
This PR contains the following PRs, plus a merge from main to catch up and the deletion of the batched event schema.
🥞 Evaluation Logging Stacked Pull Requests 🥞
- #3147
Review checklist (to be filled by reviewers)