
[Flagging] Merge evaluations into develop #3183

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 98 commits into develop from typo/merge-evaluations-into-develop
Feb 17, 2026

@typotter typotter commented Feb 17, 2026

What does this PR do?

#3145

Motivation

We have implemented Evaluation Logging in the Flagging module. This provides comprehensive visibility into all feature flag evaluations, including defaults, errors, and successful matches. This goes beyond exposure logging by capturing aggregated metrics about evaluation frequency, error rates, and runtime default usage across all flags.

What inspired you to submit this pull request?

Merge the feature branch into develop

Additional Notes

Thank you to all the reviewers along the way!

Anything else we should know when reviewing?

This PR contains the following PRs, plus a merge from main to catch up and the deletion of the batched event schema.

🥞 Evaluation Logging Stacked Pull Requests 🥞

- #3147

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Make sure you discussed the feature or bugfix with the maintaining team in an Issue
  • Make sure each commit and the PR mention the Issue number (cf the CONTRIBUTING doc)

typotter and others added 30 commits January 26, 2026 23:31
[Flags] Evaluations subfeature

Co-authored-by: typotter <tyler.potter@datadoghq.com>
[FFL-1720] Evaluation Logging: Event Schema & Data Models

Co-authored-by: typotter <tyler.potter@datadoghq.com>
[Flags] FlagEvaluation schema

Co-authored-by: typotter <tyler.potter@datadoghq.com>
Implements the core aggregation logic for evaluation logging (EVALLOG).
Aggregates flag evaluations by key before flushing to reduce network overhead.

Key components:
- EvaluationEventsProcessor: Aggregates evaluations with time/size-based flushing
  - Time-based: Configurable interval (default 10s, range 1-60s)
  - Size-based: Auto-flush at 1000 unique aggregations
  - Shutdown: Final flush on processor.stop()

- AggregationKey: Composite key for grouping evaluations
  - Groups by: flag, variant, allocation, targeting key, error code
  - EVALLOG.8: Omits variant/allocation for DEFAULT/ERROR reasons

- AggregationStats: Tracks aggregated statistics per key
  - Count, first/last timestamps, last error message
  - Thread-safe with @volatile fields and synchronized blocks

- EvaluationEventWriter: Interface for persisting FlagEvaluation events
  - Abstraction allows testing without storage implementation

Test infrastructure:
- FlagEvaluationAssert: Custom assertions for validation
- FlagEvaluationForgeryFactory: Test data generator
- EvaluationContextForgeryFactory: Context data generator

Uses BatchedFlagEvaluations.FlagEvaluation from PR #1 schema.
No runtime integration - isolated business logic only.
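
The aggregation described in this commit message can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the class shapes, the `Reason` enum, the `onFlush` callback, and the single-lock design are hypothetical and are not taken from the SDK's source. Only the grouping fields, the omission of variant/allocation for DEFAULT/ERROR reasons, and the 1000-aggregation size trigger come from the description above. Time-based flushing is left out; a scheduler could call `flush()` on the configured interval.

```kotlin
// Hypothetical sketch; names and shapes are illustrative, not the SDK's actual code.
enum class Reason { TARGETING_MATCH, DEFAULT, ERROR }

data class AggregationKey(
    val flagKey: String,
    val variantKey: String?,
    val allocationKey: String?,
    val targetingKey: String?,
    val errorCode: String?
) {
    companion object {
        // Variant and allocation are dropped for DEFAULT/ERROR reasons,
        // so those evaluations collapse into one aggregation per flag.
        fun of(
            flagKey: String,
            variantKey: String?,
            allocationKey: String?,
            targetingKey: String?,
            errorCode: String?,
            reason: Reason
        ): AggregationKey {
            val omit = reason == Reason.DEFAULT || reason == Reason.ERROR
            return AggregationKey(
                flagKey,
                if (omit) null else variantKey,
                if (omit) null else allocationKey,
                targetingKey,
                errorCode
            )
        }
    }
}

// Mutated only while the aggregator's lock is held.
class AggregationStats(val firstEvaluation: Long, errorMessage: String?) {
    var count: Int = 1
        private set
    var lastEvaluation: Long = firstEvaluation
        private set
    var lastErrorMessage: String? = errorMessage
        private set

    fun record(timestamp: Long, errorMessage: String?) {
        count += 1
        lastEvaluation = timestamp
        if (errorMessage != null) lastErrorMessage = errorMessage
    }
}

class EvaluationAggregator(
    private val maxAggregations: Int = 1000,
    private val onFlush: (Map<AggregationKey, AggregationStats>) -> Unit
) {
    private val lock = Any()
    private var pending = LinkedHashMap<AggregationKey, AggregationStats>()

    fun record(key: AggregationKey, timestamp: Long, errorMessage: String? = null) {
        val toFlush = synchronized(lock) {
            val stats = pending[key]
            if (stats == null) {
                pending[key] = AggregationStats(timestamp, errorMessage)
            } else {
                stats.record(timestamp, errorMessage)
            }
            // Size-based trigger: flush once the number of unique keys reaches the cap.
            if (pending.size >= maxAggregations) drain() else null
        }
        // The callback runs outside the lock so a slow writer does not block recorders.
        toFlush?.let(onFlush)
    }

    // Entry point for the time-based and shutdown triggers.
    fun flush() {
        val toFlush = synchronized(lock) { drain() }
        if (toFlush.isNotEmpty()) onFlush(toFlush)
    }

    private fun drain(): Map<AggregationKey, AggregationStats> {
        val snapshot = pending
        pending = LinkedHashMap()
        return snapshot
    }
}
```

This sketch guards all state with one lock instead of the `@Volatile` fields and synchronized blocks mentioned above; swapping the whole map on flush keeps the critical section short.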

EVALLOG compliance: 2, 3, 4, 5, 8, 10, 11, 13
- Replace destructuring with 4 entries to individual assignments
- Use safe cast for nullable error message
- AggregationKeyTest: Tests aggregation key generation, equality, and grouping logic
- AggregationStatsTest: Tests statistics tracking, error message updates, and thread safety
- EvaluationEventsProcessorTest: Tests processor orchestration, flush triggers, and concurrency

These tests cover:
- Aggregation by error code with last message preservation
- Thread-safe concurrent operations with high contention scenarios
- All flush triggers (time, size, shutdown)
- Field validation per evaluation logging spec
typotter and others added 10 commits February 13, 2026 08:10
[FFL-1720] Evaluation Logging: Aggregation Engine & Test Utilities

Co-authored-by: typotter <tyler.potter@datadoghq.com>
Co-authored-by: 0xnm <nikita.ogorodnikov@datadoghq.com>
[FFL-1720] Evaluation Logging: Integration

Wires the evaluation logging feature end-to-end by connecting the EvaluationsFeature to the flag evaluation flow. This is the final PR that enables evaluation logging in the Flags SDK.
[FFL-1720] Evaluation Logging: Storage & Network Infrastructure
@typotter typotter requested a review from a team as a code owner February 17, 2026 13:12
@typotter typotter mentioned this pull request Feb 17, 2026
3 tasks
@typotter typotter requested a review from 0xnm February 17, 2026 13:14
@typotter
Contributor Author

/merge

gh-worker-devflow-routing-ef8351 Bot commented Feb 17, 2026

View all feedbacks in Devflow UI.

2026-02-17 13:23:32 UTC ℹ️ Start processing command /merge


2026-02-17 13:23:37 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.


2026-02-17 14:18:11 UTC ℹ️ MergeQueue: merge request added to the queue

The expected merge time in develop is approximately 1h (p90).


2026-02-17 15:29:15 UTC ℹ️ MergeQueue: This merge request was merged

codecov-commenter commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 78.81944% with 61 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.28%. Comparing base (4711ad8) to head (68b597a).
⚠️ Report is 279 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...tadog/android/flags/internal/EvaluationsFeature.kt | 18.87% | 43 Missing ⚠️ |
| ...tadog/android/flags/internal/DatadogFlagsClient.kt | 67.86% | 6 Missing and 3 partials ⚠️ |
| ...in/kotlin/com/datadog/android/flags/FlagsClient.kt | 33.33% | 3 Missing and 1 partial ⚠️ |
| ...ndroid/flags/internal/EvaluationEventsProcessor.kt | 95.16% | 2 Missing and 1 partial ⚠️ |
| ...in/com/datadog/android/flags/FlagsConfiguration.kt | 84.62% | 2 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3183      +/-   ##
===========================================
+ Coverage    71.21%   71.28%   +0.07%     
===========================================
  Files          922      929       +7     
  Lines        34173    34450     +277     
  Branches      5776     5817      +41     
===========================================
+ Hits         24334    24557     +223     
- Misses        8205     8255      +50     
- Partials      1634     1638       +4     

| Files with missing lines | Coverage Δ |
|---|---|
| .../kotlin/com/datadog/android/api/feature/Feature.kt | 100.00% <ø> (ø) |
| ...src/main/kotlin/com/datadog/android/flags/Flags.kt | 90.91% <100.00%> (+7.58%) ⬆️ |
| .../android/flags/internal/ExposureEventsProcessor.kt | 100.00% <100.00%> (ø) |
| ...com/datadog/android/flags/internal/FlagsFeature.kt | 85.07% <100.00%> (+1.49%) ⬆️ |
| ...droid/flags/internal/aggregation/AggregationKey.kt | 100.00% <100.00%> (ø) |
| ...oid/flags/internal/aggregation/AggregationStats.kt | 100.00% <100.00%> (ø) |
| ...flags/internal/aggregation/EvaluationAggregator.kt | 100.00% <100.00%> (ø) |
| ...id/flags/internal/net/EvaluationsRequestFactory.kt | 100.00% <100.00%> (ø) |
| ...gs/internal/storage/EvaluationEventRecordWriter.kt | 100.00% <100.00%> (ø) |
| ...in/com/datadog/android/flags/FlagsConfiguration.kt | 89.74% <84.62%> (-2.85%) ⬇️ |

... and 4 more

... and 32 files with indirect coverage changes

@plousada
Member

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a90ef9333f

}

// Log evaluation events for errors
trackErrorResolution(resolution)

P1: Log evaluation errors in resolve() path

trackErrorResolution is only invoked from resolveTracked, but callers using resolve(flagKey, defaultValue) take a different error branch that returns createErrorResolution without emitting an evaluation event. As a result, missing/type-mismatched flags are excluded from evaluation telemetry whenever apps use the detailed resolve() API, which underreports error rates and skews aggregated evaluation metrics.
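
One way to close this kind of gap is to route every public entry point through a single helper that emits the evaluation event for all outcomes. The sketch below is hypothetical: `SketchFlagsClient`, `Resolution`, `evaluate`, `resolveValue`, the `trackEvaluation` callback, and the error-code strings are illustrative names, not the SDK's actual API, and the type check is deliberately simplistic.

```kotlin
// Hypothetical sketch; not the SDK's DatadogFlagsClient.
data class Resolution<T>(val value: T, val errorCode: String? = null)

class SketchFlagsClient(
    private val flags: Map<String, Any>,
    // Stand-in for whatever records an evaluation event (assumption).
    private val trackEvaluation: (flagKey: String, errorCode: String?) -> Unit
) {
    // Single code path: success, missing flag, and type mismatch are all tracked here,
    // so no public method can return an error resolution without emitting an event.
    @Suppress("UNCHECKED_CAST")
    private fun <T : Any> evaluate(flagKey: String, defaultValue: T): Resolution<T> {
        val raw = flags[flagKey]
        val resolution = when {
            raw == null -> Resolution(defaultValue, "FLAG_NOT_FOUND")
            !defaultValue.javaClass.isInstance(raw) -> Resolution(defaultValue, "TYPE_MISMATCH")
            else -> Resolution(raw as T)
        }
        trackEvaluation(flagKey, resolution.errorCode)
        return resolution
    }

    // Detailed API: returns the full resolution, including any error code.
    fun <T : Any> resolve(flagKey: String, defaultValue: T): Resolution<T> =
        evaluate(flagKey, defaultValue)

    // Value-only API: same tracked path, unwrapped.
    fun <T : Any> resolveValue(flagKey: String, defaultValue: T): T =
        evaluate(flagKey, defaultValue).value
}
```

With this shape, a missing or type-mismatched flag produces an event regardless of which method the caller used, which is the behavior the review comment asks for.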

ddContext = DDContext(
service = service,
rumApplicationId = context[RUM_APPLICATION_ID] as? String,
rumViewName = context[RUM_VIEW_NAME] as? String

P2: Read RUM view URL instead of view name

This code captures view_name from RUM context and later serializes it into context.dd.rum.view.url, so evaluation events send the human-readable view name in the URL field. In apps where view names differ from URLs (templated routes, custom names), backend grouping/filtering by page URL will be inaccurate; this should use the RUM view_url context key.
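
A minimal sketch of the suggested change, under stated assumptions: the constant values, the `rumViewUrl` field name, and the `buildDdContext` helper are hypothetical, and the real context keys in the SDK may differ. The point is only that the field serialized as `context.dd.rum.view.url` is read from the `view_url` key and not from `view_name`.

```kotlin
// Hypothetical key constants; the SDK's real values may differ.
const val RUM_APPLICATION_ID = "application_id"
const val RUM_VIEW_NAME = "view_name"
const val RUM_VIEW_URL = "view_url"

// Illustrative shape only; the field is named after what it is serialized as.
data class DDContext(
    val service: String,
    val rumApplicationId: String?,
    val rumViewUrl: String?
)

fun buildDdContext(service: String, context: Map<String, Any?>): DDContext =
    DDContext(
        service = service,
        rumApplicationId = context[RUM_APPLICATION_ID] as? String,
        // Read the URL key so the url field never carries the human-readable view name.
        rumViewUrl = context[RUM_VIEW_URL] as? String
    )
```

When the RUM context holds only a view name, this sketch leaves the URL field null instead of sending a misleading value.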
