
Adventure: 🧪 Blind by Design - 🔴 Expert #44

Merged
KatharinaSick merged 8 commits into off-on-dev:main from aepfli:adventure/blind-by-design-expert on May 19, 2026

@aepfli aepfli commented Apr 30, 2026

Summary

Phase 3 (read the chart): wire the OpenTelemetry meter provider, register the OpenFeature MetricsHook + TracesHook, author a custom ContextSpanHook that copies the merged evaluation context onto Tempo spans, then diagnose and roll back a misbehaving fractional rollout (vision_amplifier_v2) on the Grafana LGTM dashboard, all without a redeploy.

What this PR ships

  • adventures/planned/00-blind-by-design/docs/index.md: adds the 🔴 Expert card to the Choose Your Level grid (completing the three-level set)
  • adventures/planned/00-blind-by-design/docs/expert.md: challenge doc
  • adventures/planned/00-blind-by-design/docs/solutions/expert.md: solution walkthrough (kept in repo, hidden from mkdocs.yaml nav until challenge launch)
  • adventures/planned/00-blind-by-design/expert/: broken-state Spring Boot project + verify.sh + dashboards/feature-flags.json (Grafana) + loadgen/k6/script.js
  • .devcontainer/00-blind-by-design_03-expert/: adds the Grafana LGTM container and the k6 loadgen alongside flagd

Stacking

This is PR 3 of 3 (last in the series), branched from adventure/blind-by-design-intermediate (#43). The diff currently includes Beginner's and Intermediate's content because this PR is stacked on top of them. Review #42, then #43, first. Once they merge into main in order, the Expert-only delta will show up here automatically.

The original full-state PR #40 is kept open for reference and will be closed once this lands.

Closes #41

@aepfli aepfli requested a review from a team as a code owner April 30, 2026 09:24
@aepfli aepfli force-pushed the adventure/blind-by-design-expert branch 2 times, most recently from 4dc3293 to 2cb421e Compare April 30, 2026 13:17
aepfli added a commit to aepfli/open-ecosystem-challenges that referenced this pull request Apr 30, 2026
…only PR

Per @KatharinaSick's first instinct on review, and on reflection
the author agrees: when the placeholder stubs are reachable from
the side-nav they create more confusion than they resolve. The
stub files stay on disk so docs/index.md's links don't 404, but
they're not surfaced in navigation. Intermediate and Expert come
back into the nav in off-on-dev#43 and off-on-dev#44 once their content is real.

Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
@aepfli aepfli force-pushed the adventure/blind-by-design-expert branch from 2cb421e to 603eabd Compare April 30, 2026 13:24
aepfli added a commit to aepfli/open-ecosystem-challenges that referenced this pull request Apr 30, 2026
…inner-only PR

The two stub level-docs and the matching index.md cards were
carried in solely so docs/index.md links did not 404. With the
nav already trimmed (per @KatharinaSick on PR off-on-dev#42), and now with
the cards out of the landing page, the Beginner PR is genuinely
scoped to a single level. Intermediate and Expert each add their
own card + level doc as part of their respective PRs (off-on-dev#43 / off-on-dev#44).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
@aepfli aepfli force-pushed the adventure/blind-by-design-expert branch from 603eabd to 84fad08 Compare April 30, 2026 13:38
KatharinaSick pushed the same two commits ("…only PR" and "…inner-only PR", with messages identical to the two above) to aepfli/open-ecosystem-challenges, each referencing this pull request, on Apr 30, 2026
@aepfli aepfli force-pushed the adventure/blind-by-design-expert branch 2 times, most recently from f2fd5ef to 48bfad1 Compare May 6, 2026 22:00
@aepfli aepfli force-pushed the adventure/blind-by-design-expert branch from 48bfad1 to ef152e5 Compare May 8, 2026 06:54
@aepfli aepfli force-pushed the adventure/blind-by-design-expert branch from b317259 to 58d96ae Compare May 18, 2026 13:37
aepfli and others added 7 commits May 18, 2026 16:34
Wire the OpenTelemetry meter provider, register the OpenFeature
MetricsHook + TracesHook, author a ContextSpanHook that copies the
merged evaluation context onto Tempo spans, then diagnose and roll
back a misbehaving fractional rollout (vision_amplifier_v2) on the
Grafana LGTM dashboard, all without a redeploy.

Replaces the placeholder expert.md stub with the full level doc,
ships the Expert solution walkthrough, broken-state code (including
the dashboard JSON and k6 loadgen), verify.sh, and devcontainer.

Stacked on top of off-on-dev#43 (🟡 Intermediate). Review off-on-dev#42 then off-on-dev#43 first.
This is the last PR in the series, so it closes the tracking issue.

Closes off-on-dev#41

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
- rename '🧪 The story (optional)' → '🪐 The Backstory'
- pin all docker images: flagd v0.15.4, otel-lgtm 0.26.0, k6 1.7.1
- devcontainer: drop flagd ports (8013/8014/8015/8016) from
  forwardPorts; the LGTM-stack ports (3000/9090/3200/4317/4318) and
  :8080 stay forwarded as before
- drop the published flagd ports from docker-compose; the lab reaches
  flagd on the docker-internal network as `flagd:8013`
- drop the 'Solution Walkthrough' section and the inline
  solutions/expert.md cross-link (solutions are unpublished
  pre-deadline)
- replace the verify-script blurb with the Adventure 03 template
- 'Access the UIs / flagd' subsection: explain flagd is internal-only
  now that the ports aren't forwarded
- verify.sh: lean on test_http_endpoint for the reachability check;
  point FLAGD_HTTP at flagd:8013 (docker network DNS) since the host
  no longer forwards :8013

Refs: PR off-on-dev#42 review by @KatharinaSick

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
Mirror @KatharinaSick's Beginner pattern (605dabc): a thin Makefile
for discoverability + remove the solution doc since solutions are
not meant to be published before the challenge launch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
Mirrors the Intermediate cleanup (8bcf885) on Expert, plus picks up the
targetingKey + PII discipline that was deferred from Intermediate.

- Rewrite Objective as 5 outcome-based bullets: drop the
  mechanism-heavy list, drop the parenthetical "verified: SpeciesInterceptor
  wires userId" note (now redundant: targetingKey lives at the
  implementation level here, not the objective level).
- Drop the "Concepts you'll touch" section. Its load-bearing content
  migrates inline to the per-step instructions: TracerProvider vs
  MeterProvider gloss into 4a, TracesHook/MetricsHook gloss into 4b,
  the ContextSpanHook authoring guide into a new 4c, and the
  fractional + targetingKey explanation into 4d.
- Add explicit step 4c "Author and register your own ContextSpanHook":
  the ContextSpanHook was an objective bullet with no corresponding
  implementation step (analogous to the missing dose-passing step on
  Intermediate).
- Move the PII allowlist callout to step 4c. Expert is where it earns
  its place, since eval context is about to flow onto OTel spans that
  ship to SIEM-grade backends. The Intermediate cross-link goes away;
  the discipline lives here standalone.
- Lift "Start the Lab" out of step 1 into its own step 2. This mirrors the
  Intermediate / Beginner shape, so a player who clicks the Ports tab
  before reading further doesn't see a 502.
- Format Deadline + Community thread sections as Coming Soon callouts,
  matching Intermediate.
- Sync verify.sh OBJECTIVE block to the new outcome-based docs.

Addresses Katharina's review themes carried over from off-on-dev#43 (objective
shape, Concepts vs Learn overlap, verifier exercises objective).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
The devcontainer's workspaceFolder already opens at the expert
directory, so `cd adventures/planned/00-blind-by-design/expert` is a
no-op for any user on the intended path. Carries the same cleanup
Katharina applied to Intermediate in a26ad06.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
- Drop the "Phase 3" prefix from the level title across all 8 sites
  (doc title, devcontainer name, verify.sh banner + summary, Makefile,
  post-start.sh banner, "Select Codespace config" line in step 1).
  Katharina suggested calling it just "Read the chart" β€” keeps the
  story flavor without the artificial sub-numbering.
- Trim the doc intro from three paragraphs to one. The "Three sub-tasks"
  enumeration and the "Level passes when (a)/(b)/(c)/(d)" recap were
  duplicating the Objective section; the "Spans flowing / metrics dead"
  paragraph is the one that earns its place as the lead.
- Rewrite 4a-4d to be directional instead of tutorial. Expert level
  shouldn't dictate every keystroke; point at the gap and the outcome,
  leave the keystrokes to the player.
  - 4a: drop the explicit `otel.metrics.exporter=otlp` instruction +
    the batch-interval recipe. Name the two files where the
    autoconfig defaults live; hint that the default batch interval
    will be a pain for live debugging.
  - 4b: drop the literal `api.addHooks(new MetricsHook(openTelemetry));`
    line. Name the gap (TracesHook registered, MetricsHook isn't) and
    the next step.
  - 4c: keep the pseudocode shape (directional already) but trim
    `Span.current()` mechanic and the "Search → Service →" UI
    walkthrough; replace with "verifier searches Tempo for
    feature_flag.context.dose=underdose" as the smoke signal.
  - 4d: drop the inverted-fractional spoiler. Tell the player that
    the dashboard's variant-distribution panel surfaces the offender;
    the rollback itself is theirs to do.
- Add a "Helpful Documentation" sub-section at the end of "How to Play"
  (matching the pattern Adventure 03 Expert uses). Five external
  references: OpenFeature OTel hooks, OTel SDK autoconfigure, OpenFeature
  Hooks concept, flagd fractional, OTel security guidance.

Addresses Katharina's review comments on #1
(intro paragraphs, "this is now very much of a tutorial again - applies
to all steps", and the helpful-documentation suggestion).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
…fixes

Caught three real bugs while doing a fresh-codespace run of
the Expert level:

1. The trace pipeline never worked. pom.xml imported the OTel
   instrumentation BOM but did not pull in any instrumentation artifact,
   so Spring WebMVC never produced server spans. TracesHook and
   ContextSpanHook both attach to Span.current() β€” without an active
   span, every setAttribute call silently disappears, Tempo stays empty,
   and the doc lead ("spans flowing into Tempo from TracesHook") was
   not actually true.

2. The verifier could not reach the LGTM stack (Grafana/Prometheus/Tempo).
   It used http://localhost:NNNN URLs, but verify.sh runs inside the
   workspace container, where localhost points to the workspace itself.
   The LGTM service is a sibling compose service, reachable only by
   service name on the docker-internal network.

3. The vision_amplifier_v2 rollback check was buggy: `jq -r '.value // empty'`
   treats jq-false as missing because `//` is the alternative operator.
   So a successfully rolled-back flag (.value=false) was printed as ''
   and the check failed. Worked by accident in the broken state where
   the flag resolved to true.
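
The jq behaviour behind item 3 can be reproduced in isolation. The snippet below is a minimal sketch, not the actual verify.sh code: the `payload` value is a hypothetical response shape for a flag that resolves to boolean false after a rollback, and it assumes `jq` is installed.

```shell
#!/usr/bin/env bash
# Hypothetical payload: a flag that resolved to boolean false after a rollback.
payload='{"value": false, "variant": "off"}'

# Buggy form: `//` falls through to its right-hand side when the left side
# is false or null, so a legitimate false turns into `empty` (no output).
buggy="$(printf '%s' "$payload" | jq -r '.value // empty')"

# Fixed form: read .value directly; boolean false prints as the text "false".
fixed="$(printf '%s' "$payload" | jq -r '.value')"

printf 'buggy=[%s] fixed=[%s]\n' "$buggy" "$fixed"
```

With the fixed form, a check can compare the output against the literal string `false` instead of treating empty output as failure.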

Trace-pipeline fix uses the OpenTelemetry Java Agent rather than a
Spring Boot starter: the starter at the BOM-pinned 2.14.0 does not
support Spring Boot 4 (NoClassDefFoundError on RestClientCustomizer);
bumping the BOM is doable but the agent keeps the level focused on
OpenFeature hooks rather than OTel SDK plumbing. Concrete changes:

- pom.xml: drop opentelemetry-sdk, opentelemetry-exporter-otlp,
  opentelemetry-sdk-extension-autoconfigure (agent provides all three).
  Keep opentelemetry-api for the Hook type signatures. Add
  spring-boot-maven-plugin <jvmArguments>-javaagent:${OTEL_JAVAAGENT_JAR}</jvmArguments>
  so only the forked lab JVM is agent-attached, not Maven itself.

- Delete OpenTelemetryConfig.java entirely: the agent registers the
  global SDK before main() runs and AutoConfiguredOpenTelemetrySdk
  .setResultAsGlobal() would just collide with it.

- OpenFeatureConfig.java: docstring + TODO comments reflect the new
  GlobalOpenTelemetry.get() pattern; players fetch the agent-installed
  OTel handle for MetricsHook rather than constructor-injecting it from
  a bean that no longer exists.

- New otel.properties next to pom.xml: what the player edits to flip
  the metrics exporter. Pointed at by OTEL_JAVAAGENT_CONFIGURATION_FILE
  in docker-compose.yml. Same lesson as before (turn on metrics), new
  mechanic.

- application.properties: strip all otel.* lines + add a comment
  explaining the agent does not read Spring's Environment.

- docker-compose.yml: set OTEL_JAVAAGENT_JAR + OTEL_JAVAAGENT_CONFIGURATION_FILE
  in the workspace env; drop the manual OTEL_* vars that
  OpenTelemetryConfig used to bridge.

- post-create.sh: download opentelemetry-javaagent.jar v2.27.0 into
  $REPO_ROOT/tools/, idempotent on re-run.

- .vscode/launch.json: add vmArgs so F5/Spring Boot Dashboard launches
  also get the agent. Also rename to "Run the Lab" (Phase 3 was
  dropped from level title earlier).

- devcontainer.json + post-start.sh openFiles: point at otel.properties
  instead of the deleted OpenTelemetryConfig.java.

- .gitignore: add target/ + tools/ so the agent jar and Maven build
  output stop showing as untracked.

- docs/expert.md step 4a + Helpful Documentation: reframe around
  editing otel.properties and link the agent config reference.

- verify.sh: PROMETHEUS_URL / TEMPO_URL / GRAFANA_URL use the lgtm
  service name. AuditHook hint references the literal '[AUDIT]'
  format. jq fix for the rollback check (.value instead of
  .value // empty). Hints clarified to mention service-name vs
  localhost reachability.
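
The service-name change to verify.sh might look roughly like the sketch below. This is an illustration under assumptions, not the script's real contents: `LGTM_HOST` is a hypothetical helper variable, and the port-to-service mapping (Grafana 3000, Prometheus 9090, Tempo 3200) is assumed from the LGTM ports listed earlier and the upstream defaults.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: address the LGTM stack by its compose service name.
# Inside the workspace container, localhost is the workspace itself.
LGTM_HOST="lgtm"  # compose service name, as referenced in the commit message

# Port mapping is an assumption based on upstream defaults.
GRAFANA_URL="http://${LGTM_HOST}:3000"
PROMETHEUS_URL="http://${LGTM_HOST}:9090"
TEMPO_URL="http://${LGTM_HOST}:3200"

printf '%s\n' "$GRAFANA_URL" "$PROMETHEUS_URL" "$TEMPO_URL"
```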

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
@aepfli aepfli force-pushed the adventure/blind-by-design-expert branch from cdec139 to e2acdbf Compare May 18, 2026 14:34
Grafana never showed the Fun With Flags dashboard because the
docker-compose mount was wrong:

- Mounted to /otel-lgtm/grafana/dashboards (Grafana's legacy default),
  which otel-lgtm does not scan.
- No provisioning YAML pointing at the dashboard directory at all, so
  even if the path had been right, Grafana wouldn't have known to
  load anything from it.

otel-lgtm 0.26.0 reads dashboard providers from
/otel-lgtm/grafana/conf/provisioning/dashboards/*.yaml and loads
dashboards from whatever path each provider references. Add the
provisioning YAML next to the dashboard JSON, mount both to the right
paths.

Existing players need to either Rebuild Container in their Codespace
or run `docker compose up -d --force-recreate lgtm`; volume mount changes
do not apply hot.

Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
@KatharinaSick KatharinaSick left a comment

Amazing challenge! Thank you!! 😊 This level was especially fun as all the magic somehow connected :D


The trial just went wide. Phase 3 of the new vision amplifier, `vision_amplifier_v2`, was approved for the full cohort yesterday morning. The promise was straightforward: subjects emerge with sharper eyesight than they walked in with. By mid-afternoon the audit log was screaming. Subjects were stabilising 200ms slower, and roughly one in ten of them was emerging **blind**: containment failure recorded as an HTTP 500. The lab director pulled up the **Feature Flag Metrics** dashboard expecting to triage visually. The dashboard was dark. Someone had wired up traces but never finished the metrics half. There is no chart to read. The lab is studying eyesight and the lab itself cannot see.

Your job, in order: **turn on the lights**, find the bad arm of the trial, and **halt enrolment** on the amplifier, all without redeploying the lab. That last constraint is the whole point of feature flags: when a rollout starts misbehaving in production, you need an operational lever that does not take twenty minutes to pull. Save the file, watch the dose drop, watch the 5xx rate fall back to baseline, watch the next batch of subjects walk out seeing.

I really like how you added a learning value to the story here 😊

@KatharinaSick KatharinaSick merged commit f6e9b89 into off-on-dev:main May 19, 2026
3 checks passed
Development

Successfully merging this pull request may close these issues.

Adventure: 🧪 Blind by Design (OpenFeature + flagd) [tracking]