Adventure: 🧪 Blind by Design - 🔴 Expert #44
Merged
KatharinaSick merged 8 commits on May 19, 2026
4dc3293 to
2cb421e
Compare
aepfli added a commit to aepfli/open-ecosystem-challenges that referenced this pull request Apr 30, 2026
…only PR
Per @KatharinaSick's first instinct on review, and on reflection the author agrees: when the placeholder stubs are reachable from the side-nav they create more confusion than they resolve. The stub files stay on disk so docs/index.md's links don't 404, but they're not surfaced in navigation. Intermediate and Expert come back into the nav in off-on-dev#43 and off-on-dev#44 once their content is real.
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
2cb421e to
603eabd
Compare
aepfli added a commit to aepfli/open-ecosystem-challenges that referenced this pull request Apr 30, 2026
…inner-only PR
The two stub level-docs and the matching index.md cards were carried in solely so docs/index.md links did not 404. With the nav already trimmed (per @KatharinaSick on PR off-on-dev#42), and now with the cards out of the landing page, the Beginner PR is genuinely scoped to a single level. Intermediate and Expert each add their own card + level doc as part of their respective PRs (off-on-dev#43 / off-on-dev#44).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
603eabd to
84fad08
Compare
KatharinaSick pushed a commit to aepfli/open-ecosystem-challenges that referenced this pull request Apr 30, 2026
…only PR
Per @KatharinaSick's first instinct on review, and on reflection the author agrees: when the placeholder stubs are reachable from the side-nav they create more confusion than they resolve. The stub files stay on disk so docs/index.md's links don't 404, but they're not surfaced in navigation. Intermediate and Expert come back into the nav in off-on-dev#43 and off-on-dev#44 once their content is real.
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
KatharinaSick pushed a commit to aepfli/open-ecosystem-challenges that referenced this pull request Apr 30, 2026
…inner-only PR
The two stub level-docs and the matching index.md cards were carried in solely so docs/index.md links did not 404. With the nav already trimmed (per @KatharinaSick on PR off-on-dev#42), and now with the cards out of the landing page, the Beginner PR is genuinely scoped to a single level. Intermediate and Expert each add their own card + level doc as part of their respective PRs (off-on-dev#43 / off-on-dev#44).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
f2fd5ef to
48bfad1
Compare
48bfad1 to
ef152e5
Compare
b317259 to
58d96ae
Compare
Wire the OpenTelemetry meter provider, register the OpenFeature MetricsHook + TracesHook, author a ContextSpanHook that copies the merged evaluation context onto Tempo spans, then diagnose and roll back a misbehaving fractional rollout (vision_amplifier_v2) on the Grafana LGTM dashboard, with no redeploy.

Replaces the placeholder expert.md stub with the full level doc, ships the Expert solution walkthrough, broken-state code (including the dashboard JSON and k6 loadgen), verify.sh, and devcontainer.

Stacked on top of off-on-dev#43 (🟡 Intermediate). Review off-on-dev#42 then off-on-dev#43 first. This is the last PR in the series, so it closes the tracking issue.

Closes off-on-dev#41

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
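To make the "fractional rollout" in the commit above concrete, here is a minimal Python sketch of deterministic percentage bucketing keyed on a targeting key. It is an illustration only, not flagd's implementation: SHA-256 stands in for whatever hash flagd uses, and the `ROLLOUT` / `ROLLED_BACK` splits, the `on` / `off` variant names, and the function names are all hypothetical.

```python
import hashlib


def bucket(flag_key: str, targeting_key: str) -> int:
    """Map a subject to a stable bucket in [0, 100).

    Assumption: SHA-256 is a stand-in hash chosen for this sketch only.
    """
    digest = hashlib.sha256(f"{flag_key}{targeting_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def resolve_variant(flag_key: str, targeting_key: str, split) -> str:
    """Pick a variant from an ordered list of (variant, weight) pairs.

    Weights are percentages and are expected to sum to 100.
    """
    point = bucket(flag_key, targeting_key)
    upper = 0
    for variant, weight in split:
        upper += weight
        if point < upper:
            return variant
    raise ValueError("weights must sum to 100")


# Hypothetical splits: a partial rollout, and the same flag after rollback.
ROLLOUT = [("on", 10), ("off", 90)]
ROLLED_BACK = [("off", 100)]
```

Because the bucket depends only on the flag key and the targeting key, the same subject lands in the same arm on every evaluation, and changing the weights in the flag definition moves subjects between arms without touching the application.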
- rename '🧪 The story (optional)' → 'The Backstory'
- pin all docker images: flagd v0.15.4, otel-lgtm 0.26.0, k6 1.7.1
- devcontainer: drop flagd ports (8013/8014/8015/8016) from forwardPorts; the LGTM-stack ports (3000/9090/3200/4317/4318) and :8080 stay forwarded as before
- drop the published flagd ports from docker-compose; the lab reaches flagd on the docker-internal network as `flagd:8013`
- drop the 'Solution Walkthrough' section and the inline solutions/expert.md cross-link (solutions are unpublished pre-deadline)
- replace the verify-script blurb with the Adventure 03 template
- 'Access the UIs / flagd' subsection: explain flagd is internal-only now that the ports aren't forwarded
- verify.sh: lean on test_http_endpoint for the reachability check; point FLAGD_HTTP at flagd:8013 (docker network DNS) since the host no longer forwards :8013

Refs: PR off-on-dev#42 review by @KatharinaSick
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
Mirror @KatharinaSick's Beginner pattern (605dabc): a thin Makefile for discoverability + remove the solution doc since solutions are not meant to be published before the challenge launch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
Mirrors the Intermediate cleanup (8bcf885) on Expert, plus picks up the targetingKey + PII discipline that was deferred from Intermediate.

- Rewrite Objective as 5 outcome-based bullets: drop the mechanism-heavy list, drop the parenthetical "verified: SpeciesInterceptor wires userId" note (now redundant: targetingKey lives at the implementation level here, not the objective level).
- Drop the "Concepts you'll touch" section. Its load-bearing content migrates inline to the per-step instructions: TracerProvider vs MeterProvider gloss into 4a, TracesHook/MetricsHook gloss into 4b, the ContextSpanHook authoring guide into a new 4c, and the fractional + targetingKey explanation into 4d.
- Add explicit step 4c "Author and register your own ContextSpanHook": the ContextSpanHook was an objective bullet with no corresponding implementation step (analogous to the missing dose-passing step on Intermediate).
- Move the PII allowlist callout to step 4c. Expert is where it earns its place, since eval context is about to flow onto OTel spans that ship to SIEM-grade backends. The Intermediate cross-link goes away; the discipline lives here standalone.
- Lift "Start the Lab" out of step 1 into its own step 2. This mirrors the Intermediate / Beginner shape, so a player who clicks the Ports tab before reading further doesn't see a 502.
- Format Deadline + Community thread sections as Coming Soon callouts, matching Intermediate.
- Sync verify.sh OBJECTIVE block to the new outcome-based docs.

Addresses Katharina's review themes carried over from off-on-dev#43 (objective shape, Concepts vs Learn overlap, verifier exercises objective).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
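The PII allowlist idea from step 4c can be sketched independently of the OpenFeature and OpenTelemetry SDKs. The Python below is a language-neutral illustration, not the lab's Java hook: the `feature_flag.context.` prefix follows the `feature_flag.context.dose=underdose` attribute mentioned elsewhere in this PR, while `ALLOWED_KEYS`, the `species` key, and the function name are assumptions made for the example.

```python
# Hypothetical allowlist: only these evaluation-context keys may reach a span.
ALLOWED_KEYS = frozenset({"dose", "species"})
PREFIX = "feature_flag.context."


def span_attributes(eval_context: dict) -> dict:
    """Return the span attributes to set for an evaluation context.

    Keys outside the allowlist (for example a user id or an email address)
    are dropped, so they never leave the process on a span.
    """
    attrs = {}
    for key, value in eval_context.items():
        if key not in ALLOWED_KEYS:
            continue
        # Keep primitive values as-is; stringify anything else.
        if isinstance(value, (str, bool, int, float)):
            attrs[PREFIX + key] = value
        else:
            attrs[PREFIX + key] = str(value)
    return attrs
```

An allowlist fails closed: a context key added later stays off the spans until someone deliberately permits it, which is the safer default when spans are shipped to external backends.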
The devcontainer's workspaceFolder already opens at the expert directory, so `cd adventures/planned/00-blind-by-design/expert` is a no-op for any user on the intended path. Carries the same cleanup Katharina applied to Intermediate in a26ad06. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
- Drop the "Phase 3" prefix from the level title across all 8 sites
  (doc title, devcontainer name, verify.sh banner + summary, Makefile,
  post-start.sh banner, "Select Codespace config" line in step 1).
  Katharina suggested calling it just "Read the chart", which keeps the
  story flavor without the artificial sub-numbering.
- Trim the doc intro from three paragraphs to one. The "Three sub-tasks"
  enumeration and the "Level passes when (a)/(b)/(c)/(d)" recap were
  duplicating the Objective section; the "Spans flowing / metrics dead"
  paragraph is the one that earns its place as the lead.
- Rewrite 4a-4d to be directional instead of tutorial. Expert level
  shouldn't dictate every keystroke; point at the gap and the outcome,
  leave the keystrokes to the player.
  - 4a: drop the explicit `otel.metrics.exporter=otlp` instruction +
    the batch-interval recipe. Name the two files where the
    autoconfig defaults live; hint that the default batch interval
    will be a pain for live debugging.
  - 4b: drop the literal `api.addHooks(new MetricsHook(openTelemetry));`
    line. Name the gap (TracesHook registered, MetricsHook isn't) and
    the next step.
  - 4c: keep the pseudocode shape (directional already) but trim
    `Span.current()` mechanic and the "Search → Service →" UI
    walkthrough; replace with "verifier searches Tempo for
    feature_flag.context.dose=underdose" as the smoke signal.
  - 4d: drop the inverted-fractional spoiler. Tell the player that
    the dashboard's variant-distribution panel surfaces the offender;
    the rollback itself is theirs to do.
- Add a "Helpful Documentation" sub-section at the end of "How to Play"
  (matching the pattern Adventure 03 Expert uses). Five external
  references: OpenFeature OTel hooks, OTel SDK autoconfigure, OpenFeature
  Hooks concept, flagd fractional, OTel security guidance.

Addresses Katharina's review comments on #1
(intro paragraphs, "this is now very much of a tutorial again, applies
to all steps", and the helpful-documentation suggestion).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
…fixes
Caught three real broken-state bugs while doing a fresh-codespace run of
the Expert level:

1. The trace pipeline never worked. pom.xml imported the OTel
   instrumentation BOM but did not pull in any instrumentation artifact,
   so Spring WebMVC never produced server spans. TracesHook and
   ContextSpanHook both attach to Span.current(); without an active
   span, every setAttribute call silently disappears, Tempo stays empty,
   and the doc lead ("spans flowing into Tempo from TracesHook") was
   not actually true.
2. The verifier could not reach the LGTM stack (Grafana/Prometheus/Tempo).
   It used http://localhost:NNNN URLs, but verify.sh runs inside the
   workspace container, where localhost points to the workspace itself.
   The LGTM service is a sibling compose service, reachable only by
   service name on the docker-internal network.
3. The vision_amplifier_v2 rollback check was buggy: `jq -r '.value // empty'`
   treats jq-false as missing because `//` is the alternative operator.
   So a successfully rolled-back flag (.value=false) was printed as ''
   and the check failed. Worked by accident in the broken state where
   the flag resolved to true.
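The third bug is easy to reproduce outside jq. The Python sketch below models the documented behaviour of jq's alternative operator (`a // b` yields `b` when `a` is `false` or `null`) so the failure mode is visible in isolation; the function names are hypothetical and this is a model of the check, not the code in verify.sh.

```python
import json


def jq_alternative(value, fallback):
    """Model of jq's `a // b`: the fallback wins when a is false or null."""
    return fallback if value is None or value is False else value


def _raw(value) -> str:
    """Approximate `jq -r` output: strings are printed raw, the rest as JSON."""
    return value if isinstance(value, str) else json.dumps(value)


def render_buggy(doc: dict) -> str:
    """Model of `jq -r '.value // empty'`; `empty` produces no output at all."""
    value = jq_alternative(doc.get("value"), None)
    return "" if value is None else _raw(value)


def render_fixed(doc: dict) -> str:
    """Model of `jq -r '.value'`: false is printed as the text 'false'."""
    return _raw(doc.get("value"))


rolled_back = {"value": False}
print(repr(render_buggy(rolled_back)))  # the false value is swallowed
print(repr(render_fixed(rolled_back)))  # the false value survives
```

The buggy form only looks correct while the flag resolves to `true`, which is why the check passed in the broken state and failed after a successful rollback.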

Trace-pipeline fix uses the OpenTelemetry Java Agent rather than a
Spring Boot starter: the starter at the BOM-pinned 2.14.0 does not
support Spring Boot 4 (NoClassDefFoundError on RestClientCustomizer);
bumping the BOM is doable but the agent keeps the level focused on
OpenFeature hooks rather than OTel SDK plumbing. Concrete changes:

- pom.xml: drop opentelemetry-sdk, opentelemetry-exporter-otlp,
  opentelemetry-sdk-extension-autoconfigure (agent provides all three).
  Keep opentelemetry-api for the Hook type signatures. Add
  spring-boot-maven-plugin <jvmArguments>-javaagent:${OTEL_JAVAAGENT_JAR}</jvmArguments>
  so only the forked lab JVM is agent-attached, not Maven itself.
- Delete OpenTelemetryConfig.java entirely: the agent registers the
  global SDK before main() runs and
  AutoConfiguredOpenTelemetrySdk.setResultAsGlobal() would just collide with it.
- OpenFeatureConfig.java: docstring + TODO comments reflect the new
  GlobalOpenTelemetry.get() pattern; players fetch the agent-installed
  OTel handle for MetricsHook rather than constructor-injecting it from
  a bean that no longer exists.
- New otel.properties next to pom.xml: what the player edits to flip
  the metrics exporter. Pointed at by OTEL_JAVAAGENT_CONFIGURATION_FILE
  in docker-compose.yml. Same lesson as before (turn on metrics), new
  mechanic.
- application.properties: strip all otel.* lines + add a comment
  explaining the agent does not read Spring's Environment.
- docker-compose.yml: set OTEL_JAVAAGENT_JAR + OTEL_JAVAAGENT_CONFIGURATION_FILE
  in the workspace env; drop the manual OTEL_* vars that
  OpenTelemetryConfig used to bridge.
- post-create.sh: download opentelemetry-javaagent.jar v2.27.0 into
  $REPO_ROOT/tools/, idempotent on re-run.
- .vscode/launch.json: add vmArgs so F5/Spring Boot Dashboard launches
  also get the agent. Also rename to "Run the Lab" (Phase 3 was
  dropped from level title earlier).
- devcontainer.json + post-start.sh openFiles: point at otel.properties
  instead of the deleted OpenTelemetryConfig.java.
- .gitignore: add target/ + tools/ so the agent jar and Maven build
  output stop showing as untracked.
- docs/expert.md step 4a + Helpful Documentation: reframe around
  editing otel.properties and link the agent config reference.
- verify.sh: PROMETHEUS_URL / TEMPO_URL / GRAFANA_URL use the lgtm
  service name. AuditHook hint references the literal '[AUDIT]'
  format. jq fix for the rollback check (.value instead of
  .value // empty). Hints clarified to mention service-name vs
  localhost reachability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
cdec139 to
e2acdbf
Compare
Grafana never showed the Fun With Flags dashboard because the docker-compose mount was wrong:

- Mounted to /otel-lgtm/grafana/dashboards (Grafana's legacy default), which otel-lgtm does not scan.
- No provisioning YAML pointing at the dashboard directory at all, so even if the path had been right, Grafana wouldn't have known to load anything from it.

otel-lgtm 0.26.0 reads dashboard providers from /otel-lgtm/grafana/conf/provisioning/dashboards/*.yaml and loads dashboards from whatever path each provider references. Add the provisioning YAML next to the dashboard JSON, mount both to the right paths.

Existing players need to either Rebuild Container in their Codespace or run `docker compose up -d --force-recreate lgtm`; volume mount changes do not apply hot.

Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
KatharinaSick
approved these changes
May 18, 2026
Contributor
KatharinaSick
left a comment
Amazing challenge! Thank you!! This level was especially fun as all the magic somehow connected :D
The trial just went wide. Phase 3 of the new vision amplifier (`vision_amplifier_v2`) was approved for the full cohort yesterday morning. The promise was straightforward: subjects emerge with sharper eyesight than they walked in with. By mid-afternoon the audit log was screaming. Subjects were stabilising 200ms slower, and roughly one in ten of them was emerging **blind**, a containment failure recorded as an HTTP 500. The lab director pulled up the **Feature Flag Metrics** dashboard expecting to triage visually. The dashboard was dark. Someone had wired up traces but never finished the metrics half. There is no chart to read. The lab is studying eyesight and the lab itself cannot see.

Your job, in order: **turn on the lights**, find the bad arm of the trial, and **halt enrolment** on the amplifier, all without redeploying the lab. That last constraint is the whole point of feature flags: when a rollout starts misbehaving in production, you need an operational lever that does not take twenty minutes to pull. Save the file, watch the dose drop, watch the 5xx rate fall back to baseline, watch the next batch of subjects walk out seeing.
Contributor
I really like how you added a learning value to the story here
Summary
Phase 3 (read the chart): wire the OpenTelemetry meter provider, register the OpenFeature MetricsHook + TracesHook, author a custom ContextSpanHook that copies the merged evaluation context onto Tempo spans, then diagnose and roll back a misbehaving fractional rollout (vision_amplifier_v2) on the Grafana LGTM dashboard, with no redeploy.

What this PR ships
- adventures/planned/00-blind-by-design/docs/index.md: adds the 🔴 Expert card to the Choose Your Level grid (completing the three-level set)
- adventures/planned/00-blind-by-design/docs/expert.md: challenge doc
- adventures/planned/00-blind-by-design/docs/solutions/expert.md: solution walkthrough (kept in repo, hidden from mkdocs.yaml nav until challenge launch)
- adventures/planned/00-blind-by-design/expert/: broken-state Spring Boot project + verify.sh + dashboards/feature-flags.json (Grafana) + loadgen/k6/script.js
- .devcontainer/00-blind-by-design_03-expert/: adds the Grafana LGTM container and the k6 loadgen alongside flagd

Stacking
This is PR 3 of 3 (last in the series), branched from adventure/blind-by-design-intermediate (#43). The diff currently includes Beginner's and Intermediate's content because this PR is stacked on top of them. Review #42 then #43 first; once they merge into main in order, the Expert-only delta will become apparent here automatically.

The original full-state PR #40 is kept open for reference and will be closed once this lands.

Closes #41