Diagnose log injection smoke test flakiness instead of masking it#11075
Draft
Diagnose log injection smoke test flakiness instead of masking it#11075
Conversation
The `check raw file injection` test has been flaking across 11+ logging backend variants for months. CI Visibility data shows 90% of failures are `traceCount=0` at `waitForTraceCount(2)` after exactly 30s — the JVM + agent bytecode instrumentation simply takes >30s on overloaded CI machines. Changes: - Add `startupPoll` with 120s timeout for the initial `waitForTraceCount(2)` that covers JVM startup + agent init, giving 4x headroom over the current 30s `defaultPoll` - Add `waitForTraceCountAlive` that checks process liveness on each poll iteration, turning silent 30-120s timeouts into instant, actionable errors when the process crashes - Reorder `waitForTraceCount(4)` before `waitFor` to confirm all traces are delivered while the process is still alive - Assert `waitFor` return value for a clear error if the process hangs tag: no release note Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 61 metrics, 10 unstable metrics. Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~9a554fcbf3, baseline=1.62.0-SNAPSHOT~b266e2d0c2
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.068 s) : 0, 1068067
Total [baseline] (8.849 s) : 0, 8849434
Agent [candidate] (1.06 s) : 0, 1059507
Total [candidate] (8.835 s) : 0, 8834611
section iast
Agent [baseline] (1.224 s) : 0, 1224300
Total [baseline] (9.572 s) : 0, 9572168
Agent [candidate] (1.226 s) : 0, 1225816
Total [candidate] (9.582 s) : 0, 9581846
gantt
title insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~9a554fcbf3, baseline=1.62.0-SNAPSHOT~b266e2d0c2
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.251 ms) : 0, 1251
crashtracking [candidate] (1.224 ms) : 0, 1224
BytebuddyAgent [baseline] (638.326 ms) : 0, 638326
BytebuddyAgent [candidate] (633.791 ms) : 0, 633791
AgentMeter [baseline] (29.743 ms) : 0, 29743
AgentMeter [candidate] (29.517 ms) : 0, 29517
GlobalTracer [baseline] (251.297 ms) : 0, 251297
GlobalTracer [candidate] (249.054 ms) : 0, 249054
AppSec [baseline] (32.346 ms) : 0, 32346
AppSec [candidate] (31.921 ms) : 0, 31921
Debugger [baseline] (60.086 ms) : 0, 60086
Debugger [candidate] (59.322 ms) : 0, 59322
Remote Config [baseline] (615.472 µs) : 0, 615
Remote Config [candidate] (594.755 µs) : 0, 595
Telemetry [baseline] (8.223 ms) : 0, 8223
Telemetry [candidate] (8.115 ms) : 0, 8115
Flare Poller [baseline] (9.78 ms) : 0, 9780
Flare Poller [candidate] (9.731 ms) : 0, 9731
section iast
crashtracking [baseline] (1.236 ms) : 0, 1236
crashtracking [candidate] (1.23 ms) : 0, 1230
BytebuddyAgent [baseline] (800.972 ms) : 0, 800972
BytebuddyAgent [candidate] (802.871 ms) : 0, 802871
AgentMeter [baseline] (11.395 ms) : 0, 11395
AgentMeter [candidate] (11.478 ms) : 0, 11478
GlobalTracer [baseline] (239.817 ms) : 0, 239817
GlobalTracer [candidate] (240.472 ms) : 0, 240472
AppSec [baseline] (31.14 ms) : 0, 31140
AppSec [candidate] (33.427 ms) : 0, 33427
Debugger [baseline] (60.062 ms) : 0, 60062
Debugger [candidate] (56.782 ms) : 0, 56782
Remote Config [baseline] (539.582 µs) : 0, 540
Remote Config [candidate] (517.713 µs) : 0, 518
Telemetry [baseline] (13.207 ms) : 0, 13207
Telemetry [candidate] (13.151 ms) : 0, 13151
Flare Poller [baseline] (3.612 ms) : 0, 3612
Flare Poller [candidate] (3.403 ms) : 0, 3403
IAST [baseline] (25.906 ms) : 0, 25906
IAST [candidate] (25.928 ms) : 0, 25928
Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~9a554fcbf3, baseline=1.62.0-SNAPSHOT~b266e2d0c2
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.06 s) : 0, 1060135
Total [baseline] (11.099 s) : 0, 11098805
Agent [candidate] (1.074 s) : 0, 1073912
Total [candidate] (11.183 s) : 0, 11183254
section appsec
Agent [baseline] (1.25 s) : 0, 1250296
Total [baseline] (11.213 s) : 0, 11212607
Agent [candidate] (1.255 s) : 0, 1255449
Total [candidate] (11.269 s) : 0, 11268944
section iast
Agent [baseline] (1.225 s) : 0, 1224831
Total [baseline] (11.304 s) : 0, 11303781
Agent [candidate] (1.224 s) : 0, 1223713
Total [candidate] (11.347 s) : 0, 11347404
section profiling
Agent [baseline] (1.186 s) : 0, 1185957
Total [baseline] (11.021 s) : 0, 11020816
Agent [candidate] (1.184 s) : 0, 1184064
Total [candidate] (11.091 s) : 0, 11090510
gantt
title petclinic - break down per module: candidate=1.62.0-SNAPSHOT~9a554fcbf3, baseline=1.62.0-SNAPSHOT~b266e2d0c2
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.249 ms) : 0, 1249
crashtracking [candidate] (1.258 ms) : 0, 1258
BytebuddyAgent [baseline] (633.195 ms) : 0, 633195
BytebuddyAgent [candidate] (641.6 ms) : 0, 641600
AgentMeter [baseline] (29.293 ms) : 0, 29293
AgentMeter [candidate] (29.873 ms) : 0, 29873
GlobalTracer [baseline] (249.556 ms) : 0, 249556
GlobalTracer [candidate] (252.545 ms) : 0, 252545
AppSec [baseline] (32.047 ms) : 0, 32047
AppSec [candidate] (32.56 ms) : 0, 32560
Debugger [baseline] (59.996 ms) : 0, 59996
Debugger [candidate] (60.974 ms) : 0, 60974
Remote Config [baseline] (601.662 µs) : 0, 602
Remote Config [candidate] (606.693 µs) : 0, 607
Telemetry [baseline] (8.12 ms) : 0, 8120
Telemetry [candidate] (8.226 ms) : 0, 8226
Flare Poller [baseline] (9.842 ms) : 0, 9842
Flare Poller [candidate] (9.751 ms) : 0, 9751
section appsec
crashtracking [baseline] (1.23 ms) : 0, 1230
crashtracking [candidate] (1.233 ms) : 0, 1233
BytebuddyAgent [baseline] (661.306 ms) : 0, 661306
BytebuddyAgent [candidate] (666.516 ms) : 0, 666516
AgentMeter [baseline] (12.116 ms) : 0, 12116
AgentMeter [candidate] (12.135 ms) : 0, 12135
GlobalTracer [baseline] (250.33 ms) : 0, 250330
GlobalTracer [candidate] (250.0 ms) : 0, 250000
AppSec [baseline] (185.036 ms) : 0, 185036
AppSec [candidate] (184.951 ms) : 0, 184951
Debugger [baseline] (66.366 ms) : 0, 66366
Debugger [candidate] (66.622 ms) : 0, 66622
Remote Config [baseline] (619.492 µs) : 0, 619
Remote Config [candidate] (606.051 µs) : 0, 606
Telemetry [baseline] (8.678 ms) : 0, 8678
Telemetry [candidate] (8.651 ms) : 0, 8651
Flare Poller [baseline] (3.515 ms) : 0, 3515
Flare Poller [candidate] (3.586 ms) : 0, 3586
IAST [baseline] (24.675 ms) : 0, 24675
IAST [candidate] (24.683 ms) : 0, 24683
section iast
crashtracking [baseline] (1.235 ms) : 0, 1235
crashtracking [candidate] (1.223 ms) : 0, 1223
BytebuddyAgent [baseline] (801.776 ms) : 0, 801776
BytebuddyAgent [candidate] (801.43 ms) : 0, 801430
AgentMeter [baseline] (11.415 ms) : 0, 11415
AgentMeter [candidate] (11.39 ms) : 0, 11390
GlobalTracer [baseline] (239.354 ms) : 0, 239354
GlobalTracer [candidate] (239.093 ms) : 0, 239093
AppSec [baseline] (31.846 ms) : 0, 31846
AppSec [candidate] (33.159 ms) : 0, 33159
Debugger [baseline] (59.442 ms) : 0, 59442
Debugger [candidate] (57.981 ms) : 0, 57981
Remote Config [baseline] (540.82 µs) : 0, 541
Remote Config [candidate] (530.053 µs) : 0, 530
Telemetry [baseline] (12.909 ms) : 0, 12909
Telemetry [candidate] (13.349 ms) : 0, 13349
Flare Poller [baseline] (3.426 ms) : 0, 3426
Flare Poller [candidate] (3.429 ms) : 0, 3429
IAST [baseline] (26.564 ms) : 0, 26564
IAST [candidate] (25.817 ms) : 0, 25817
section profiling
crashtracking [baseline] (1.181 ms) : 0, 1181
crashtracking [candidate] (1.164 ms) : 0, 1164
BytebuddyAgent [baseline] (693.053 ms) : 0, 693053
BytebuddyAgent [candidate] (691.543 ms) : 0, 691543
AgentMeter [baseline] (9.163 ms) : 0, 9163
AgentMeter [candidate] (9.131 ms) : 0, 9131
GlobalTracer [baseline] (206.862 ms) : 0, 206862
GlobalTracer [candidate] (206.912 ms) : 0, 206912
AppSec [baseline] (32.47 ms) : 0, 32470
AppSec [candidate] (32.576 ms) : 0, 32576
Debugger [baseline] (65.757 ms) : 0, 65757
Debugger [candidate] (65.603 ms) : 0, 65603
Remote Config [baseline] (587.963 µs) : 0, 588
Remote Config [candidate] (572.814 µs) : 0, 573
Telemetry [baseline] (7.871 ms) : 0, 7871
Telemetry [candidate] (7.863 ms) : 0, 7863
Flare Poller [baseline] (3.559 ms) : 0, 3559
Flare Poller [candidate] (3.57 ms) : 0, 3570
ProfilingAgent [baseline] (94.125 ms) : 0, 94125
ProfilingAgent [candidate] (93.889 ms) : 0, 93889
Profiling [baseline] (94.698 ms) : 0, 94698
Profiling [candidate] (94.466 ms) : 0, 94466
LoadParameters
See matching parameters
SummaryFound 3 performance improvements and 2 performance regressions! Performance is the same for 16 metrics, 15 unstable metrics.
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~9a554fcbf3, baseline=1.62.0-SNAPSHOT~b266e2d0c2
dateFormat X
axisFormat %s
section baseline
no_agent (19.2 ms) : 19005, 19394
. : milestone, 19200,
appsec (19.006 ms) : 18815, 19197
. : milestone, 19006,
code_origins (18.062 ms) : 17883, 18240
. : milestone, 18062,
iast (17.73 ms) : 17556, 17904
. : milestone, 17730,
profiling (19.284 ms) : 19086, 19482
. : milestone, 19284,
tracing (17.86 ms) : 17682, 18037
. : milestone, 17860,
section candidate
no_agent (18.081 ms) : 17896, 18265
. : milestone, 18081,
appsec (19.911 ms) : 19706, 20116
. : milestone, 19911,
code_origins (18.134 ms) : 17955, 18312
. : milestone, 18134,
iast (17.644 ms) : 17472, 17816
. : milestone, 17644,
profiling (18.324 ms) : 18139, 18509
. : milestone, 18324,
tracing (17.918 ms) : 17740, 18096
. : milestone, 17918,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~9a554fcbf3, baseline=1.62.0-SNAPSHOT~b266e2d0c2
dateFormat X
axisFormat %s
section baseline
no_agent (1.243 ms) : 1231, 1255
. : milestone, 1243,
iast (3.337 ms) : 3288, 3386
. : milestone, 3337,
iast_FULL (6.056 ms) : 5995, 6117
. : milestone, 6056,
iast_GLOBAL (3.74 ms) : 3679, 3801
. : milestone, 3740,
profiling (2.279 ms) : 2258, 2301
. : milestone, 2279,
tracing (1.881 ms) : 1865, 1897
. : milestone, 1881,
section candidate
no_agent (1.252 ms) : 1240, 1264
. : milestone, 1252,
iast (3.306 ms) : 3257, 3354
. : milestone, 3306,
iast_FULL (5.891 ms) : 5832, 5950
. : milestone, 5891,
iast_GLOBAL (3.677 ms) : 3615, 3739
. : milestone, 3677,
profiling (2.367 ms) : 2344, 2390
. : milestone, 2367,
tracing (1.867 ms) : 1852, 1882
. : milestone, 1867,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~9a554fcbf3, baseline=1.62.0-SNAPSHOT~b266e2d0c2
dateFormat X
axisFormat %s
section baseline
no_agent (15.674 s) : 15674000, 15674000
. : milestone, 15674000,
appsec (15.087 s) : 15087000, 15087000
. : milestone, 15087000,
iast (18.249 s) : 18249000, 18249000
. : milestone, 18249000,
iast_GLOBAL (18.156 s) : 18156000, 18156000
. : milestone, 18156000,
profiling (14.801 s) : 14801000, 14801000
. : milestone, 14801000,
tracing (14.971 s) : 14971000, 14971000
. : milestone, 14971000,
section candidate
no_agent (14.951 s) : 14951000, 14951000
. : milestone, 14951000,
appsec (14.747 s) : 14747000, 14747000
. : milestone, 14747000,
iast (18.143 s) : 18143000, 18143000
. : milestone, 18143000,
iast_GLOBAL (18.111 s) : 18111000, 18111000
. : milestone, 18111000,
profiling (14.864 s) : 14864000, 14864000
. : milestone, 14864000,
tracing (14.882 s) : 14882000, 14882000
. : milestone, 14882000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~9a554fcbf3, baseline=1.62.0-SNAPSHOT~b266e2d0c2
dateFormat X
axisFormat %s
section baseline
no_agent (1.492 ms) : 1481, 1504
. : milestone, 1492,
appsec (3.87 ms) : 3647, 4094
. : milestone, 3870,
iast (2.272 ms) : 2203, 2341
. : milestone, 2272,
iast_GLOBAL (2.317 ms) : 2247, 2386
. : milestone, 2317,
profiling (2.095 ms) : 2041, 2150
. : milestone, 2095,
tracing (2.091 ms) : 2038, 2145
. : milestone, 2091,
section candidate
no_agent (1.488 ms) : 1476, 1499
. : milestone, 1488,
appsec (3.818 ms) : 3596, 4040
. : milestone, 3818,
iast (2.267 ms) : 2198, 2336
. : milestone, 2267,
iast_GLOBAL (2.314 ms) : 2245, 2383
. : milestone, 2314,
profiling (2.122 ms) : 2066, 2177
. : milestone, 2122,
tracing (2.088 ms) : 2034, 2142
. : milestone, 2088,
|
The `check raw file injection` test flakes across 11+ logging backend variants. CI Visibility data shows the failure is bimodal — successful runs complete in 3-9s, but failures sit at exactly 30s (the PollingConditions timeout) with traceCount=0. Nothing in between. This means the process either works or is totally broken — no amount of timeout increase will help. The current test is blind during the 30s wait — it just polls traceCount with no diagnostics when the process crashes or hangs. Changes: - Add `waitForTraceCountAlive` that checks process liveness on every poll iteration. If the process dies, it fails immediately with the exit code, RC poll count, and last 20 lines of process output. - On timeout, enrich the error with diagnostic state (process alive?, traceCount, RC polls received, last 30 lines of output) so the next CI failure tells us whether it's a crash, a hang, or a connectivity issue. - Reorder `waitForTraceCount(4)` before `waitFor` to confirm all traces are delivered while the process is still alive. - Assert `waitFor` return value for a clear error if the process hangs. tag: no release notes Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What Does This Do
Adds diagnostic instrumentation to the
check raw file injectionsmoke test so the next CI failure tells us the root cause instead of a bare "Condition not satisfied after 30s" withtraceCount=0.Changes to
LogInjectionSmokeTest:waitForTraceCountAlive— checks process liveness on every poll iteration; if the process dies, fails immediately with exit code + last 20 lines of process outputwaitForTraceCount(4)beforewaitFor+ assertwaitForreturn valueMotivation
CI Visibility data for the last 30 days on master shows 10 failures of
check raw file injection:traceCount=0atwaitForTraceCount(2)logLines.size()=3atassertRawLogLinesWithInjectionThe failure distribution is bimodal — successful runs complete in 3.5-8.7s (80 data points, zero above 9s), while failures sit at exactly 30.3s. There is nothing in between. This means the process either works or is totally broken — a timeout increase would just delay the same failure.
The current test is blind during the wait — it just polls
traceCountin a loop. We don't know if the process crashed, hung during agent init, failed to connect to the test server, or something else entirely. This PR makes the next failure self-diagnosing.Example output when process crashes:
Example output on timeout (process alive but not sending traces):
Additional Notes
LogInjectionSmokeTest.groovyis changeddefaultPollis kept as-isrcClientMessages.size()tells us whether the agent connected to the test server at all (RC polls hit/v0.7/configevery 200ms)Contributor Checklist
type:and (comp:orinst:) labelsclose,fix, or any linking keywords when referencing an issuetag: no release notes
tag: ai generated
🤖 Generated with Claude Code