otelcol: drop unparseable container logs quietly + filter nil container.name #51

Merged
samcm merged 1 commit into master from samcm/otelcol-drop-quiet
May 19, 2026

Conversation

@samcm (Member) commented May 19, 2026

Filelog container operator hits add_metadata_from_filepath failures during container churn and logs them at error level; otelcol tails its own stderr and re-ships those errors, amplifying volume ~10x per failure (stack traces). Adds on_error: drop_quiet to the container operator and flips the downstream filter to drop entries with nil container.name as a safety net. Mirrors ethpandaops/blob-devnets#15.

…er.name

Filelog 'container' operator hits add_metadata_from_filepath failures during container churn and logs them at error level. otelcol then tails its own stderr via filelog and re-ships those errors, ~10x amplification per failure (stack traces). on_error: drop_quiet silences and drops the failed entry; updated filter expr drops entries with nil container.name as a safety net. Mirrors blob-devnets#15.
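
The filter behaviour described above can be modelled in a few lines of Python. This is only a sketch under two assumptions: that the filter operator drops every entry for which its expression evaluates to true (which is how the description's "drops entries with nil container.name" reads), and that `DROP_PATTERN` is a hypothetical stand-in for the real regex, which is not shown in this thread.

```python
import re

# Hypothetical stand-in: the real regex from all.yaml is not shown in this thread.
DROP_PATTERN = re.compile(r"otelcol|node-exporter")


def filter_drops(attributes: dict) -> bool:
    """Model of the updated expression:
    attributes["container.name"] == nil or attributes["container.name"] matches PATTERN
    Returns True when the entry would be dropped (assumed filter semantics).
    """
    name = attributes.get("container.name")
    return name is None or DROP_PATTERN.search(name) is not None


entries = [
    {"container.name": "otelcol"},      # matches the pattern: dropped
    {"container.name": "beacon-node"},  # named, no match: kept
    {},                                 # no container.name: dropped by the nil guard
]

kept = [e for e in entries if not filter_drops(e)]
```

Under these assumptions only the `beacon-node` entry survives. The entry without `container.name` is discarded by the filter even if the container operator let it through.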

@qu0b-reviewer (Bot) commented May 19, 2026

🤖 qu0b-reviewer

Summary

Two devnet OTLP config files get three line-level changes: adding on_error: drop_quiet to the container operator, and flipping the filter's expression from positive allowlist (!= nil AND matches X) to negative allowlist (== nil OR matches X). The filter change inverts the pass-through semantics — logs now pass through if they're not from application containers.

Issues

  • 🟡 devnet-3/group_vars/all/all.yaml:340 and devnet-6/group_vars/all/all.yaml:374: Filter expression flips intent silently. The original says "pass only named containers in the allowlist"; the new one says "pass nil-name containers and the allowlist." Both have the same net effect for application containers (they pass), but the logic is harder to reason about and diverges from the standard negative-allow pattern. The original check != nil was a guard protecting downstream operators from crashing on nil; moving it to == nil or ... still guards (the nil case exits early), but makes the allowlist semantics read backwards. Not a blocker since the function is identical, but the expression is semantically inverted with no comment explaining why.

Suggestions

  • The filter could be written more readably using a not to keep the allowlist positive and flip only once:
    expr: 'not (attributes["container.name"] != nil and not attributes["container.name"] matches "...")'
    
    Or simply add a comment: # Pass infra containers + any log without a container.name (unparseable/edge-case logs).
  • Consider extracting the filter expressions into an Ansible variable with a comment, since both devnets now share identical logic — DRY if you ever add devnet-7.
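
By De Morgan's law, the `not (...)` form suggested above evaluates identically to the `== nil or ... matches` form described in the issue. A minimal Python check of that equivalence follows. `PATTERN` is a hypothetical placeholder, since the real regex is elided in this review, and Python's `None` stands in for `nil`.

```python
import re

# Hypothetical placeholder: the real regex is elided ("...") in the review.
PATTERN = re.compile(r"otelcol")


def merged_form(name):
    # name == nil or name matches PATTERN
    return name is None or bool(PATTERN.search(name))


def suggested_form(name):
    # not (name != nil and not name matches PATTERN)
    return not (name is not None and not PATTERN.search(name))


# Cover the nil case, a matching name, a non-matching name, and an empty name.
cases = [None, "otelcol", "beacon-node", ""]
results = [(merged_form(n), suggested_form(n)) for n in cases]
equivalent = all(a == b for a, b in results)
```

Both forms agree on every case, so choosing between them is purely a readability decision.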

Reviewed @ e9a27c63
"Async by default."

@samcm merged commit f6ddb99 into master May 19, 2026
1 check passed
@samcm deleted the samcm/otelcol-drop-quiet branch May 19, 2026 06:23
samcm added a commit that referenced this pull request May 19, 2026