Skip to content

metrics: extend support for Tag metrics#11887

Open
edsiper wants to merge 5 commits into
masterfrom
tag-metrics
Open

metrics: extend support for Tag metrics#11887
edsiper wants to merge 5 commits into
masterfrom
tag-metrics

Conversation

@edsiper

@edsiper edsiper commented May 30, 2026

Copy link
Copy Markdown
Member

This PR adds fluentbit_input_logs_tag_records_total, a new internal metric for tracking log record counts by input name and tag. The feature is disabled by default and configured exclusively through nested YAML telemetry blocks.

It is logs-only: only the logs signal is counted (metrics, traces and profiles inputs are ignored). Because tags can be high cardinality, the feature ships with global guardrails (max_series, max_tag_length) so it cannot overwhelm the metrics endpoint.

Configuration

tag_records is always a map; enabled is the on/off toggle. Service-level
configuration controls the global default and guardrails:

service:
  telemetry:
    metrics:
      logs:
        tag_records:
          enabled: true
          max_series: 500
          max_tag_length: 128

Keys under the tag_records block:

Key Type Default Description
enabled boolean false Master switch for all inputs.
max_series integer 500 Max distinct (name, tag) series tracked globally. <= 0 disables the limit.
max_tag_length integer 128 Tags longer than this (bytes) are not tracked. <= 0 disables the limit.

Per-input override

Inputs can override only the enabled state (max_series/max_tag_length are service-global). Opt an input out while the service switch is on:

pipeline:
  inputs:
    - name: dummy
      alias: app_logs
      tag: app.logs
      dummy: '{"message":"hello"}'
      telemetry:
        metrics:
          logs:
            tag_records:
              enabled: false

Or opt a single input in while the service switch is off:

pipeline:
  inputs:
    - name: dummy
      alias: selected_input
      tag: selected.logs
      dummy: '{"message":"hello"}'
      telemetry:
        metrics:
          logs:
            tag_records:
              enabled: true

Metrics

Example Prometheus output when enabled:

fluentbit_input_logs_tag_records_total{name="app_logs",tag="app.logs"} 42
fluentbit_input_logs_tag_records_total{name="app_logs",tag="nginx.access"} 128
fluentbit_input_logs_tag_records_total{name="tail.0",tag="kube.var.log.containers.api"} 2301

Records that cannot be tracked because of the guardrails are counted separately, with the reason:

fluentbit_input_logs_tag_records_untracked_total{name="app_logs",reason="max_series"} 15
fluentbit_input_logs_tag_records_untracked_total{name="app_logs",reason="tag_length_limit"} 3

reason is one of max_series (global series cap reached), tag_length_limit (tag exceeds max_tag_length), or error (internal failure). A record that is not tracked per-tag is still counted in fluentbit_input_records_total.

name is the input alias when set, otherwise the input plugin name. tag is the Fluent Bit tag for the ingested log records.

sum by (tag) (fluentbit_input_logs_tag_records_total)

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Telemetry: add metrics for tracking input log tag records with configurable enablement, max series and max tag length; supports service-level defaults and per-input overrides via nested telemetry blocks; tracks both tracked and untracked records with cardinality limits.
  • Tests

    • New integration suite validating enable/disable behavior, per-input overrides, accumulation, max-series and tag-length enforcement, multi-tag tracking, nested-block parsing, and numerous bad-config cases.

@coderabbitai

coderabbitai Bot commented May 30, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds telemetry "logs tag records": nested-telemetry YAML parsing, new config and per-input fields, hash-table-backed series tracking with mutex, per-input tracked/untracked cmetrics, runtime updater during log ingestion, and comprehensive integration and bad-config tests.

Changes

Tag Records Telemetry Feature

Layer / File(s) Summary
Data structures and public contracts
include/fluent-bit/flb_config.h, include/fluent-bit/flb_input.h, src/flb_config.c
Adds forward declaration for flb_hash_table; extends struct flb_config with telemetry tag-records fields (enable, max_series, max_tag_length, series_count, hash-table pointer, mutex, init flag); adds per-input override and cmetrics pointers; includes for hash table and supporting headers.
YAML parsing and telemetry schema application
src/config_format/flb_cf_yaml.c, src/flb_config.c
YAML state machine captures nested variants; new helpers validate and apply telemetry.metrics.logs.tag_records for service (boolean or map) and input (nested block limited to enabled); rejects scalar/dotted telemetry keys and unknown nested options.
Config init and resource management
src/flb_config.c
flb_config_init() sets telemetry defaults, creates tag-records hash table and mutex, performs targeted teardown on allocation failure; flb_config_exit() destroys hash table and mutex when initialized.
Per-input override and cmetrics creation
src/flb_input.c
Input instances default override to -1; effective enablement computed from input override or service; when enabled, creates two cmetrics counters (tracked by name+tag, and untracked by name+reason).
Runtime tag-records metrics collection
src/flb_input_chunk.c
Static helpers validate tag length, build a length-prefixed NUL-free key, mutex-protect hash-table lookup/insert up to max_series, and increment tracked or untracked counters; invoked during log chunk append for log events.
Integration tests and scenarios
tests/integration/scenarios/internal_metrics/tests/*, tests/integration/scenarios/internal_metrics/config/*
Adds Prometheus-metric parsing test harness and tests for enablement, defaults, limits, per-input overrides, nested-block behavior, monotonicity, multi-tag tracking; many YAML scenarios for valid and invalid configs.

Sequence Diagram(s)

sequenceDiagram
  participant CLI
  participant YAML_Parser
  participant Config
  participant HashTable
  participant Mutex
  CLI->>YAML_Parser: Load config file
  YAML_Parser->>Config: Deliver nested telemetry KVLIST
  Config->>Config: validate/apply telemetry.metrics.logs.tag_records
  Config->>HashTable: create telemetry tag-records HT
  Config->>Mutex: initialize telemetry mutex
  HashTable-->>Config: HT allocated
  Mutex-->>Config: mutex ready
Loading
sequenceDiagram
  participant Log_Ingest
  participant Metrics_Updater
  participant Tag_Validator
  participant HT_Lookup
  participant Cmt_Counter
  Log_Ingest->>Metrics_Updater: report (input_name, tag, count)
  Metrics_Updater->>Tag_Validator: check tag length
  Tag_Validator-->>Metrics_Updater: OK or too_long
  alt tag too long
    Metrics_Updater->>Cmt_Counter: increment untracked (reason=tag_length_limit)
  else length OK
    Metrics_Updater->>HT_Lookup: acquire mutex & lookup/insert key
    HT_Lookup-->>Metrics_Updater: found / inserted / limit_reached
    alt limit reached or insert failed
      Metrics_Updater->>Cmt_Counter: increment untracked (reason=max_series)
    else tracked
      Metrics_Updater->>Cmt_Counter: increment tracked counter (labels=name,tag)
    end
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • cosmo0920

Poem

🐇 I hop through tags and count each trace,

A tiny hash keeps every place,
A mutex waits, a counter sings,
Prometheus hears the tiny things,
Tests tuck in the garden's grace.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 24.44% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: extending support for tag metrics by adding a new fluentbit_input_logs_tag_records_total metric to track log records by input and tag.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch tag-metrics

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 39f65fab04

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/flb_config.c Outdated
Comment on lines +382 to +383
config->telemetry_metrics_tag_records_ht =
flb_hash_table_create(FLB_HASH_TABLE_EVICT_NONE, 512, 0);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use a case-sensitive cardinality table for tags

When max_series is used, tags that differ only by case can bypass the limit because this new hash table is left at the default case-insensitive comparison mode while Prometheus label values remain case-sensitive. For example, after tracking tag="App" with max_series=1, tag="app" is treated as an existing key here and skips the max_series rejection path, but cmt_counter_add still creates a separate tag="app" time series. Calling flb_hash_table_set_case_sensitivity(..., FLB_TRUE) for this table would make the cardinality guard match the metric labels.

Useful? React with 👍 / 👎.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
TAG_METRICS.md (1)

97-100: ⚡ Quick win

Add language identifiers to fenced code blocks.

The fenced code blocks at lines 97-100 and 211-215 are missing language specifiers. Add bash or shell after the opening triple backticks for proper syntax highlighting.

📝 Proposed fix
-```
+```bash
 $ curl -s http://127.0.0.1:2020/api/v2/metrics/prometheus | grep tag_records
 fluentbit_input_tag_records_total{name="dummy",tag="app.logs"} 12

```diff
-```
+```bash
 cd tests/integration
 ./run_tests.py scenarios/internal_metrics      # or:
 .venv/bin/python -m pytest scenarios/internal_metrics -q
</details>

Based on learnings: static analysis tool markdownlint-cli2 flagged these blocks.


Also applies to: 211-215

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @TAG_METRICS.md around lines 97 - 100, Update the two fenced code blocks that
start with the lines "$ curl -s http://127.0.0.1:2020/api/v2/metrics/prometheus
| grep tag_records" and "cd tests/integration" to include a language specifier
(e.g., bash or shell) immediately after the opening triple backticks; replace
the opening "" with "bash" for both blocks so syntax highlighting is
enabled for the curl and test-run examples.


</details>

</blockquote></details>
<details>
<summary>src/flb_input.c (1)</summary><blockquote>

`1453-1474`: _💤 Low value_

**Counter creation condition could be refined for memory efficiency.**

The counter creation uses `ctx->telemetry_metrics_tag_records == FLB_TRUE || ins->telemetry_metrics_tag_records == FLB_TRUE`, which creates counters even when the per-input override explicitly disables the feature (`ins->telemetry_metrics_tag_records == 0`).

The runtime check in `tag_records_metrics_enabled()` (flb_input_chunk.c) correctly prioritizes the per-input override, so functionality is correct. However, for memory efficiency, the condition could mirror the runtime logic:

```c
int tag_records_enabled = (ins->telemetry_metrics_tag_records != -1)
                          ? ins->telemetry_metrics_tag_records
                          : ctx->telemetry_metrics_tag_records;
if (tag_records_enabled == FLB_TRUE) {
    // create counters
}
```

This is a minor optimization - the current code is functionally correct since unused counters are benign.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/flb_input.c` around lines 1453 - 1474, The counter creation currently
checks ctx->telemetry_metrics_tag_records || ins->telemetry_metrics_tag_records
which can allocate counters even when the per-input override disables them;
change the condition to mirror tag_records_metrics_enabled() by computing an
effective flag like tag_records_enabled = (ins->telemetry_metrics_tag_records !=
-1) ? ins->telemetry_metrics_tag_records : ctx->telemetry_metrics_tag_records
and only call cmt_counter_create for ins->cmt_tag_records and
ins->cmt_tag_records_untracked when tag_records_enabled == FLB_TRUE; keep
existing error checks (if (!ins->cmt_tag_records) return -1, etc.) and reference
the same symbols (ctx, ins, telemetry_metrics_tag_records, cmt_counter_create,
ins->cmt_tag_records_untracked) when making the change.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
@tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py:

  • Around line 28-40: The predicate passed to
    FluentBitTestService.wait_for_condition in metrics() (and likewise in any
    _run_until_value.check() usage) must catch requests exceptions so transient
    network errors don't escape the poll loop; wrap the requests.get(...) call in a
    try/except that catches requests.RequestException and returns None on error, and
    only return response.text when status_code == 200 and expected is present—this
    ensures wait_for_condition can retry instead of failing the test.

Nitpick comments:
In @src/flb_input.c:

  • Around line 1453-1474: The counter creation currently checks
    ctx->telemetry_metrics_tag_records || ins->telemetry_metrics_tag_records which
    can allocate counters even when the per-input override disables them; change the
    condition to mirror tag_records_metrics_enabled() by computing an effective flag
    like tag_records_enabled = (ins->telemetry_metrics_tag_records != -1) ?
    ins->telemetry_metrics_tag_records : ctx->telemetry_metrics_tag_records and only
    call cmt_counter_create for ins->cmt_tag_records and
    ins->cmt_tag_records_untracked when tag_records_enabled == FLB_TRUE; keep
    existing error checks (if (!ins->cmt_tag_records) return -1, etc.) and reference
    the same symbols (ctx, ins, telemetry_metrics_tag_records, cmt_counter_create,
    ins->cmt_tag_records_untracked) when making the change.

In @TAG_METRICS.md:

  • Around line 97-100: Update the two fenced code blocks that start with the
    lines "$ curl -s http://127.0.0.1:2020/api/v2/metrics/prometheus | grep
    tag_records" and "cd tests/integration" to include a language specifier (e.g.,
    bash or shell) immediately after the opening triple backticks; replace the
    opening "" with "bash" for both blocks so syntax highlighting is enabled
    for the curl and test-run examples.

</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended)
- [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `39e803af-8aa3-4ccd-beee-802fc6645dd6`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 19cc8f9c63ab23bb98404b76407c9c81ab2b6914 and 39f65fab049a5b409352aecf2d233224e0eb97d6.

</details>

<details>
<summary>📒 Files selected for processing (16)</summary>

* `TAG_METRICS.md`
* `include/fluent-bit/flb_config.h`
* `include/fluent-bit/flb_input.h`
* `src/config_format/flb_cf_yaml.c`
* `src/flb_config.c`
* `src/flb_input.c`
* `src/flb_input_chunk.c`
* `tests/integration/scenarios/internal_metrics/config/tag_records_accumulate.yaml`
* `tests/integration/scenarios/internal_metrics/config/tag_records_disabled.yaml`
* `tests/integration/scenarios/internal_metrics/config/tag_records_enabled.yaml`
* `tests/integration/scenarios/internal_metrics/config/tag_records_input_enable.yaml`
* `tests/integration/scenarios/internal_metrics/config/tag_records_input_override.yaml`
* `tests/integration/scenarios/internal_metrics/config/tag_records_max_series.yaml`
* `tests/integration/scenarios/internal_metrics/config/tag_records_max_tag_length.yaml`
* `tests/integration/scenarios/internal_metrics/config/tag_records_multi_tag.yaml`
* `tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment on lines +28 to +40
def metrics(self, expected=None):
url = f"http://127.0.0.1:{self.flb.http_monitoring_port}/api/v2/metrics/prometheus"
return self.service.wait_for_condition(
lambda: response.text
if (
(response := requests.get(url, timeout=2)).status_code == 200
and (expected is None or expected in response.text)
)
else None,
timeout=10,
interval=0.5,
description="prometheus metrics",
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
fd test_service.py tests/integration --exec cat {}

Repository: fluent/fluent-bit

Length of output: 3027


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Find FluentBitManager implementation
fd fluent_bit_manager.py . --exec cat {}

# Search for how monitoring/HTTP endpoints are waited on
rg -n "wait_for_http_endpoint|http_monitoring_port|/api/v2/metrics/prometheus|FLUENT_BIT_TEST_LISTENER_PORT|TEST_SUITE_HTTP_PORT" tests/integration -S

# Inspect the test file for other request/poll patterns (Line ~84 etc.)
cat -n tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py | sed -n '1,220p'

# Find other uses of wait_for_condition to see if exceptions are typically handled in predicates
rg -n "wait_for_condition\(" tests/integration/scenarios/internal_metrics -S

Repository: fluent/fluent-bit

Length of output: 49737


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# show the configs used by this test file
CONFIG_DIR="tests/integration/scenarios/internal_metrics/config"
fd -e yaml "${CONFIG_DIR}" | head

# print the specific configs referenced in this test module
for f in tag_records_enabled.yaml tag_records_accumulate.yaml tag_records_disabled.yaml tag_records_multi_tag.yaml tag_records_multi_a.yaml tag_records_input_override.yaml tag_records_max_series.yaml tag_records_max_tag_length.yaml 2>/dev/null; do
  # if file exists, print a focused excerpt
  path="${CONFIG_DIR}/${f}"
  if [ -f "$path" ]; then
    echo "===== $path ====="
    wc -l "$path"
    cat -n "$path" | sed -n '1,220p'
  fi
done

# inspect related HTTP/metrics validation to see if any probed metrics have empty label sets
cat -n tests/integration/scenarios/internal_http_server/tests/test_internal_http_server_001.py | sed -n '1,220p'

Repository: fluent/fluent-bit

Length of output: 571


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# List relevant YAML configs under internal_metrics
fd -e yaml --full-path "tests/integration/scenarios/internal_metrics" -a | sort | sed -n '1,200p'

# Show configs referenced by this test module (if they exist)
BASE="tests/integration/scenarios/internal_metrics"
for f in tag_records_enabled.yaml tag_records_accumulate.yaml tag_records_disabled.yaml tag_records_multi_tag.yaml tag_records_max_series.yaml tag_records_max_tag_length.yaml tag_records_input_override.yaml tag_records_multi_tag.yaml; do
  if [ -f "$BASE/tests/config/$f" ]; then
    echo "===== $BASE/tests/config/$f ====="
    wc -l "$BASE/tests/config/$f"
    cat -n "$BASE/tests/config/$f" | sed -n '1,220p'
  elif [ -f "$BASE/config/$f" ]; then
    echo "===== $BASE/config/$f ====="
    wc -l "$BASE/config/$f"
    cat -n "$BASE/config/$f" | sed -n '1,220p'
  else
    # try any matching filename under internal_metrics
    hit="$(fd --full-path -e yaml "$f" "$BASE" | head -n 1 || true)"
    if [ -n "${hit:-}" ]; then
      echo "===== $hit ====="
      wc -l "$hit"
      cat -n "$hit" | sed -n '1,220p'
    fi
  fi
done

# Inspect internal_http_server metric parsing to see if empty-label cases are handled
cat -n tests/integration/scenarios/internal_http_server/tests/test_internal_http_server_001.py | sed -n '1,260p'

Repository: fluent/fluent-bit

Length of output: 13627


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# search for wait_for_condition predicates that call requests.get directly without guard
rg -n "wait_for_condition\([\s\S]{0,200}requests\.get" tests/integration/scenarios -S

# broader search: any requests.get in integration predicates
rg -n "wait_for_condition\(" tests/integration/scenarios -S | head -n 200

# search specifically for 'requests.get(url' usage
rg -n "requests\.get\(url" tests/integration/scenarios -S

# show the FluentBitManager.wait_for_fluent_bit exception handling section
sed -n '160,260p' tests/integration/src/utils/fluent_bit_manager.py

Repository: fluent/fluent-bit

Length of output: 43


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect dummy input implementation/docs to judge whether samples=1 is deterministic
rg -n "dummy input|name: dummy|input_dummy|dummy.*samples|samples:" -S tests/integration src | head -n 200

# Inspect where tag_records metrics are defined to see if the metrics always have labels (no empty label sets)
rg -n "tag_records(_|\\b)|fluentbit_input_tag_records|tag.records" -S src tests | head -n 200

Repository: fluent/fluent-bit

Length of output: 28623


Guard requests.get inside wait_for_condition predicates to avoid uncaught RequestException during polling

  • FluentBitTestService.wait_for_condition evaluates the predicate without catching exceptions, so any requests failure during polling will abort the test instead of retrying.
  • FluentBitManager.start() waits for the HTTP server via /api/v1/uptime (uptime_sec > 1), but the /api/v2/metrics/prometheus requests in metrics() (and _run_until_value.check()) remain unguarded.
🛡️ Suggested guard (apply to both poll sites)
     def metrics(self, expected=None):
         url = f"http://127.0.0.1:{self.flb.http_monitoring_port}/api/v2/metrics/prometheus"
+        def _poll():
+            try:
+                response = requests.get(url, timeout=2)
+            except requests.exceptions.RequestException:
+                return None
+            if response.status_code == 200 and (expected is None or expected in response.text):
+                return response.text
+            return None
         return self.service.wait_for_condition(
-            lambda: response.text
-            if (
-                (response := requests.get(url, timeout=2)).status_code == 200
-                and (expected is None or expected in response.text)
-            )
-            else None,
+            _poll,
             timeout=10,
             interval=0.5,
             description="prometheus metrics",
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def metrics(self, expected=None):
url = f"http://127.0.0.1:{self.flb.http_monitoring_port}/api/v2/metrics/prometheus"
return self.service.wait_for_condition(
lambda: response.text
if (
(response := requests.get(url, timeout=2)).status_code == 200
and (expected is None or expected in response.text)
)
else None,
timeout=10,
interval=0.5,
description="prometheus metrics",
)
def metrics(self, expected=None):
url = f"http://127.0.0.1:{self.flb.http_monitoring_port}/api/v2/metrics/prometheus"
def _poll():
try:
response = requests.get(url, timeout=2)
except requests.exceptions.RequestException:
return None
if response.status_code == 200 and (expected is None or expected in response.text):
return response.text
return None
return self.service.wait_for_condition(
_poll,
timeout=10,
interval=0.5,
description="prometheus metrics",
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py`
around lines 28 - 40, The predicate passed to
FluentBitTestService.wait_for_condition in metrics() (and likewise in any
_run_until_value.check() usage) must catch requests exceptions so transient
network errors don't escape the poll loop; wrap the requests.get(...) call in a
try/except that catches requests.RequestException and returns None on error, and
only return response.text when status_code == 200 and expected is present—this
ensures wait_for_condition can retry instead of failing the test.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
src/flb_config.c (2)

880-919: ⚡ Quick win

Move variable declarations to the start of the function.

Lines 885-886 declare ret and tmp after the function opening brace but before the switch statement at line 887. As per coding guidelines, variables should be declared at the start of functions, not mid-block.

♻️ Proposed refactor
 static int variant_as_bool(struct flb_config *config,
                            struct cfl_variant *v, int *out)
 {
+    int ret;
+    flb_sds_t tmp;
+
     switch (v->type) {
     case CFL_VARIANT_BOOL:
         *out = v->data.as_bool ? FLB_TRUE : FLB_FALSE;
         return 0;
     case CFL_VARIANT_INT:
         /* only 0/1 are valid numeric booleans (reject e.g. 2 or -1) */
         if (v->data.as_int64 != 0 && v->data.as_int64 != 1) {
             return -1;
         }
         *out = v->data.as_int64 ? FLB_TRUE : FLB_FALSE;
         return 0;
     case CFL_VARIANT_UINT:
         if (v->data.as_uint64 != 0 && v->data.as_uint64 != 1) {
             return -1;
         }
         *out = v->data.as_uint64 ? FLB_TRUE : FLB_FALSE;
         return 0;
     case CFL_VARIANT_STRING:
-        int ret;
-        flb_sds_t tmp;
         tmp = flb_env_var_translate(config->env, v->data.as_string);

As per coding guidelines.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/flb_config.c` around lines 880 - 919, In function variant_as_bool move
all local variable declarations to the top of the function body: declare int
ret; and flb_sds_t tmp; immediately after the opening brace before any
statements (i.e. before the switch), so the variables are defined at function
start while preserving their types and existing usage in the CFL_VARIANT_STRING
case and return paths; no other logic changes.

921-966: ⚡ Quick win

Move variable declarations to the start of the function.

Lines 925-927 declare tmp, end, and val after the function opening brace but before the switch statement at line 929. As per coding guidelines, variables should be declared at the start of functions, not mid-block.

♻️ Proposed refactor
 static int variant_as_int(struct flb_config *config,
                           struct cfl_variant *v, int *out)
 {
+    flb_sds_t tmp;
+    char *end;
+    long val;
+
     switch (v->type) {
     case CFL_VARIANT_INT:
         if (v->data.as_int64 < INT_MIN || v->data.as_int64 > INT_MAX) {
             return -1;
         }
         *out = (int) v->data.as_int64;
         return 0;
     case CFL_VARIANT_UINT:
         if (v->data.as_uint64 > (uint64_t) INT_MAX) {
             return -1;
         }
         *out = (int) v->data.as_uint64;
         return 0;
     case CFL_VARIANT_STRING:
-        flb_sds_t tmp;
-        char *end;
-        long val;
         tmp = flb_env_var_translate(config->env, v->data.as_string);

As per coding guidelines.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/flb_config.c` around lines 921 - 966, The variables tmp, end, and val in
function variant_as_int are declared after the opening brace but before the
switch; move their declarations to the top of the function (immediately after
the opening brace and before any logic) so flb_sds_t tmp; char *end; and long
val; are declared up front; keep the rest of the logic and types unchanged and
ensure any uses of tmp, end, and val remain identical.
tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records_badconfig.py (1)

67-97: 💤 Low value

Consider iterable unpacking for command construction.

Line 75-80 uses list concatenation to prepend valgrind arguments. While this works, iterable unpacking is slightly more Pythonic:

♻️ Optional refactor
     if VALGRIND:
         vg_log = tempfile.NamedTemporaryFile(
             prefix="vg_badcfg_", suffix=".log", delete=False
         ).name
-        cmd = [
+        cmd = [
             "valgrind",
             f"--log-file={vg_log}",
             "--leak-check=full",
             "--show-leak-kinds=all",
-        ] + cmd
+            *cmd
+        ]
         timeout *= 6  # valgrind is much slower
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records_badconfig.py`
around lines 67 - 97, The _run function builds the subprocess command using list
concatenation to prepend Valgrind args; replace that concatenation with iterable
unpacking to make the construction more Pythonic: when VALGRIND is true, create
vg_log and then set cmd = ["valgrind", f"--log-file={vg_log}",
"--leak-check=full", "--show-leak-kinds=all", *cmd] (preserve the existing
timeout *= 6 behavior and the variables cmd, VALGRIND, vg_log and the rest of
the function).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/flb_config.c`:
- Around line 375-399: The mutex config->telemetry_metrics_logs_tag_records_lock
must be initialized before any early-return path that could bypass its init
(e.g., failure of flb_hash_table_create); move the pthread_mutex_init call to
just before the flb_hash_table_create invocation (or add a boolean like
telemetry_metrics_logs_tag_records_lock_inited that you set after
pthread_mutex_init and check in flb_config_exit before calling
pthread_mutex_destroy) so flb_config_exit never calls pthread_mutex_destroy on
an uninitialized mutex.

---

Nitpick comments:
In `@src/flb_config.c`:
- Around line 880-919: In function variant_as_bool move all local variable
declarations to the top of the function body: declare int ret; and flb_sds_t
tmp; immediately after the opening brace before any statements (i.e. before the
switch), so the variables are defined at function start while preserving their
types and existing usage in the CFL_VARIANT_STRING case and return paths; no
other logic changes.
- Around line 921-966: The variables tmp, end, and val in function
variant_as_int are declared after the opening brace but before the switch; move
their declarations to the top of the function (immediately after the opening
brace and before any logic) so flb_sds_t tmp; char *end; and long val; are
declared up front; keep the rest of the logic and types unchanged and ensure any
uses of tmp, end, and val remain identical.

In
`@tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records_badconfig.py`:
- Around line 67-97: The _run function builds the subprocess command using list
concatenation to prepend Valgrind args; replace that concatenation with iterable
unpacking to make the construction more Pythonic: when VALGRIND is true, create
vg_log and then set cmd = ["valgrind", f"--log-file={vg_log}",
"--leak-check=full", "--show-leak-kinds=all", *cmd] (preserve the existing
timeout *= 6 behavior and the variables cmd, VALGRIND, vg_log and the rest of
the function).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4ba26463-dbc1-4625-9ed3-d177804c248c

📥 Commits

Reviewing files that changed from the base of the PR and between 88d7334 and 731ec88.

📒 Files selected for processing (32)
  • include/fluent-bit/flb_config.h
  • include/fluent-bit/flb_input.h
  • src/config_format/flb_cf_yaml.c
  • src/flb_config.c
  • src/flb_input.c
  • src/flb_input_chunk.c
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_dotted_key.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_dotted_service.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_enabled_value.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_input_service_only.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_input_value.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_malformed_metrics.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_max_series_value.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_nested_unknown.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_numeric_bool.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_telemetry_scalar.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_type.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_unknown_option.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_unknown_parent_key.yaml
  • tests/integration/scenarios/internal_metrics/config/lenient_tag_records_empty_block.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_accumulate.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_disabled.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_enabled.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_input_enable.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_input_nested_disable.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_input_nested_enable.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_input_override.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_max_series.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_max_tag_length.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_multi_tag.yaml
  • tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py
  • tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records_badconfig.py
✅ Files skipped from review due to trivial changes (9)
  • tests/integration/scenarios/internal_metrics/config/lenient_tag_records_empty_block.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_malformed_metrics.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_telemetry_scalar.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_input_service_only.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_input_nested_disable.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_max_tag_length.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_nested_unknown.yaml
  • tests/integration/scenarios/internal_metrics/config/bad_tag_records_unknown_option.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_input_enable.yaml
🚧 Files skipped from review as they are similar to previous changes (7)
  • tests/integration/scenarios/internal_metrics/config/tag_records_max_series.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_input_override.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_accumulate.yaml
  • tests/integration/scenarios/internal_metrics/config/tag_records_multi_tag.yaml
  • src/flb_input_chunk.c
  • src/config_format/flb_cf_yaml.c
  • tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py

Comment thread src/flb_config.c
edsiper added 5 commits May 31, 2026 10:20
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant