metrics: extend support for Tag metrics by edsiper · Pull Request #11887 · fluent/fluent-bit

edsiper · 2026-05-30T17:30:01Z

This PR adds fluentbit_input_logs_tag_records_total, a new internal metric for tracking log record counts by input name and tag. The feature is disabled by default and configured exclusively through nested YAML telemetry blocks.

It is logs-only: only the logs signal is counted (metrics, traces and profiles inputs are ignored). Because tags can be high cardinality, the feature ships with global guardrails (max_series, max_tag_length) so it cannot overwhelm the metrics endpoint.

Configuration

tag_records is always a map; enabled is the on/off toggle. Service-level
configuration controls the global default and guardrails:

service:
  telemetry:
    metrics:
      logs:
        tag_records:
          enabled: true
          max_series: 500
          max_tag_length: 128

Keys under the tag_records block:

Key	Type	Default	Description
`enabled`	boolean	`false`	Master switch for all inputs.
`max_series`	integer	`500`	Max distinct `(name, tag)` series tracked globally. `<= 0` disables the limit.
`max_tag_length`	integer	`128`	Tags longer than this (bytes) are not tracked. `<= 0` disables the limit.

Per-input override

Inputs can override only the enabled state (max_series/max_tag_length are service-global). Opt an input out while the service switch is on:

pipeline:
  inputs:
    - name: dummy
      alias: app_logs
      tag: app.logs
      dummy: '{"message":"hello"}'
      telemetry:
        metrics:
          logs:
            tag_records:
              enabled: false

Or opt a single input in while the service switch is off:

pipeline:
  inputs:
    - name: dummy
      alias: selected_input
      tag: selected.logs
      dummy: '{"message":"hello"}'
      telemetry:
        metrics:
          logs:
            tag_records:
              enabled: true

Metrics

Example Prometheus output when enabled:

fluentbit_input_logs_tag_records_total{name="app_logs",tag="app.logs"} 42
fluentbit_input_logs_tag_records_total{name="app_logs",tag="nginx.access"} 128
fluentbit_input_logs_tag_records_total{name="tail.0",tag="kube.var.log.containers.api"} 2301

Records that cannot be tracked because of the guardrails are counted separately, with the reason:

fluentbit_input_logs_tag_records_untracked_total{name="app_logs",reason="max_series"} 15
fluentbit_input_logs_tag_records_untracked_total{name="app_logs",reason="tag_length_limit"} 3

reason is one of max_series (global series cap reached), tag_length_limit (tag exceeds max_tag_length), or error (internal failure). A record that is not tracked per-tag is still counted in fluentbit_input_records_total.

name is the input alias when set, otherwise the input plugin name. tag is the Fluent Bit tag for the ingested log records.

sum by (tag) (fluentbit_input_logs_tag_records_total)

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

New Features
- Telemetry: add metrics for tracking input log tag records with configurable enablement, max series and max tag length; supports service-level defaults and per-input overrides via nested telemetry blocks; tracks both tracked and untracked records with cardinality limits.
Tests
- New integration suite validating enable/disable behavior, per-input overrides, accumulation, max-series and tag-length enforcement, multi-tag tracking, nested-block parsing, and numerous bad-config cases.

coderabbitai · 2026-05-30T17:30:20Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds telemetry "logs tag records": nested-telemetry YAML parsing, new config and per-input fields, hash-table-backed series tracking with mutex, per-input tracked/untracked cmetrics, runtime updater during log ingestion, and comprehensive integration and bad-config tests.

Changes

Tag Records Telemetry Feature

Layer / File(s)	Summary
Data structures and public contracts `include/fluent-bit/flb_config.h`, `include/fluent-bit/flb_input.h`, `src/flb_config.c`	Adds forward declaration for `flb_hash_table`; extends `struct flb_config` with telemetry tag-records fields (enable, max_series, max_tag_length, series_count, hash-table pointer, mutex, init flag); adds per-input override and cmetrics pointers; includes for hash table and supporting headers.
YAML parsing and telemetry schema application `src/config_format/flb_cf_yaml.c`, `src/flb_config.c`	YAML state machine captures nested variants; new helpers validate and apply `telemetry.metrics.logs.tag_records` for service (boolean or map) and input (nested block limited to enabled); rejects scalar/dotted telemetry keys and unknown nested options.
Config init and resource management `src/flb_config.c`	`flb_config_init()` sets telemetry defaults, creates tag-records hash table and mutex, performs targeted teardown on allocation failure; `flb_config_exit()` destroys hash table and mutex when initialized.
Per-input override and cmetrics creation `src/flb_input.c`	Input instances default override to -1; effective enablement computed from input override or service; when enabled, creates two cmetrics counters (tracked by name+tag, and untracked by name+reason).
Runtime tag-records metrics collection `src/flb_input_chunk.c`	Static helpers validate tag length, build a length-prefixed NUL-free key, mutex-protect hash-table lookup/insert up to max_series, and increment tracked or untracked counters; invoked during log chunk append for log events.
Integration tests and scenarios `tests/integration/scenarios/internal_metrics/tests/`, `tests/integration/scenarios/internal_metrics/config/`	Adds Prometheus-metric parsing test harness and tests for enablement, defaults, limits, per-input overrides, nested-block behavior, monotonicity, multi-tag tracking; many YAML scenarios for valid and invalid configs.

Sequence Diagram(s)

sequenceDiagram
  participant CLI
  participant YAML_Parser
  participant Config
  participant HashTable
  participant Mutex
  CLI->>YAML_Parser: Load config file
  YAML_Parser->>Config: Deliver nested telemetry KVLIST
  Config->>Config: validate/apply telemetry.metrics.logs.tag_records
  Config->>HashTable: create telemetry tag-records HT
  Config->>Mutex: initialize telemetry mutex
  HashTable-->>Config: HT allocated
  Mutex-->>Config: mutex ready

sequenceDiagram
  participant Log_Ingest
  participant Metrics_Updater
  participant Tag_Validator
  participant HT_Lookup
  participant Cmt_Counter
  Log_Ingest->>Metrics_Updater: report (input_name, tag, count)
  Metrics_Updater->>Tag_Validator: check tag length
  Tag_Validator-->>Metrics_Updater: OK or too_long
  alt tag too long
    Metrics_Updater->>Cmt_Counter: increment untracked (reason=tag_length_limit)
  else length OK
    Metrics_Updater->>HT_Lookup: acquire mutex & lookup/insert key
    HT_Lookup-->>Metrics_Updater: found / inserted / limit_reached
    alt limit reached or insert failed
      Metrics_Updater->>Cmt_Counter: increment untracked (reason=max_series)
    else tracked
      Metrics_Updater->>Cmt_Counter: increment tracked counter (labels=name,tag)
    end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

cosmo0920

Poem

🐇 I hop through tags and count each trace,

A tiny hash keeps every place,
A mutex waits, a counter sings,
Prometheus hears the tiny things,
Tests tuck in the garden's grace.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 24.44% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: extending support for tag metrics by adding a new fluentbit_input_logs_tag_records_total metric to track log records by input and tag.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch tag-metrics

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 39f65fab04

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-30T17:33:20Z

+    config->telemetry_metrics_tag_records_ht =
+        flb_hash_table_create(FLB_HASH_TABLE_EVICT_NONE, 512, 0);


Use a case-sensitive cardinality table for tags

When max_series is used, tags that differ only by case can bypass the limit because this new hash table is left at the default case-insensitive comparison mode while Prometheus label values remain case-sensitive. For example, after tracking tag="App" with max_series=1, tag="app" is treated as an existing key here and skips the max_series rejection path, but cmt_counter_add still creates a separate tag="app" time series. Calling flb_hash_table_set_case_sensitivity(..., FLB_TRUE) for this table would make the cardinality guard match the metric labels.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

TAG_METRICS.md (1)
97-100: ⚡ Quick win

Add language identifiers to fenced code blocks.

The fenced code blocks at lines 97-100 and 211-215 are missing language specifiers. Add bash or shell after the opening triple backticks for proper syntax highlighting.
📝 Proposed fix
-```
+```bash
 $ curl -s http://127.0.0.1:2020/api/v2/metrics/prometheus | grep tag_records
 fluentbit_input_tag_records_total{name="dummy",tag="app.logs"} 12
```diff
-```
+```bash
 cd tests/integration
 ./run_tests.py scenarios/internal_metrics      # or:
 .venv/bin/python -m pytest scenarios/internal_metrics -q
</details>

Based on learnings: static analysis tool markdownlint-cli2 flagged these blocks.


Also applies to: 211-215

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @TAG_METRICS.md around lines 97 - 100, Update the two fenced code blocks that
start with the lines "$ curl -s http://127.0.0.1:2020/api/v2/metrics/prometheus
| grep tag_records" and "cd tests/integration" to include a language specifier
(e.g., bash or shell) immediately after the opening triple backticks; replace
the opening "" with "bash" for both blocks so syntax highlighting is
enabled for the curl and test-run examples.
</details>

</blockquote></details>
<details>
<summary>src/flb_input.c (1)</summary><blockquote>

`1453-1474`: _💤 Low value_

**Counter creation condition could be refined for memory efficiency.**

The counter creation uses `ctx->telemetry_metrics_tag_records == FLB_TRUE || ins->telemetry_metrics_tag_records == FLB_TRUE`, which creates counters even when the per-input override explicitly disables the feature (`ins->telemetry_metrics_tag_records == 0`).

The runtime check in `tag_records_metrics_enabled()` (flb_input_chunk.c) correctly prioritizes the per-input override, so functionality is correct. However, for memory efficiency, the condition could mirror the runtime logic:

```c
int tag_records_enabled = (ins->telemetry_metrics_tag_records != -1)
                          ? ins->telemetry_metrics_tag_records
                          : ctx->telemetry_metrics_tag_records;
if (tag_records_enabled == FLB_TRUE) {
    // create counters
}
```

This is a minor optimization - the current code is functionally correct since unused counters are benign.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/flb_input.c` around lines 1453 - 1474, The counter creation currently
checks ctx->telemetry_metrics_tag_records || ins->telemetry_metrics_tag_records
which can allocate counters even when the per-input override disables them;
change the condition to mirror tag_records_metrics_enabled() by computing an
effective flag like tag_records_enabled = (ins->telemetry_metrics_tag_records !=
-1) ? ins->telemetry_metrics_tag_records : ctx->telemetry_metrics_tag_records
and only call cmt_counter_create for ins->cmt_tag_records and
ins->cmt_tag_records_untracked when tag_records_enabled == FLB_TRUE; keep
existing error checks (if (!ins->cmt_tag_records) return -1, etc.) and reference
the same symbols (ctx, ins, telemetry_metrics_tag_records, cmt_counter_create,
ins->cmt_tag_records_untracked) when making the change.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
@tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py:

Around line 28-40: The predicate passed to
FluentBitTestService.wait_for_condition in metrics() (and likewise in any
_run_until_value.check() usage) must catch requests exceptions so transient
network errors don't escape the poll loop; wrap the requests.get(...) call in a
try/except that catches requests.RequestException and returns None on error, and
only return response.text when status_code == 200 and expected is present—this
ensures wait_for_condition can retry instead of failing the test.

Nitpick comments:
In @src/flb_input.c:

Around line 1453-1474: The counter creation currently checks
ctx->telemetry_metrics_tag_records || ins->telemetry_metrics_tag_records which
can allocate counters even when the per-input override disables them; change the
condition to mirror tag_records_metrics_enabled() by computing an effective flag
like tag_records_enabled = (ins->telemetry_metrics_tag_records != -1) ?
ins->telemetry_metrics_tag_records : ctx->telemetry_metrics_tag_records and only
call cmt_counter_create for ins->cmt_tag_records and
ins->cmt_tag_records_untracked when tag_records_enabled == FLB_TRUE; keep
existing error checks (if (!ins->cmt_tag_records) return -1, etc.) and reference
the same symbols (ctx, ins, telemetry_metrics_tag_records, cmt_counter_create,
ins->cmt_tag_records_untracked) when making the change.

In @TAG_METRICS.md:

Around line 97-100: Update the two fenced code blocks that start with the
lines "$ curl -s http://127.0.0.1:2020/api/v2/metrics/prometheus | grep
tag_records" and "cd tests/integration" to include a language specifier (e.g.,
bash or shell) immediately after the opening triple backticks; replace the
opening "" with "bash" for both blocks so syntax highlighting is enabled
for the curl and test-run examples.
</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ]  Push a commit to this branch (recommended)
- [ ]  Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `39e803af-8aa3-4ccd-beee-802fc6645dd6`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 19cc8f9c63ab23bb98404b76407c9c81ab2b6914 and 39f65fab049a5b409352aecf2d233224e0eb97d6.

</details>

<details>
<summary>📒 Files selected for processing (16)</summary>

* `TAG_METRICS.md`
* `include/fluent-bit/flb_config.h`
* `include/fluent-bit/flb_input.h`
* `src/config_format/flb_cf_yaml.c`
* `src/flb_config.c`
* `src/flb_input.c`
* `src/flb_input_chunk.c`
* `tests/integration/scenarios/internal_metrics/config/tag_records_accumulate.yaml`
* `tests/integration/scenarios/internal_metrics/config/tag_records_disabled.yaml`
* `tests/integration/scenarios/internal_metrics/config/tag_records_enabled.yaml`
* `tests/integration/scenarios/internal_metrics/config/tag_records_input_enable.yaml`
* `tests/integration/scenarios/internal_metrics/config/tag_records_input_override.yaml`
* `tests/integration/scenarios/internal_metrics/config/tag_records_max_series.yaml`
* `tests/integration/scenarios/internal_metrics/config/tag_records_max_tag_length.yaml`
* `tests/integration/scenarios/internal_metrics/config/tag_records_multi_tag.yaml`
* `tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py`

</details>

</details>

coderabbitai · 2026-05-30T17:38:15Z

+    def metrics(self, expected=None):
+        url = f"http://127.0.0.1:{self.flb.http_monitoring_port}/api/v2/metrics/prometheus"
+        return self.service.wait_for_condition(
+            lambda: response.text
+            if (
+                (response := requests.get(url, timeout=2)).status_code == 200
+                and (expected is None or expected in response.text)
+            )
+            else None,
+            timeout=10,
+            interval=0.5,
+            description="prometheus metrics",
+        )


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash fd test_service.py tests/integration --exec cat {}

Repository: fluent/fluent-bit

Length of output: 3027

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find FluentBitManager implementation fd fluent_bit_manager.py . --exec cat {} # Search for how monitoring/HTTP endpoints are waited on rg -n "wait_for_http_endpoint|http_monitoring_port|/api/v2/metrics/prometheus|FLUENT_BIT_TEST_LISTENER_PORT|TEST_SUITE_HTTP_PORT" tests/integration -S # Inspect the test file for other request/poll patterns (Line ~84 etc.) cat -n tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py | sed -n '1,220p' # Find other uses of wait_for_condition to see if exceptions are typically handled in predicates rg -n "wait_for_condition\(" tests/integration/scenarios/internal_metrics -S

Repository: fluent/fluent-bit

Length of output: 49737

🏁 Script executed:

#!/bin/bash set -euo pipefail # show the configs used by this test file CONFIG_DIR="tests/integration/scenarios/internal_metrics/config" fd -e yaml "${CONFIG_DIR}" | head # print the specific configs referenced in this test module for f in tag_records_enabled.yaml tag_records_accumulate.yaml tag_records_disabled.yaml tag_records_multi_tag.yaml tag_records_multi_a.yaml tag_records_input_override.yaml tag_records_max_series.yaml tag_records_max_tag_length.yaml 2>/dev/null; do # if file exists, print a focused excerpt path="${CONFIG_DIR}/${f}" if [ -f "$path" ]; then echo "===== $path =====" wc -l "$path" cat -n "$path" | sed -n '1,220p' fi done # inspect related HTTP/metrics validation to see if any probed metrics have empty label sets cat -n tests/integration/scenarios/internal_http_server/tests/test_internal_http_server_001.py | sed -n '1,220p'

Repository: fluent/fluent-bit

Length of output: 571

🏁 Script executed:

#!/bin/bash set -euo pipefail # List relevant YAML configs under internal_metrics fd -e yaml --full-path "tests/integration/scenarios/internal_metrics" -a | sort | sed -n '1,200p' # Show configs referenced by this test module (if they exist) BASE="tests/integration/scenarios/internal_metrics" for f in tag_records_enabled.yaml tag_records_accumulate.yaml tag_records_disabled.yaml tag_records_multi_tag.yaml tag_records_max_series.yaml tag_records_max_tag_length.yaml tag_records_input_override.yaml tag_records_multi_tag.yaml; do if [ -f "$BASE/tests/config/$f" ]; then echo "===== $BASE/tests/config/$f =====" wc -l "$BASE/tests/config/$f" cat -n "$BASE/tests/config/$f" | sed -n '1,220p' elif [ -f "$BASE/config/$f" ]; then echo "===== $BASE/config/$f =====" wc -l "$BASE/config/$f" cat -n "$BASE/config/$f" | sed -n '1,220p' else # try any matching filename under internal_metrics hit="$(fd --full-path -e yaml "$f" "$BASE" | head -n 1 || true)" if [ -n "${hit:-}" ]; then echo "===== $hit =====" wc -l "$hit" cat -n "$hit" | sed -n '1,220p' fi fi done # Inspect internal_http_server metric parsing to see if empty-label cases are handled cat -n tests/integration/scenarios/internal_http_server/tests/test_internal_http_server_001.py | sed -n '1,260p'

Repository: fluent/fluent-bit

Length of output: 13627

🏁 Script executed:

#!/bin/bash set -euo pipefail # search for wait_for_condition predicates that call requests.get directly without guard rg -n "wait_for_condition\([\s\S]{0,200}requests\.get" tests/integration/scenarios -S # broader search: any requests.get in integration predicates rg -n "wait_for_condition\(" tests/integration/scenarios -S | head -n 200 # search specifically for 'requests.get(url' usage rg -n "requests\.get\(url" tests/integration/scenarios -S # show the FluentBitManager.wait_for_fluent_bit exception handling section sed -n '160,260p' tests/integration/src/utils/fluent_bit_manager.py

Repository: fluent/fluent-bit

Length of output: 43

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect dummy input implementation/docs to judge whether samples=1 is deterministic rg -n "dummy input|name: dummy|input_dummy|dummy.*samples|samples:" -S tests/integration src | head -n 200 # Inspect where tag_records metrics are defined to see if the metrics always have labels (no empty label sets) rg -n "tag_records(_|\\b)|fluentbit_input_tag_records|tag.records" -S src tests | head -n 200

Repository: fluent/fluent-bit

Length of output: 28623

Guard requests.get inside wait_for_condition predicates to avoid uncaught RequestException during polling

FluentBitTestService.wait_for_condition evaluates the predicate without catching exceptions, so any requests failure during polling will abort the test instead of retrying.

FluentBitManager.start() waits for the HTTP server via /api/v1/uptime (uptime_sec > 1), but the /api/v2/metrics/prometheus requests in metrics() (and _run_until_value.check()) remain unguarded.

🛡️ Suggested guard (apply to both poll sites)

def metrics(self, expected=None): url = f"http://127.0.0.1:{self.flb.http_monitoring_port}/api/v2/metrics/prometheus" + def _poll(): + try: + response = requests.get(url, timeout=2) + except requests.exceptions.RequestException: + return None + if response.status_code == 200 and (expected is None or expected in response.text): + return response.text + return None return self.service.wait_for_condition( - lambda: response.text - if ( - (response := requests.get(url, timeout=2)).status_code == 200 - and (expected is None or expected in response.text) - ) - else None, + _poll, timeout=10, interval=0.5, description="prometheus metrics", )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def metrics(self, expected=None):

url = f"http://127.0.0.1:{self.flb.http_monitoring_port}/api/v2/metrics/prometheus"

return self.service.wait_for_condition(

lambda: response.text

if (

(response := requests.get(url, timeout=2)).status_code == 200

and (expected is None or expected in response.text)

)

else None,

timeout=10,

interval=0.5,

description="prometheus metrics",

)

def metrics(self, expected=None):

url = f"http://127.0.0.1:{self.flb.http_monitoring_port}/api/v2/metrics/prometheus"

def _poll():

try:

response = requests.get(url, timeout=2)

except requests.exceptions.RequestException:

return None

if response.status_code == 200 and (expected is None or expected in response.text):

return response.text

return None

return self.service.wait_for_condition(

_poll,

timeout=10,

interval=0.5,

description="prometheus metrics",

)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py` around lines 28 - 40, The predicate passed to FluentBitTestService.wait_for_condition in metrics() (and likewise in any _run_until_value.check() usage) must catch requests exceptions so transient network errors don't escape the poll loop; wrap the requests.get(...) call in a try/except that catches requests.RequestException and returns None on error, and only return response.text when status_code == 200 and expected is present—this ensures wait_for_condition can retry instead of failing the test.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

src/flb_config.c (2)

880-919: ⚡ Quick win

Move variable declarations to the start of the function.

Lines 885-886 declare ret and tmp after the function opening brace but before the switch statement at line 887. As per coding guidelines, variables should be declared at the start of functions, not mid-block.

♻️ Proposed refactor

 static int variant_as_bool(struct flb_config *config,
                            struct cfl_variant *v, int *out)
 {
+    int ret;
+    flb_sds_t tmp;
+
     switch (v->type) {
     case CFL_VARIANT_BOOL:
         *out = v->data.as_bool ? FLB_TRUE : FLB_FALSE;
         return 0;
     case CFL_VARIANT_INT:
         /* only 0/1 are valid numeric booleans (reject e.g. 2 or -1) */
         if (v->data.as_int64 != 0 && v->data.as_int64 != 1) {
             return -1;
         }
         *out = v->data.as_int64 ? FLB_TRUE : FLB_FALSE;
         return 0;
     case CFL_VARIANT_UINT:
         if (v->data.as_uint64 != 0 && v->data.as_uint64 != 1) {
             return -1;
         }
         *out = v->data.as_uint64 ? FLB_TRUE : FLB_FALSE;
         return 0;
     case CFL_VARIANT_STRING:
-        int ret;
-        flb_sds_t tmp;
         tmp = flb_env_var_translate(config->env, v->data.as_string);

As per coding guidelines.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/flb_config.c` around lines 880 - 919, In function variant_as_bool move
all local variable declarations to the top of the function body: declare int
ret; and flb_sds_t tmp; immediately after the opening brace before any
statements (i.e. before the switch), so the variables are defined at function
start while preserving their types and existing usage in the CFL_VARIANT_STRING
case and return paths; no other logic changes.

921-966: ⚡ Quick win

Move variable declarations to the start of the function.

Lines 925-927 declare tmp, end, and val after the function opening brace but before the switch statement at line 929. As per coding guidelines, variables should be declared at the start of functions, not mid-block.

♻️ Proposed refactor

 static int variant_as_int(struct flb_config *config,
                           struct cfl_variant *v, int *out)
 {
+    flb_sds_t tmp;
+    char *end;
+    long val;
+
     switch (v->type) {
     case CFL_VARIANT_INT:
         if (v->data.as_int64 < INT_MIN || v->data.as_int64 > INT_MAX) {
             return -1;
         }
         *out = (int) v->data.as_int64;
         return 0;
     case CFL_VARIANT_UINT:
         if (v->data.as_uint64 > (uint64_t) INT_MAX) {
             return -1;
         }
         *out = (int) v->data.as_uint64;
         return 0;
     case CFL_VARIANT_STRING:
-        flb_sds_t tmp;
-        char *end;
-        long val;
         tmp = flb_env_var_translate(config->env, v->data.as_string);

As per coding guidelines.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/flb_config.c` around lines 921 - 966, The variables tmp, end, and val in
function variant_as_int are declared after the opening brace but before the
switch; move their declarations to the top of the function (immediately after
the opening brace and before any logic) so flb_sds_t tmp; char *end; and long
val; are declared up front; keep the rest of the logic and types unchanged and
ensure any uses of tmp, end, and val remain identical.

tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records_badconfig.py (1)

67-97: 💤 Low value

Consider iterable unpacking for command construction.

Line 75-80 uses list concatenation to prepend valgrind arguments. While this works, iterable unpacking is slightly more Pythonic:

♻️ Optional refactor

     if VALGRIND:
         vg_log = tempfile.NamedTemporaryFile(
             prefix="vg_badcfg_", suffix=".log", delete=False
         ).name
-        cmd = [
+        cmd = [
             "valgrind",
             f"--log-file={vg_log}",
             "--leak-check=full",
             "--show-leak-kinds=all",
-        ] + cmd
+            *cmd
+        ]
         timeout *= 6  # valgrind is much slower

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records_badconfig.py`
around lines 67 - 97, The _run function builds the subprocess command using list
concatenation to prepend Valgrind args; replace that concatenation with iterable
unpacking to make the construction more Pythonic: when VALGRIND is true, create
vg_log and then set cmd = ["valgrind", f"--log-file={vg_log}",
"--leak-check=full", "--show-leak-kinds=all", *cmd] (preserve the existing
timeout *= 6 behavior and the variables cmd, VALGRIND, vg_log and the rest of
the function).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/flb_config.c`:
- Around line 375-399: The mutex config->telemetry_metrics_logs_tag_records_lock
must be initialized before any early-return path that could bypass its init
(e.g., failure of flb_hash_table_create); move the pthread_mutex_init call to
just before the flb_hash_table_create invocation (or add a boolean like
telemetry_metrics_logs_tag_records_lock_inited that you set after
pthread_mutex_init and check in flb_config_exit before calling
pthread_mutex_destroy) so flb_config_exit never calls pthread_mutex_destroy on
an uninitialized mutex.

---

Nitpick comments:
In `@src/flb_config.c`:
- Around line 880-919: In function variant_as_bool move all local variable
declarations to the top of the function body: declare int ret; and flb_sds_t
tmp; immediately after the opening brace before any statements (i.e. before the
switch), so the variables are defined at function start while preserving their
types and existing usage in the CFL_VARIANT_STRING case and return paths; no
other logic changes.
- Around line 921-966: The variables tmp, end, and val in function
variant_as_int are declared after the opening brace but before the switch; move
their declarations to the top of the function (immediately after the opening
brace and before any logic) so flb_sds_t tmp; char *end; and long val; are
declared up front; keep the rest of the logic and types unchanged and ensure any
uses of tmp, end, and val remain identical.

In
`@tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records_badconfig.py`:
- Around line 67-97: The _run function builds the subprocess command using list
concatenation to prepend Valgrind args; replace that concatenation with iterable
unpacking to make the construction more Pythonic: when VALGRIND is true, create
vg_log and then set cmd = ["valgrind", f"--log-file={vg_log}",
"--leak-check=full", "--show-leak-kinds=all", *cmd] (preserve the existing
timeout *= 6 behavior and the variables cmd, VALGRIND, vg_log and the rest of
the function).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4ba26463-dbc1-4625-9ed3-d177804c248c

📥 Commits

Reviewing files that changed from the base of the PR and between 88d7334 and 731ec88.

📒 Files selected for processing (32)

include/fluent-bit/flb_config.h
include/fluent-bit/flb_input.h
src/config_format/flb_cf_yaml.c
src/flb_config.c
src/flb_input.c
src/flb_input_chunk.c
tests/integration/scenarios/internal_metrics/config/bad_tag_records_dotted_key.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_dotted_service.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_enabled_value.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_input_service_only.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_input_value.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_malformed_metrics.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_max_series_value.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_nested_unknown.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_numeric_bool.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_telemetry_scalar.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_type.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_unknown_option.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_unknown_parent_key.yaml
tests/integration/scenarios/internal_metrics/config/lenient_tag_records_empty_block.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_accumulate.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_disabled.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_enabled.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_input_enable.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_input_nested_disable.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_input_nested_enable.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_input_override.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_max_series.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_max_tag_length.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_multi_tag.yaml
tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py
tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records_badconfig.py

✅ Files skipped from review due to trivial changes (9)

tests/integration/scenarios/internal_metrics/config/lenient_tag_records_empty_block.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_malformed_metrics.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_telemetry_scalar.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_input_service_only.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_input_nested_disable.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_max_tag_length.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_nested_unknown.yaml
tests/integration/scenarios/internal_metrics/config/bad_tag_records_unknown_option.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_input_enable.yaml

🚧 Files skipped from review as they are similar to previous changes (7)

tests/integration/scenarios/internal_metrics/config/tag_records_max_series.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_input_override.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_accumulate.yaml
tests/integration/scenarios/internal_metrics/config/tag_records_multi_tag.yaml
src/flb_input_chunk.c
src/config_format/flb_cf_yaml.c
tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>

edsiper requested a review from cosmo0920 as a code owner May 30, 2026 17:30

github-actions Bot added the docs-required label May 30, 2026

chatgpt-codex-connector Bot reviewed May 30, 2026

View reviewed changes

edsiper force-pushed the tag-metrics branch from 39f65fa to 88d7334 Compare May 30, 2026 17:36

edsiper had a problem deploying to pr May 30, 2026 17:37 — with GitHub Actions Failure

edsiper temporarily deployed to pr May 30, 2026 17:37 — with GitHub Actions Inactive

coderabbitai Bot reviewed May 30, 2026

View reviewed changes

edsiper force-pushed the tag-metrics branch from 88d7334 to 731ec88 Compare May 31, 2026 05:03

edsiper temporarily deployed to pr May 31, 2026 05:03 — with GitHub Actions Inactive

coderabbitai Bot reviewed May 31, 2026

View reviewed changes

Comment thread src/flb_config.c

edsiper force-pushed the tag-metrics branch from 731ec88 to 05997bf Compare May 31, 2026 05:15

edsiper temporarily deployed to pr May 31, 2026 05:16 — with GitHub Actions Inactive

edsiper temporarily deployed to pr May 31, 2026 05:35 — with GitHub Actions Inactive

edsiper temporarily deployed to pr May 31, 2026 05:36 — with GitHub Actions Inactive

edsiper added this to the Fluent Bit v5.1 milestone May 31, 2026

edsiper added 5 commits May 31, 2026 10:20

config_format: accept nested maps in service and input sections

2f1ebee

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>

config: add logs tag records telemetry metrics

1e5ded3

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>

input: add per-input logs tag records metrics override

f522ce0

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>

input_chunk: count input log records per tag

ce28316

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>

tests: integration: add logs tag records metrics scenarios

fedc099

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>

edsiper force-pushed the tag-metrics branch from 05997bf to fedc099 Compare May 31, 2026 16:24

edsiper temporarily deployed to pr May 31, 2026 16:24 — with GitHub Actions Inactive

edsiper temporarily deployed to pr May 31, 2026 16:45 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

metrics: extend support for Tag metrics#11887

metrics: extend support for Tag metrics#11887
edsiper wants to merge 5 commits into
masterfrom
tag-metrics

edsiper commented May 30, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 30, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 30, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 30, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		config->telemetry_metrics_tag_records_ht =
		flb_hash_table_create(FLB_HASH_TABLE_EVICT_NONE, 512, 0);

Conversation

edsiper commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Configuration

Per-input override

Metrics

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

edsiper commented May 30, 2026 •

edited

Loading

coderabbitai Bot commented May 30, 2026 •

edited

Loading