metrics: extend support for Tag metrics#11887
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds telemetry "logs tag records": nested-telemetry YAML parsing, new config and per-input fields, hash-table-backed series tracking with mutex, per-input tracked/untracked cmetrics, runtime updater during log ingestion, and comprehensive integration and bad-config tests. ChangesTag Records Telemetry Feature
Sequence Diagram(s)sequenceDiagram
participant CLI
participant YAML_Parser
participant Config
participant HashTable
participant Mutex
CLI->>YAML_Parser: Load config file
YAML_Parser->>Config: Deliver nested telemetry KVLIST
Config->>Config: validate/apply telemetry.metrics.logs.tag_records
Config->>HashTable: create telemetry tag-records HT
Config->>Mutex: initialize telemetry mutex
HashTable-->>Config: HT allocated
Mutex-->>Config: mutex ready
sequenceDiagram
participant Log_Ingest
participant Metrics_Updater
participant Tag_Validator
participant HT_Lookup
participant Cmt_Counter
Log_Ingest->>Metrics_Updater: report (input_name, tag, count)
Metrics_Updater->>Tag_Validator: check tag length
Tag_Validator-->>Metrics_Updater: OK or too_long
alt tag too long
Metrics_Updater->>Cmt_Counter: increment untracked (reason=tag_length_limit)
else length OK
Metrics_Updater->>HT_Lookup: acquire mutex & lookup/insert key
HT_Lookup-->>Metrics_Updater: found / inserted / limit_reached
alt limit reached or insert failed
Metrics_Updater->>Cmt_Counter: increment untracked (reason=max_series)
else tracked
Metrics_Updater->>Cmt_Counter: increment tracked counter (labels=name,tag)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 39f65fab04
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| config->telemetry_metrics_tag_records_ht = | ||
| flb_hash_table_create(FLB_HASH_TABLE_EVICT_NONE, 512, 0); |
There was a problem hiding this comment.
Use a case-sensitive cardinality table for tags
When max_series is used, tags that differ only by case can bypass the limit because this new hash table is left at the default case-insensitive comparison mode while Prometheus label values remain case-sensitive. For example, after tracking tag="App" with max_series=1, tag="app" is treated as an existing key here and skips the max_series rejection path, but cmt_counter_add still creates a separate tag="app" time series. Calling flb_hash_table_set_case_sensitivity(..., FLB_TRUE) for this table would make the cardinality guard match the metric labels.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
TAG_METRICS.md (1)
97-100: ⚡ Quick winAdd language identifiers to fenced code blocks.
The fenced code blocks at lines 97-100 and 211-215 are missing language specifiers. Add
bashorshellafter the opening triple backticks for proper syntax highlighting.📝 Proposed fix
-``` +```bash $ curl -s http://127.0.0.1:2020/api/v2/metrics/prometheus | grep tag_records fluentbit_input_tag_records_total{name="dummy",tag="app.logs"} 12```diff -``` +```bash cd tests/integration ./run_tests.py scenarios/internal_metrics # or: .venv/bin/python -m pytest scenarios/internal_metrics -q</details> Based on learnings: static analysis tool markdownlint-cli2 flagged these blocks. Also applies to: 211-215 <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In
@TAG_METRICS.mdaround lines 97 - 100, Update the two fenced code blocks that
start with the lines "$ curl -s http://127.0.0.1:2020/api/v2/metrics/prometheus
| grep tag_records" and "cd tests/integration" to include a language specifier
(e.g., bash or shell) immediately after the opening triple backticks; replace
the opening "" with "bash" for both blocks so syntax highlighting is
enabled for the curl and test-run examples.</details> </blockquote></details> <details> <summary>src/flb_input.c (1)</summary><blockquote> `1453-1474`: _💤 Low value_ **Counter creation condition could be refined for memory efficiency.** The counter creation uses `ctx->telemetry_metrics_tag_records == FLB_TRUE || ins->telemetry_metrics_tag_records == FLB_TRUE`, which creates counters even when the per-input override explicitly disables the feature (`ins->telemetry_metrics_tag_records == 0`). The runtime check in `tag_records_metrics_enabled()` (flb_input_chunk.c) correctly prioritizes the per-input override, so functionality is correct. However, for memory efficiency, the condition could mirror the runtime logic: ```c int tag_records_enabled = (ins->telemetry_metrics_tag_records != -1) ? ins->telemetry_metrics_tag_records : ctx->telemetry_metrics_tag_records; if (tag_records_enabled == FLB_TRUE) { // create counters } ``` This is a minor optimization - the current code is functionally correct since unused counters are benign. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/flb_input.c` around lines 1453 - 1474, The counter creation currently checks ctx->telemetry_metrics_tag_records || ins->telemetry_metrics_tag_records which can allocate counters even when the per-input override disables them; change the condition to mirror tag_records_metrics_enabled() by computing an effective flag like tag_records_enabled = (ins->telemetry_metrics_tag_records != -1) ? ins->telemetry_metrics_tag_records : ctx->telemetry_metrics_tag_records and only call cmt_counter_create for ins->cmt_tag_records and ins->cmt_tag_records_untracked when tag_records_enabled == FLB_TRUE; keep existing error checks (if (!ins->cmt_tag_records) return -1, etc.) and reference the same symbols (ctx, ins, telemetry_metrics_tag_records, cmt_counter_create, ins->cmt_tag_records_untracked) when making the change. ``` </details> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.Inline comments:
In
@tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py:
- Around line 28-40: The predicate passed to
FluentBitTestService.wait_for_condition in metrics() (and likewise in any
_run_until_value.check() usage) must catch requests exceptions so transient
network errors don't escape the poll loop; wrap the requests.get(...) call in a
try/except that catches requests.RequestException and returns None on error, and
only return response.text when status_code == 200 and expected is present—this
ensures wait_for_condition can retry instead of failing the test.
Nitpick comments:
In@src/flb_input.c:
- Around line 1453-1474: The counter creation currently checks
ctx->telemetry_metrics_tag_records || ins->telemetry_metrics_tag_records which
can allocate counters even when the per-input override disables them; change the
condition to mirror tag_records_metrics_enabled() by computing an effective flag
like tag_records_enabled = (ins->telemetry_metrics_tag_records != -1) ?
ins->telemetry_metrics_tag_records : ctx->telemetry_metrics_tag_records and only
call cmt_counter_create for ins->cmt_tag_records and
ins->cmt_tag_records_untracked when tag_records_enabled == FLB_TRUE; keep
existing error checks (if (!ins->cmt_tag_records) return -1, etc.) and reference
the same symbols (ctx, ins, telemetry_metrics_tag_records, cmt_counter_create,
ins->cmt_tag_records_untracked) when making the change.In
@TAG_METRICS.md:
- Around line 97-100: Update the two fenced code blocks that start with the
lines "$ curl -s http://127.0.0.1:2020/api/v2/metrics/prometheus | grep
tag_records" and "cd tests/integration" to include a language specifier (e.g.,
bash or shell) immediately after the opening triple backticks; replace the
opening "" with "bash" for both blocks so syntax highlighting is enabled
for the curl and test-run examples.</details> <details> <summary>🪄 Autofix (Beta)</summary> Fix all unresolved CodeRabbit comments on this PR: - [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended) - [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes </details> --- <details> <summary>ℹ️ Review info</summary> <details> <summary>⚙️ Run configuration</summary> **Configuration used**: defaults **Review profile**: CHILL **Plan**: Pro **Run ID**: `39e803af-8aa3-4ccd-beee-802fc6645dd6` </details> <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 19cc8f9c63ab23bb98404b76407c9c81ab2b6914 and 39f65fab049a5b409352aecf2d233224e0eb97d6. </details> <details> <summary>📒 Files selected for processing (16)</summary> * `TAG_METRICS.md` * `include/fluent-bit/flb_config.h` * `include/fluent-bit/flb_input.h` * `src/config_format/flb_cf_yaml.c` * `src/flb_config.c` * `src/flb_input.c` * `src/flb_input_chunk.c` * `tests/integration/scenarios/internal_metrics/config/tag_records_accumulate.yaml` * `tests/integration/scenarios/internal_metrics/config/tag_records_disabled.yaml` * `tests/integration/scenarios/internal_metrics/config/tag_records_enabled.yaml` * `tests/integration/scenarios/internal_metrics/config/tag_records_input_enable.yaml` * `tests/integration/scenarios/internal_metrics/config/tag_records_input_override.yaml` * `tests/integration/scenarios/internal_metrics/config/tag_records_max_series.yaml` * `tests/integration/scenarios/internal_metrics/config/tag_records_max_tag_length.yaml` * `tests/integration/scenarios/internal_metrics/config/tag_records_multi_tag.yaml` * `tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py` </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
| def metrics(self, expected=None): | ||
| url = f"http://127.0.0.1:{self.flb.http_monitoring_port}/api/v2/metrics/prometheus" | ||
| return self.service.wait_for_condition( | ||
| lambda: response.text | ||
| if ( | ||
| (response := requests.get(url, timeout=2)).status_code == 200 | ||
| and (expected is None or expected in response.text) | ||
| ) | ||
| else None, | ||
| timeout=10, | ||
| interval=0.5, | ||
| description="prometheus metrics", | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
fd test_service.py tests/integration --exec cat {}Repository: fluent/fluent-bit
Length of output: 3027
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find FluentBitManager implementation
fd fluent_bit_manager.py . --exec cat {}
# Search for how monitoring/HTTP endpoints are waited on
rg -n "wait_for_http_endpoint|http_monitoring_port|/api/v2/metrics/prometheus|FLUENT_BIT_TEST_LISTENER_PORT|TEST_SUITE_HTTP_PORT" tests/integration -S
# Inspect the test file for other request/poll patterns (Line ~84 etc.)
cat -n tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py | sed -n '1,220p'
# Find other uses of wait_for_condition to see if exceptions are typically handled in predicates
rg -n "wait_for_condition\(" tests/integration/scenarios/internal_metrics -SRepository: fluent/fluent-bit
Length of output: 49737
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# show the configs used by this test file
CONFIG_DIR="tests/integration/scenarios/internal_metrics/config"
fd -e yaml "${CONFIG_DIR}" | head
# print the specific configs referenced in this test module
for f in tag_records_enabled.yaml tag_records_accumulate.yaml tag_records_disabled.yaml tag_records_multi_tag.yaml tag_records_multi_a.yaml tag_records_input_override.yaml tag_records_max_series.yaml tag_records_max_tag_length.yaml 2>/dev/null; do
# if file exists, print a focused excerpt
path="${CONFIG_DIR}/${f}"
if [ -f "$path" ]; then
echo "===== $path ====="
wc -l "$path"
cat -n "$path" | sed -n '1,220p'
fi
done
# inspect related HTTP/metrics validation to see if any probed metrics have empty label sets
cat -n tests/integration/scenarios/internal_http_server/tests/test_internal_http_server_001.py | sed -n '1,220p'Repository: fluent/fluent-bit
Length of output: 571
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# List relevant YAML configs under internal_metrics
fd -e yaml --full-path "tests/integration/scenarios/internal_metrics" -a | sort | sed -n '1,200p'
# Show configs referenced by this test module (if they exist)
BASE="tests/integration/scenarios/internal_metrics"
for f in tag_records_enabled.yaml tag_records_accumulate.yaml tag_records_disabled.yaml tag_records_multi_tag.yaml tag_records_max_series.yaml tag_records_max_tag_length.yaml tag_records_input_override.yaml tag_records_multi_tag.yaml; do
if [ -f "$BASE/tests/config/$f" ]; then
echo "===== $BASE/tests/config/$f ====="
wc -l "$BASE/tests/config/$f"
cat -n "$BASE/tests/config/$f" | sed -n '1,220p'
elif [ -f "$BASE/config/$f" ]; then
echo "===== $BASE/config/$f ====="
wc -l "$BASE/config/$f"
cat -n "$BASE/config/$f" | sed -n '1,220p'
else
# try any matching filename under internal_metrics
hit="$(fd --full-path -e yaml "$f" "$BASE" | head -n 1 || true)"
if [ -n "${hit:-}" ]; then
echo "===== $hit ====="
wc -l "$hit"
cat -n "$hit" | sed -n '1,220p'
fi
fi
done
# Inspect internal_http_server metric parsing to see if empty-label cases are handled
cat -n tests/integration/scenarios/internal_http_server/tests/test_internal_http_server_001.py | sed -n '1,260p'Repository: fluent/fluent-bit
Length of output: 13627
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# search for wait_for_condition predicates that call requests.get directly without guard
rg -n "wait_for_condition\([\s\S]{0,200}requests\.get" tests/integration/scenarios -S
# broader search: any requests.get in integration predicates
rg -n "wait_for_condition\(" tests/integration/scenarios -S | head -n 200
# search specifically for 'requests.get(url' usage
rg -n "requests\.get\(url" tests/integration/scenarios -S
# show the FluentBitManager.wait_for_fluent_bit exception handling section
sed -n '160,260p' tests/integration/src/utils/fluent_bit_manager.pyRepository: fluent/fluent-bit
Length of output: 43
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect dummy input implementation/docs to judge whether samples=1 is deterministic
rg -n "dummy input|name: dummy|input_dummy|dummy.*samples|samples:" -S tests/integration src | head -n 200
# Inspect where tag_records metrics are defined to see if the metrics always have labels (no empty label sets)
rg -n "tag_records(_|\\b)|fluentbit_input_tag_records|tag.records" -S src tests | head -n 200Repository: fluent/fluent-bit
Length of output: 28623
Guard requests.get inside wait_for_condition predicates to avoid uncaught RequestException during polling
FluentBitTestService.wait_for_conditionevaluates the predicate without catching exceptions, so anyrequestsfailure during polling will abort the test instead of retrying.FluentBitManager.start()waits for the HTTP server via/api/v1/uptime(uptime_sec > 1), but the/api/v2/metrics/prometheusrequests inmetrics()(and_run_until_value.check()) remain unguarded.
🛡️ Suggested guard (apply to both poll sites)
def metrics(self, expected=None):
url = f"http://127.0.0.1:{self.flb.http_monitoring_port}/api/v2/metrics/prometheus"
+ def _poll():
+ try:
+ response = requests.get(url, timeout=2)
+ except requests.exceptions.RequestException:
+ return None
+ if response.status_code == 200 and (expected is None or expected in response.text):
+ return response.text
+ return None
return self.service.wait_for_condition(
- lambda: response.text
- if (
- (response := requests.get(url, timeout=2)).status_code == 200
- and (expected is None or expected in response.text)
- )
- else None,
+ _poll,
timeout=10,
interval=0.5,
description="prometheus metrics",
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def metrics(self, expected=None): | |
| url = f"http://127.0.0.1:{self.flb.http_monitoring_port}/api/v2/metrics/prometheus" | |
| return self.service.wait_for_condition( | |
| lambda: response.text | |
| if ( | |
| (response := requests.get(url, timeout=2)).status_code == 200 | |
| and (expected is None or expected in response.text) | |
| ) | |
| else None, | |
| timeout=10, | |
| interval=0.5, | |
| description="prometheus metrics", | |
| ) | |
| def metrics(self, expected=None): | |
| url = f"http://127.0.0.1:{self.flb.http_monitoring_port}/api/v2/metrics/prometheus" | |
| def _poll(): | |
| try: | |
| response = requests.get(url, timeout=2) | |
| except requests.exceptions.RequestException: | |
| return None | |
| if response.status_code == 200 and (expected is None or expected in response.text): | |
| return response.text | |
| return None | |
| return self.service.wait_for_condition( | |
| _poll, | |
| timeout=10, | |
| interval=0.5, | |
| description="prometheus metrics", | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py`
around lines 28 - 40, The predicate passed to
FluentBitTestService.wait_for_condition in metrics() (and likewise in any
_run_until_value.check() usage) must catch requests exceptions so transient
network errors don't escape the poll loop; wrap the requests.get(...) call in a
try/except that catches requests.RequestException and returns None on error, and
only return response.text when status_code == 200 and expected is present—this
ensures wait_for_condition can retry instead of failing the test.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/flb_config.c (2)
880-919: ⚡ Quick winMove variable declarations to the start of the function.
Lines 885-886 declare
retandtmpafter the function opening brace but before theswitchstatement at line 887. As per coding guidelines, variables should be declared at the start of functions, not mid-block.♻️ Proposed refactor
static int variant_as_bool(struct flb_config *config, struct cfl_variant *v, int *out) { + int ret; + flb_sds_t tmp; + switch (v->type) { case CFL_VARIANT_BOOL: *out = v->data.as_bool ? FLB_TRUE : FLB_FALSE; return 0; case CFL_VARIANT_INT: /* only 0/1 are valid numeric booleans (reject e.g. 2 or -1) */ if (v->data.as_int64 != 0 && v->data.as_int64 != 1) { return -1; } *out = v->data.as_int64 ? FLB_TRUE : FLB_FALSE; return 0; case CFL_VARIANT_UINT: if (v->data.as_uint64 != 0 && v->data.as_uint64 != 1) { return -1; } *out = v->data.as_uint64 ? FLB_TRUE : FLB_FALSE; return 0; case CFL_VARIANT_STRING: - int ret; - flb_sds_t tmp; tmp = flb_env_var_translate(config->env, v->data.as_string);As per coding guidelines.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/flb_config.c` around lines 880 - 919, In function variant_as_bool move all local variable declarations to the top of the function body: declare int ret; and flb_sds_t tmp; immediately after the opening brace before any statements (i.e. before the switch), so the variables are defined at function start while preserving their types and existing usage in the CFL_VARIANT_STRING case and return paths; no other logic changes.
921-966: ⚡ Quick winMove variable declarations to the start of the function.
Lines 925-927 declare
tmp,end, andvalafter the function opening brace but before theswitchstatement at line 929. As per coding guidelines, variables should be declared at the start of functions, not mid-block.♻️ Proposed refactor
static int variant_as_int(struct flb_config *config, struct cfl_variant *v, int *out) { + flb_sds_t tmp; + char *end; + long val; + switch (v->type) { case CFL_VARIANT_INT: if (v->data.as_int64 < INT_MIN || v->data.as_int64 > INT_MAX) { return -1; } *out = (int) v->data.as_int64; return 0; case CFL_VARIANT_UINT: if (v->data.as_uint64 > (uint64_t) INT_MAX) { return -1; } *out = (int) v->data.as_uint64; return 0; case CFL_VARIANT_STRING: - flb_sds_t tmp; - char *end; - long val; tmp = flb_env_var_translate(config->env, v->data.as_string);As per coding guidelines.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/flb_config.c` around lines 921 - 966, The variables tmp, end, and val in function variant_as_int are declared after the opening brace but before the switch; move their declarations to the top of the function (immediately after the opening brace and before any logic) so flb_sds_t tmp; char *end; and long val; are declared up front; keep the rest of the logic and types unchanged and ensure any uses of tmp, end, and val remain identical.tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records_badconfig.py (1)
67-97: 💤 Low valueConsider iterable unpacking for command construction.
Line 75-80 uses list concatenation to prepend valgrind arguments. While this works, iterable unpacking is slightly more Pythonic:
♻️ Optional refactor
if VALGRIND: vg_log = tempfile.NamedTemporaryFile( prefix="vg_badcfg_", suffix=".log", delete=False ).name - cmd = [ + cmd = [ "valgrind", f"--log-file={vg_log}", "--leak-check=full", "--show-leak-kinds=all", - ] + cmd + *cmd + ] timeout *= 6 # valgrind is much slower🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records_badconfig.py` around lines 67 - 97, The _run function builds the subprocess command using list concatenation to prepend Valgrind args; replace that concatenation with iterable unpacking to make the construction more Pythonic: when VALGRIND is true, create vg_log and then set cmd = ["valgrind", f"--log-file={vg_log}", "--leak-check=full", "--show-leak-kinds=all", *cmd] (preserve the existing timeout *= 6 behavior and the variables cmd, VALGRIND, vg_log and the rest of the function).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/flb_config.c`:
- Around line 375-399: The mutex config->telemetry_metrics_logs_tag_records_lock
must be initialized before any early-return path that could bypass its init
(e.g., failure of flb_hash_table_create); move the pthread_mutex_init call to
just before the flb_hash_table_create invocation (or add a boolean like
telemetry_metrics_logs_tag_records_lock_inited that you set after
pthread_mutex_init and check in flb_config_exit before calling
pthread_mutex_destroy) so flb_config_exit never calls pthread_mutex_destroy on
an uninitialized mutex.
---
Nitpick comments:
In `@src/flb_config.c`:
- Around line 880-919: In function variant_as_bool move all local variable
declarations to the top of the function body: declare int ret; and flb_sds_t
tmp; immediately after the opening brace before any statements (i.e. before the
switch), so the variables are defined at function start while preserving their
types and existing usage in the CFL_VARIANT_STRING case and return paths; no
other logic changes.
- Around line 921-966: The variables tmp, end, and val in function
variant_as_int are declared after the opening brace but before the switch; move
their declarations to the top of the function (immediately after the opening
brace and before any logic) so flb_sds_t tmp; char *end; and long val; are
declared up front; keep the rest of the logic and types unchanged and ensure any
uses of tmp, end, and val remain identical.
In
`@tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records_badconfig.py`:
- Around line 67-97: The _run function builds the subprocess command using list
concatenation to prepend Valgrind args; replace that concatenation with iterable
unpacking to make the construction more Pythonic: when VALGRIND is true, create
vg_log and then set cmd = ["valgrind", f"--log-file={vg_log}",
"--leak-check=full", "--show-leak-kinds=all", *cmd] (preserve the existing
timeout *= 6 behavior and the variables cmd, VALGRIND, vg_log and the rest of
the function).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4ba26463-dbc1-4625-9ed3-d177804c248c
📒 Files selected for processing (32)
include/fluent-bit/flb_config.hinclude/fluent-bit/flb_input.hsrc/config_format/flb_cf_yaml.csrc/flb_config.csrc/flb_input.csrc/flb_input_chunk.ctests/integration/scenarios/internal_metrics/config/bad_tag_records_dotted_key.yamltests/integration/scenarios/internal_metrics/config/bad_tag_records_dotted_service.yamltests/integration/scenarios/internal_metrics/config/bad_tag_records_enabled_value.yamltests/integration/scenarios/internal_metrics/config/bad_tag_records_input_service_only.yamltests/integration/scenarios/internal_metrics/config/bad_tag_records_input_value.yamltests/integration/scenarios/internal_metrics/config/bad_tag_records_malformed_metrics.yamltests/integration/scenarios/internal_metrics/config/bad_tag_records_max_series_value.yamltests/integration/scenarios/internal_metrics/config/bad_tag_records_nested_unknown.yamltests/integration/scenarios/internal_metrics/config/bad_tag_records_numeric_bool.yamltests/integration/scenarios/internal_metrics/config/bad_tag_records_telemetry_scalar.yamltests/integration/scenarios/internal_metrics/config/bad_tag_records_type.yamltests/integration/scenarios/internal_metrics/config/bad_tag_records_unknown_option.yamltests/integration/scenarios/internal_metrics/config/bad_tag_records_unknown_parent_key.yamltests/integration/scenarios/internal_metrics/config/lenient_tag_records_empty_block.yamltests/integration/scenarios/internal_metrics/config/tag_records_accumulate.yamltests/integration/scenarios/internal_metrics/config/tag_records_disabled.yamltests/integration/scenarios/internal_metrics/config/tag_records_enabled.yamltests/integration/scenarios/internal_metrics/config/tag_records_input_enable.yamltests/integration/scenarios/internal_metrics/config/tag_records_input_nested_disable.yamltests/integration/scenarios/internal_metrics/config/tag_records_input_nested_enable.yamltests/integration/scenarios/internal_metrics/config/tag_records_input_override.yamltests/integration/scenarios/internal_metrics/config/tag_records_max_series.yamltests/integration/scenarios/internal_metrics/config/tag_records_max_tag_length.yamltests/integration/scenarios/internal_metrics/config/tag_records_multi_tag.yamltests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.pytests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records_badconfig.py
✅ Files skipped from review due to trivial changes (9)
- tests/integration/scenarios/internal_metrics/config/lenient_tag_records_empty_block.yaml
- tests/integration/scenarios/internal_metrics/config/bad_tag_records_malformed_metrics.yaml
- tests/integration/scenarios/internal_metrics/config/bad_tag_records_telemetry_scalar.yaml
- tests/integration/scenarios/internal_metrics/config/bad_tag_records_input_service_only.yaml
- tests/integration/scenarios/internal_metrics/config/tag_records_input_nested_disable.yaml
- tests/integration/scenarios/internal_metrics/config/tag_records_max_tag_length.yaml
- tests/integration/scenarios/internal_metrics/config/bad_tag_records_nested_unknown.yaml
- tests/integration/scenarios/internal_metrics/config/bad_tag_records_unknown_option.yaml
- tests/integration/scenarios/internal_metrics/config/tag_records_input_enable.yaml
🚧 Files skipped from review as they are similar to previous changes (7)
- tests/integration/scenarios/internal_metrics/config/tag_records_max_series.yaml
- tests/integration/scenarios/internal_metrics/config/tag_records_input_override.yaml
- tests/integration/scenarios/internal_metrics/config/tag_records_accumulate.yaml
- tests/integration/scenarios/internal_metrics/config/tag_records_multi_tag.yaml
- src/flb_input_chunk.c
- src/config_format/flb_cf_yaml.c
- tests/integration/scenarios/internal_metrics/tests/test_internal_metrics_tag_records.py
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
This PR adds
fluentbit_input_logs_tag_records_total, a new internal metric for tracking log record counts by input name and tag. The feature is disabled by default and configured exclusively through nested YAMLtelemetryblocks.It is logs-only: only the
logssignal is counted (metrics, traces and profiles inputs are ignored). Because tags can be high cardinality, the feature ships with global guardrails (max_series,max_tag_length) so it cannot overwhelm the metrics endpoint.Configuration
tag_recordsis always a map;enabledis the on/off toggle. Service-levelconfiguration controls the global default and guardrails:
Keys under the
tag_recordsblock:enabledfalsemax_series500(name, tag)series tracked globally.<= 0disables the limit.max_tag_length128<= 0disables the limit.Per-input override
Inputs can override only the
enabledstate (max_series/max_tag_lengthare service-global). Opt an input out while the service switch is on:Or opt a single input in while the service switch is off:
Metrics
Example Prometheus output when enabled:
Records that cannot be tracked because of the guardrails are counted separately, with the reason:
reasonis one ofmax_series(global series cap reached),tag_length_limit(tag exceedsmax_tag_length), orerror(internal failure). A record that is not tracked per-tag is still counted influentbit_input_records_total.nameis the input alias when set, otherwise the input plugin name.tagis the Fluent Bit tag for the ingested log records.Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Tests