Skip to content

in_tail: fix ignore_active_older_files to re-check mtime on aged-out files#11890

Open
zshuang0316 wants to merge 2 commits into
fluent:masterfrom
zshuang0316:fix-ignore-active-older-files
Open

in_tail: fix ignore_active_older_files to re-check mtime on aged-out files#11890
zshuang0316 wants to merge 2 commits into
fluent:masterfrom
zshuang0316:fix-ignore-active-older-files

Conversation

@zshuang0316

@zshuang0316 zshuang0316 commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Fix a bug where ignore_active_older_files permanently excluded files
whose mtime was refreshed by a new write.

When a file's mtime exceeded ignore_older while being actively
monitored, its inode was registered as aged-out and the file was removed
from monitoring. If new content was later appended (updating mtime), the
scan still excluded the file because the inode comparison matched without
re-checking the mtime.

Re-evaluate mtime in the aged-out inode branch; only skip if the file is
still too old, otherwise unregister the aged-out entry and allow
re-pickup. Applied to both tail_scan_glob.c (Linux/macOS) and
tail_scan_win32.c (Windows).


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • [N/A] Debug log output from testing the change
  • [N/A] Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes

    • Re-check file modification time for aged-out entries so files updated after aging can be re-processed.
    • Stop treating aged-out inodes as permanently excluded; allow re-registration when refreshed.
    • Preserve a file’s read position when purged for being old so a later re-scan resumes from the previous offset.
  • Tests

    • Added a runtime test verifying aged-out files are re-registered and read again after an update.

@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Re-check aged-out files' mtimes on scan (POSIX and Windows), preserve file read offsets when purging ignored files, and add a runtime test that verifies a refreshed file is re-registered and re-read.

Changes

Aged-out file re-evaluation

Layer / File(s) Summary
Aged-out file re-check on both platforms
plugins/in_tail/tail_scan_glob.c, plugins/in_tail/tail_scan_win32.c
When an inode matches an aged-out entry under ignore_active_older_files, the code re-computes the file's current mtime and only skips the file if it remains older than ignore_older. If mtime is refreshed, the aged-out inode entry is unregistered so the file can be re-registered.
Preserve ignored-file offset on purge
plugins/in_tail/tail_file.c
Before removing a file due to ignore_older with ignore_active_older_files, the plugin records the file's current read offset via flb_tail_scan_register_ignored_file_size(...) so a later re-scan can resume from that offset.
Test validation for re-read on update
tests/runtime/in_tail.c
Adds flb_test_in_tail_ignore_active_older_files_reread_on_update() and registers it in TEST_LIST. The test ages a file out, writes new data to refresh its mtime, and asserts the tail input reads both records (count == 2).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

backport to v4.2.x

Suggested reviewers

  • edsiper
  • cosmo0920

Poem

🐰 I nibbled through old tails at rest,
A fresh write nudged them from their nest,
Offsets tucked safely, scans wake and peep,
Two lines counted now — no longer asleep.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately reflects the main change: fixing the ignore_active_older_files logic to re-check mtime on aged-out files, allowing them to be reprocessed when updated.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 299b0f1665

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread plugins/in_tail/tail_scan_glob.c

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/in_tail/tail_file.c (1)

447-450: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Confirm flb_tail_scan_register_ignored_file_size API and offset handling.

  • flb_tail_scan_register_ignored_file_size exists and is declared/defined as void flb_tail_scan_register_ignored_file_size(struct flb_tail_config *ctx, const char *path, size_t path_length, size_t size) (plugins/in_tail/tail_scan.h, plugins/in_tail/tail_scan.c); the call in plugins/in_tail/tail_file.c passes the first three arguments (ctx, file->name, strlen(file->name)) correctly.
  • The last argument is file->offset, which is int64_t in struct flb_tail_file; since the API expects size_t, ensure file->offset is always non-negative (or add an explicit cast/guard) to avoid sign-to-unsigned conversion issues.
  • The call order in the purge path is correct: register ignored offset first, then register the aged-out inode, then remove the file.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/in_tail/tail_file.c` around lines 447 - 450, The call to
flb_tail_scan_register_ignored_file_size(ctx, file->name, strlen(file->name),
file->offset) uses file->offset (int64_t) while the API expects a size_t; update
the purge path in tail_file.c to ensure file->offset is non-negative before
passing it (e.g. check file->offset >= 0 and use (size_t)file->offset) or clamp
to 0 to avoid sign-to-unsigned conversion, keeping the existing call order
(flb_tail_scan_register_ignored_file_size first, then
flb_tail_scan_register_aged_out_inode, then flb_tail_file_remove).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/runtime/in_tail.c`:
- Around line 2094-2106: After calling write_msg, explicitly wait until the
first record is observed before starting the aging sleep: use the existing wait
helper to assert that num_output == 1 (and if the assert fails, call
test_tail_ctx_destroy(ctx) and return) prior to the flb_time_msleep(4000) call
so the purge window only begins after the first record is consumed; place this
check between write_msg(...) and flb_time_msleep(...) to eliminate the
timing-dependent flake.

---

Outside diff comments:
In `@plugins/in_tail/tail_file.c`:
- Around line 447-450: The call to flb_tail_scan_register_ignored_file_size(ctx,
file->name, strlen(file->name), file->offset) uses file->offset (int64_t) while
the API expects a size_t; update the purge path in tail_file.c to ensure
file->offset is non-negative before passing it (e.g. check file->offset >= 0 and
use (size_t)file->offset) or clamp to 0 to avoid sign-to-unsigned conversion,
keeping the existing call order (flb_tail_scan_register_ignored_file_size first,
then flb_tail_scan_register_aged_out_inode, then flb_tail_file_remove).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 70821c4d-6bc9-49f7-bea0-51eb3616eb3f

📥 Commits

Reviewing files that changed from the base of the PR and between 299b0f1 and 5ca8e29.

📒 Files selected for processing (4)
  • plugins/in_tail/tail_file.c
  • plugins/in_tail/tail_scan_glob.c
  • plugins/in_tail/tail_scan_win32.c
  • tests/runtime/in_tail.c
🚧 Files skipped from review as they are similar to previous changes (2)
  • plugins/in_tail/tail_scan_win32.c
  • plugins/in_tail/tail_scan_glob.c

Comment thread tests/runtime/in_tail.c

@cosmo0920 cosmo0920 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really worthy patch! But I found a small nitpick about readability on unit tests.
Plus, we need to address a codex reviewed comment.
Could you address them?

Comment thread tests/runtime/in_tail.c Outdated
Comment thread plugins/in_tail/tail_scan_glob.c

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27846b1d85

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread plugins/in_tail/tail_file.c Outdated
flb_tail_scan_register_ignored_file_size(ctx,
file->name,
strlen(file->name),
file->offset);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clamp preserved offsets after in-place truncation

When a monitored file ages out and is later copy-truncated or rewritten in place before being appended again, this stores the old offset even though the same inode can now have a smaller st_size. On re-pickup without a DB, set_file_position() seeks to that stale offset past EOF and pending_bytes stays non-positive, so the fresh content is skipped until the file grows past the old size; discard or clamp the saved offset when the refreshed same-inode file is smaller than the preserved offset.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex please review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: edf79beb06

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread plugins/in_tail/tail_file.c Outdated
Comment on lines +2443 to +2446
flb_tail_scan_register_ignored_file_size(ctx,
file->name,
strlen(file->name),
file->offset);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Seek to saved offset when re-picking aged files

When read_newly_discovered_files_from_head is left at its default true, in_tail_init() flips ctx->read_from_head to true after startup, and set_file_position() then returns before seeking whenever ctx->read_from_head == FLB_TRUE. In that common configuration, this newly saved offset is not applied when the refreshed aged-out file is appended again, so the fd remains at 0 while file->offset is nonzero; the next static read replays old bytes instead of only the appended content. The new test misses this because it explicitly sets read_newly_discovered_files_from_head=false.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex please review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 588b0d6cb8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread plugins/in_tail/tail_file.c Outdated
Comment on lines +2450 to +2454
if (file->offset > 0) {
flb_tail_scan_register_ignored_file_size(ctx,
file->name,
strlen(file->name),
file->offset);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve the last parsed offset when aging out

When an active file has bytes buffered that have been read but not emitted yet (for example a partial line without a trailing newline), file->offset already points past those bytes while file->stream_offset/the resumable marker still points to the last parsed record. Saving file->offset here means the later re-pickup seeks past the buffered fragment once the file's mtime is refreshed, so appending the newline or rest of that record drops the beginning of the log line. Store the resumable/processed offset instead of the raw read offset for this re-registration path.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex please review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6d80da657

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread plugins/in_tail/tail_file.c Outdated
Comment on lines +2452 to +2456
if (file->offset > 0) {
flb_tail_scan_register_ignored_file_size(ctx,
file->name,
strlen(file->name),
flb_tail_file_db_offset(file));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve zero offsets when re-picking aged-out files

When ignore_active_older_files purges a monitored file whose current resumable offset is 0, this branch does not store any offset. In a tail-from-end configuration (read_newly_discovered_files_from_head=false), an empty file can age out while monitored, then receive its first line; the refresh scan re-adds it with no saved offset, so set_file_position() seeks to EOF and drops that first post-refresh line. The same happens when flb_tail_file_db_offset(file) is 0 because the whole buffered fragment is unparsed, so the line that completes it after re-pickup is skipped instead of reconstructed.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex please review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d171a2efe0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread plugins/in_tail/tail_file.c Outdated
/* A pre-set offset (e.g. from an aged-out re-pickup) must be honoured
* even when read_from_head is true; the flag only governs truly new
* files that have no prior read position. */
if (file->offset > 0) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor saved zero offsets on re-pickup

When read_newly_discovered_files_from_head=false, an empty (or otherwise unconsumed) file can age out with a saved offset of 0; after a later write refreshes mtime, flb_tail_file_append() receives that saved 0, but this check treats it the same as “no saved offset” and falls through to the SEEK_END path, so the new first record is skipped. The new ignored_file_sizes +1 encoding makes zero offsets fetchable, but set_file_position() still needs a way to distinguish an explicit saved zero from a truly new file.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex please review

…files

When a file is aged out by ignore_active_older_files, register its inode
and read offset so the scan can resume from the saved position if the
file's mtime is later refreshed by new writes.

On re-scan, re-evaluate the mtime of same-inode aged-out files: only
keep skipping them if they are still too old. Clear the aged-out inode
entry (and any stale saved offset) when the inode at the path has
changed, so a replacement file is not mistakenly skipped.

Save flb_tail_file_db_offset() rather than file->offset when aging out,
so any buffered but unparsed bytes are not skipped on re-pickup. Store
size+1 in ignored_file_sizes and subtract 1 on fetch so that an offset
of zero is distinguishable from a missing entry, allowing files aged out
at offset 0 to be re-read from the start on re-pickup.

Discard the saved offset on re-pickup if it exceeds the current file
size, which can happen after an in-place copy-truncate rotation with
the same inode.

In set_file_position, add an explicit_offset parameter and honour it
before consulting the read_from_head flag. The flag governs truly new
files only and must not suppress seeking to a known position on
aged-out re-pickup, including an explicit offset of zero.

Signed-off-by: zshuang0316 <zshuang0316@users.noreply.github.com>
Signed-off-by: zshuang0316 <zshuang0316@163.com>
Add two runtime tests that verify a file aged out by
ignore_active_older_files is re-added to monitoring when new content
updates its mtime, and that reading resumes from the saved offset:

- reread_on_update: uses read_newly_discovered_files_from_head=false
- reread_on_update_default_read_from_head: uses the default (true),
  verifying set_file_position honours the saved offset even when
  ctx->read_from_head is true

Signed-off-by: zshuang0316 <zshuang0316@users.noreply.github.com>
Signed-off-by: zshuang0316 <zshuang0316@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants