Fix TailFile CRC file descriptor leak #23904

Draft
ArmentaRoberto wants to merge 2 commits into master from ra/fix-tailfile-crc-fd-leak

@ArmentaRoberto

What does this PR do?

Fixes a file descriptor leak in TailFile._open_file by closing the temporary file handle used to compute the CRC for rotation/truncation detection before opening the persistent tail handle.

Adds a regression test that tracks the CRC probe handle and verifies it is closed while the active tail handle remains open.
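
The handle-tracking idea behind such a regression test can be sketched as follows. This is only an illustration, not the test added in this PR: `open_with_crc` is a hypothetical stand-in for the code under test, and the 16-byte prefix length and the `check_probe_handle_is_closed` helper are assumptions made for the example.

```python
import builtins
import os
import tempfile
import zlib
from unittest import mock


def open_with_crc(path, prefix_len=16):
    """Hypothetical stand-in for the code under test (not TailFile itself)."""
    with open(path, "rb") as probe:  # short-lived CRC probe handle
        crc = zlib.crc32(probe.read(prefix_len))
    return open(path, "r"), crc  # long-lived tail handle


def check_probe_handle_is_closed():
    """Return (handles_opened, probe_closed, tail_open) seen via a tracking open()."""
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("hello\nworld\n")

    opened = []
    real_open = builtins.open  # captured before patching

    def tracking_open(*args, **kwargs):
        # Delegate to the real open() and remember every handle it returns.
        handle = real_open(*args, **kwargs)
        opened.append(handle)
        return handle

    try:
        with mock.patch("builtins.open", new=tracking_open):
            tail, _crc = open_with_crc(tmp.name)
        probe = opened[0]  # the first handle opened is the CRC probe
        result = (len(opened), probe.closed, not tail.closed)
        tail.close()
        return result
    finally:
        os.unlink(tmp.name)
```

Calling `check_probe_handle_is_closed()` reports how many handles were opened, whether the probe was closed, and whether the tail handle was still open at the time of the check.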

Motivation

TailFile._open_file opened one file handle to read the CRC prefix, then opened a second handle for tailing. The CRC handle was never closed, so every reopen path could leak a descriptor. On long-running Agents that tail files through rotations, truncations, or periodic reopens, this can eventually exhaust the process's descriptor limit (ulimit -n) and cause unrelated I/O to fail with "Too many open files".
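
A minimal sketch of the pattern the fix relies on, assuming the CRC is computed over a fixed-size prefix of the file. The function names and the 16-byte prefix length are hypothetical and are not taken from TailFile; the point is only that the probe handle is scoped to a `with` block, so it is closed before the persistent handle is opened.

```python
import os
import tempfile
import zlib

CRC_PREFIX_BYTES = 16  # hypothetical prefix length, chosen for illustration


def read_prefix_crc(path, length=CRC_PREFIX_BYTES):
    """Return a CRC32 of the first `length` bytes, closing the probe handle."""
    # The context manager closes the probe handle even if read() raises.
    with open(path, "rb") as probe:
        return zlib.crc32(probe.read(length))


def open_for_tail(path):
    """Compute the CRC first, then open the long-lived tail handle."""
    crc = read_prefix_crc(path)
    tail = open(path, "r")  # the only handle that stays open
    return tail, crc


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write("first line\nsecond line\n")
    try:
        tail, crc = open_for_tail(tmp.name)
        print(tail.readline().rstrip(), crc)
        tail.close()
    finally:
        os.unlink(tmp.name)
```

With this shape, each reopen holds at most one descriptor per tailed file, however many rotations or truncations occur.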

Validation:

  • ddev -x test -fs datadog_checks_base
  • ddev -x --no-interactive test datadog_checks_base -- -k tailfile
  • ddev -x --no-interactive test datadog_checks_base: reached 1475 passed, 40 skipped, then errored in the Docker-backed Kerberos/SOCKS5 tests because Docker is unavailable locally (docker points at a missing Colima socket, and docker compose is not available in this environment).

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@datadog-datadog-prod-us1-2

datadog-datadog-prod-us1-2 Bot commented Jun 2, 2026

⚠️ Warnings

🚦 18 Pipeline jobs failed

PR All Windows | test / j6712d43 / ddev on Windows

🔧 Fix in code. AttributeError: module 'aiohttp.streams' has no attribute 'AsyncStreamReaderMixin'

PR All | test / jd316aba / ddev on Linux

⬆️ Not caused by your changes. Rebase on a base branch once a fix is merged. AttributeError: module 'aiohttp.streams' has no attribute 'AsyncStreamReaderMixin' in /home/runner/.local/share/hatch/env/virtual/ddev/SZM48zvK/ddev/lib/python3.13/site-packages/vcr/stubs/aiohttp_stubs.py:21

PR All | test / j06ca546 / SNMP

🔄 Retry job. This looks flaky and may succeed on retry. ConnectionError: HTTPSConnectionPool(host='ddintegrations.blob.core.windows.net', port=443): Max retries exceeded with url: /snmp/cisco-3850.snmprec (Caused by NameResolutionError)

View all 18 failed jobs.

🧪 20 Tests failed in 1 job

PR All | run

test_bulk_table from test_check.py
HTTPSConnectionPool(host='ddintegrations.blob.core.windows.net', port=443): Max retries exceeded with url: /snmp/cisco-3850.snmprec (Caused by NameResolutionError("HTTPSConnection(host='ddintegrations.blob.core.windows.net', port=443): Failed to resolve 'ddintegrations.blob.core.windows.net' ([Errno -2] Name or service not known)"))
test_cast_metrics from test_check.py
HTTPSConnectionPool(host='ddintegrations.blob.core.windows.net', port=443): Max retries exceeded with url: /snmp/cisco-3850.snmprec (Caused by NameResolutionError("HTTPSConnection(host='ddintegrations.blob.core.windows.net', port=443): Failed to resolve 'ddintegrations.blob.core.windows.net' ([Errno -2] Name or service not known)"))

View all 20 test failures

ℹ️ Info

No other issues found

❄️ No new flaky tests detected

🎯 Code Coverage
Patch Coverage: 96.30%
Overall Coverage: 88.29% (+0.33%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 502500d

@dd-octo-sts
Contributor

dd-octo-sts Bot commented Jun 2, 2026

Validation Report

All 21 validations passed.

| Validation | Description | Status |
| --- | --- | --- |
| agent-reqs | Verify check versions match the Agent requirements file | Passed |
| ci | Validate CI configuration and code coverage settings | Passed |
| codeowners | Validate every integration has a CODEOWNERS entry | Passed |
| config | Validate default configuration files against spec.yaml | Passed |
| dep | Verify dependency pins are consistent and Agent-compatible | Passed |
| http | Validate integrations use the HTTP wrapper correctly | Passed |
| imports | Validate check imports do not use deprecated modules | Passed |
| integration-style | Validate check code style conventions | Passed |
| jmx-metrics | Validate JMX metrics definition files and config | Passed |
| labeler | Validate PR labeler config matches integration directories | Passed |
| legacy-signature | Validate no integration uses the legacy Agent check signature | Passed |
| license-headers | Validate Python files have proper license headers | Passed |
| licenses | Validate third-party license attribution list | Passed |
| metadata | Validate metadata.csv metric definitions | Passed |
| models | Validate configuration data models match spec.yaml | Passed |
| openmetrics | Validate OpenMetrics integrations disable the metric limit | Passed |
| package | Validate Python package metadata and naming | Passed |
| qa-label | Validate the pull request declares whether it needs QA for the next Agent release | Passed |
| readmes | Validate README files have required sections | Passed |
| saved-views | Validate saved view JSON file structure and fields | Passed |
| version | Validate version consistency between package and changelog | Passed |
