Skip to content

Collect ArgoCD entities for applications, cluster and repos#23917

Open
steveny91 wants to merge 10 commits into
masterfrom
sy/argocd-genresources-allow-list
Open

Collect ArgoCD entities for applications, cluster and repos#23917
steveny91 wants to merge 10 commits into
masterfrom
sy/argocd-genresources-allow-list

Conversation

@steveny91
Copy link
Copy Markdown
Contributor

@steveny91 steveny91 commented Jun 2, 2026

What does this PR do?
ArgoCD adaption of entity submission.

As part a 3 part process:
PR1 to add the forwarder: DataDog/datadog-agent#51365
PR2 to add the helper functions in checks: #23905
PR3 to onboard ArgoCD as to submit entity metadata: This PR

@datadog-prod-us1-4
Copy link
Copy Markdown
Contributor

datadog-prod-us1-4 Bot commented Jun 2, 2026

Tests  Code Coverage

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 87.99%
Overall Coverage: 87.68%

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 95495d5 | Docs | Datadog PR Page | Give us feedback!

@steveny91 steveny91 changed the title Sy/argocd genresources allow list Collect ArgoCD entities for applications, cluster and repos Jun 4, 2026
@steveny91 steveny91 added the qa/skip-qa Automatically skip this PR for the next QA label Jun 4, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7816b0a70a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

annotation_keys=include["annotation_keys"],
)
_os.makedirs("/tmp/genres-debug", exist_ok=True)
with open("/tmp/genres-debug/payloads.jsonl", "a", encoding="utf-8") as _fp:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove the unbounded debug payload dump

When collect_genresources is enabled, this debug block runs for every fetched Application, Cluster, and Repository on every collection cycle, even if the resource is unchanged, and appends the allow-listed payload to /tmp/genres-debug/payloads.jsonl. In a real ArgoCD deployment this can leak resource metadata onto the Agent host and grow without bound over time, so the collector should not write these payloads to disk in production code.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@OliviaShoup OliviaShoup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thorough, well-organized Beta section — clear enable steps, a concrete RBAC example, and a genuinely useful "Behavioral notes" block. "Argo CD" is spelled correctly throughout. A small cluster of mechanical must-fixes inline (word-list items, hyphenation, definition-list capitalization) plus one terminology clarity question.

A few lighter, optional suggestions:

  • ship (L170, L204) reads as jargon ("ship ... objects to Datadog" / "before they ship"); send or submit is more standard.
  • L193 — "inherit whatever request authentication is already configured" — "whatever" is a touch informal; "inherit the request authentication already configured on the instance" is cleaner.

Out-of-scope note (not editorial): the section references a new metric argocd.genresources.api.up (L208) that isn't in metadata.csv. For a Beta internal-health gauge that may be intentional, but worth confirming whether it should be documented there.

Comment thread argocd/README.md Outdated

Operator-tunable options:

- `genresources_ttl_seconds` (default `21600`): time-to-live applied to each emitted resource. Resources expire `ttl_seconds` after the last observation.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize the first word of the definition description:

Suggested change
- `genresources_ttl_seconds` (default `21600`): time-to-live applied to each emitted resource. Resources expire `ttl_seconds` after the last observation.
- `genresources_ttl_seconds` (default `21600`): Time-to-live applied to each emitted resource. Resources expire `ttl_seconds` after the last observation.

Separately — worth clarifying ttl_seconds here: the option is genresources_ttl_seconds, but the sentence references ttl_seconds, which reads as a different field. If it's not an intentional distinct field, consider "Resources expire the configured number of seconds after the last observation."

Comment thread argocd/README.md Outdated
Operator-tunable options:

- `genresources_ttl_seconds` (default `21600`): time-to-live applied to each emitted resource. Resources expire `ttl_seconds` after the last observation.
- `genresources_collection_interval_seconds` (default `120`): minimum seconds between collection cycles. The collector polls the Argo CD API at most once per interval, independent of the metrics scrape frequency, to limit load on the Argo CD API server in large deployments.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize the first word of the definition description:

Suggested change
- `genresources_collection_interval_seconds` (default `120`): minimum seconds between collection cycles. The collector polls the Argo CD API at most once per interval, independent of the metrics scrape frequency, to limit load on the Argo CD API server in large deployments.
- `genresources_collection_interval_seconds` (default `120`): Minimum seconds between collection cycles. The collector polls the Argo CD API at most once per interval, independent of the metrics scrape frequency, to limit load on the Argo CD API server in large deployments.

Comment thread argocd/README.md Outdated

- `genresources_ttl_seconds` (default `21600`): time-to-live applied to each emitted resource. Resources expire `ttl_seconds` after the last observation.
- `genresources_collection_interval_seconds` (default `120`): minimum seconds between collection cycles. The collector polls the Argo CD API at most once per interval, independent of the metrics scrape frequency, to limit load on the Argo CD API server in large deployments.
- `max_resources_per_cycle` (default `10000`): per-cycle cap, applied independently to each resource type. When an Argo CD API endpoint returns more, the excess is dropped and a warning is logged.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize the first word of the definition description:

Suggested change
- `max_resources_per_cycle` (default `10000`): per-cycle cap, applied independently to each resource type. When an Argo CD API endpoint returns more, the excess is dropped and a warning is logged.
- `max_resources_per_cycle` (default `10000`): Per-cycle cap, applied independently to each resource type. When an Argo CD API endpoint returns more, the excess is dropped and a warning is logged.

Comment thread argocd/README.md Outdated
- `genresources_ttl_seconds` (default `21600`): time-to-live applied to each emitted resource. Resources expire `ttl_seconds` after the last observation.
- `genresources_collection_interval_seconds` (default `120`): minimum seconds between collection cycles. The collector polls the Argo CD API at most once per interval, independent of the metrics scrape frequency, to limit load on the Argo CD API server in large deployments.
- `max_resources_per_cycle` (default `10000`): per-cycle cap, applied independently to each resource type. When an Argo CD API endpoint returns more, the excess is dropped and a warning is logged.
- `extra_include_paths` (default `[]`): additional JSON paths appended to the built-in allow-list. Because only listed fields are shipped, this adds fields to the payload; it cannot remove baseline fields. Each path must resolve to a value or a list of values, not a whole object. A path that lands on an object causes that resource to be dropped.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize the first word of the description, and use allowlist (one word) per the Datadog word list:

Suggested change
- `extra_include_paths` (default `[]`): additional JSON paths appended to the built-in allow-list. Because only listed fields are shipped, this adds fields to the payload; it cannot remove baseline fields. Each path must resolve to a value or a list of values, not a whole object. A path that lands on an object causes that resource to be dropped.
- `extra_include_paths` (default `[]`): Additional JSON paths appended to the built-in allowlist. Because only listed fields are shipped, this adds fields to the payload; it cannot remove baseline fields. Each path must resolve to a value or a list of values, not a whole object. A path that lands on an object causes that resource to be dropped.

Comment thread argocd/README.md Outdated

Behavioral notes:

- Only an explicit allow-list of fields is shipped for each resource type. Anything not on the list never leaves the Agent, including any field a future Argo CD release might add. Secrets such as Helm values, repository credentials, and cluster auth config are excluded by omission. Repository URLs, and any credentials embedded in free-text status messages, are stripped of `user:token@` userinfo before they ship or are used as keys.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two fixes on this line:

  • allow-listallowlist (one word, per the Datadog word list).
  • Remove the comma after "URLs" — "Repository URLs, and any credentials embedded in free-text status messages, are stripped..." wraps the second subject as if it were parenthetical. It should read "Repository URLs and any credentials embedded in free-text status messages are stripped...".

Comment thread argocd/README.md Outdated
- Only an explicit allow-list of fields is shipped for each resource type. Anything not on the list never leaves the Agent, including any field a future Argo CD release might add. Secrets such as Helm values, repository credentials, and cluster auth config are excluded by omission. Repository URLs, and any credentials embedded in free-text status messages, are stripped of `user:token@` userinfo before they ship or are used as keys.
- On every Agent restart with `collect_genresources` enabled, the collector re-emits every `Application`, `Cluster`, and `Repository` on its first cycle. The burst is bounded by `max_resources_per_cycle` (applied per type) and self-corrects on subsequent cycles.
- To limit redundant writes, a resource is re-submitted only when it changes (detected by `metadata.resourceVersion` for Applications, or a content hash for Clusters and Repositories). To keep entries from expiring, every resource is re-submitted at least once per half-TTL regardless of changes.
- Disabling `collect_genresources` does not immediately delete previously-emitted resources. Resources expire on their own via `expire_at` (default 6 hours after the last observation).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

via is on the avoid list (use through/with), and previously-emitted shouldn't be hyphenated (adverbs ending in -ly are never hyphenated with the following participle):

Suggested change
- Disabling `collect_genresources` does not immediately delete previously-emitted resources. Resources expire on their own via `expire_at` (default 6 hours after the last observation).
- Disabling `collect_genresources` does not immediately delete previously emitted resources. Resources expire on their own through `expire_at` (default 6 hours after the last observation).

@steveny91
Copy link
Copy Markdown
Contributor Author

@OliviaShoup I ended up removing the readme and hiding the config for now. This is meant as a beta feature that shouldn't be available yet.

@OliviaShoup OliviaShoup self-requested a review June 4, 2026 19:50
OliviaShoup
OliviaShoup previously approved these changes Jun 4, 2026
Copy link
Copy Markdown
Contributor

@OliviaShoup OliviaShoup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for tagging me and letting me know about the readme!

@steveny91
Copy link
Copy Markdown
Contributor Author

No problem I'll keep a record of the above and implement them when I'm ready to ship the readme. Thanks!

@temporal-github-worker-1 temporal-github-worker-1 Bot dismissed OliviaShoup’s stale review June 5, 2026 14:45

Review from OliviaShoup is dismissed. Related teams and files:

  • documentation
    • argocd/metadata.csv
@dd-octo-sts
Copy link
Copy Markdown
Contributor

dd-octo-sts Bot commented Jun 5, 2026

Validation Report

All 21 validations passed.

Show details
Validation Description Status
agent-reqs Verify check versions match the Agent requirements file
ci Validate CI configuration and code coverage settings
codeowners Validate every integration has a CODEOWNERS entry
config Validate default configuration files against spec.yaml
dep Verify dependency pins are consistent and Agent-compatible
http Validate integrations use the HTTP wrapper correctly
imports Validate check imports do not use deprecated modules
integration-style Validate check code style conventions
jmx-metrics Validate JMX metrics definition files and config
labeler Validate PR labeler config matches integration directories
legacy-signature Validate no integration uses the legacy Agent check signature
license-headers Validate Python files have proper license headers
licenses Validate third-party license attribution list
metadata Validate metadata.csv metric definitions
models Validate configuration data models match spec.yaml
openmetrics Validate OpenMetrics integrations disable the metric limit
package Validate Python package metadata and naming
qa-label Validate the pull request declares whether it needs QA for the next Agent release
readmes Validate README files have required sections
saved-views Validate saved view JSON file structure and fields
version Validate version consistency between package and changelog

View full run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants