Collect ArgoCD entities for applications, cluster and repos#23917
Collect ArgoCD entities for applications, cluster and repos#23917steveny91 wants to merge 10 commits into
Conversation
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: 95495d5 | Docs | Datadog PR Page | Give us feedback! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7816b0a70a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| annotation_keys=include["annotation_keys"], | ||
| ) | ||
| _os.makedirs("/tmp/genres-debug", exist_ok=True) | ||
| with open("/tmp/genres-debug/payloads.jsonl", "a", encoding="utf-8") as _fp: |
There was a problem hiding this comment.
Remove the unbounded debug payload dump
When collect_genresources is enabled, this debug block runs for every fetched Application, Cluster, and Repository on every collection cycle, even if the resource is unchanged, and appends the allow-listed payload to /tmp/genres-debug/payloads.jsonl. In a real ArgoCD deployment this can leak resource metadata onto the Agent host and grow without bound over time, so the collector should not write these payloads to disk in production code.
Useful? React with 👍 / 👎.
OliviaShoup
left a comment
There was a problem hiding this comment.
Thorough, well-organized Beta section — clear enable steps, a concrete RBAC example, and a genuinely useful "Behavioral notes" block. "Argo CD" is spelled correctly throughout. A small cluster of mechanical must-fixes inline (word-list items, hyphenation, definition-list capitalization) plus one terminology clarity question.
A few lighter, optional suggestions:
ship(L170, L204) reads as jargon ("ship ... objects to Datadog" / "before they ship");sendorsubmitis more standard.- L193 — "inherit whatever request authentication is already configured" — "whatever" is a touch informal; "inherit the request authentication already configured on the instance" is cleaner.
Out-of-scope note (not editorial): the section references a new metric argocd.genresources.api.up (L208) that isn't in metadata.csv. For a Beta internal-health gauge that may be intentional, but worth confirming whether it should be documented there.
|
|
||
| Operator-tunable options: | ||
|
|
||
| - `genresources_ttl_seconds` (default `21600`): time-to-live applied to each emitted resource. Resources expire `ttl_seconds` after the last observation. |
There was a problem hiding this comment.
Capitalize the first word of the definition description:
| - `genresources_ttl_seconds` (default `21600`): time-to-live applied to each emitted resource. Resources expire `ttl_seconds` after the last observation. | |
| - `genresources_ttl_seconds` (default `21600`): Time-to-live applied to each emitted resource. Resources expire `ttl_seconds` after the last observation. |
Separately — worth clarifying ttl_seconds here: the option is genresources_ttl_seconds, but the sentence references ttl_seconds, which reads as a different field. If it's not an intentional distinct field, consider "Resources expire the configured number of seconds after the last observation."
| Operator-tunable options: | ||
|
|
||
| - `genresources_ttl_seconds` (default `21600`): time-to-live applied to each emitted resource. Resources expire `ttl_seconds` after the last observation. | ||
| - `genresources_collection_interval_seconds` (default `120`): minimum seconds between collection cycles. The collector polls the Argo CD API at most once per interval, independent of the metrics scrape frequency, to limit load on the Argo CD API server in large deployments. |
There was a problem hiding this comment.
Capitalize the first word of the definition description:
| - `genresources_collection_interval_seconds` (default `120`): minimum seconds between collection cycles. The collector polls the Argo CD API at most once per interval, independent of the metrics scrape frequency, to limit load on the Argo CD API server in large deployments. | |
| - `genresources_collection_interval_seconds` (default `120`): Minimum seconds between collection cycles. The collector polls the Argo CD API at most once per interval, independent of the metrics scrape frequency, to limit load on the Argo CD API server in large deployments. |
|
|
||
| - `genresources_ttl_seconds` (default `21600`): time-to-live applied to each emitted resource. Resources expire `ttl_seconds` after the last observation. | ||
| - `genresources_collection_interval_seconds` (default `120`): minimum seconds between collection cycles. The collector polls the Argo CD API at most once per interval, independent of the metrics scrape frequency, to limit load on the Argo CD API server in large deployments. | ||
| - `max_resources_per_cycle` (default `10000`): per-cycle cap, applied independently to each resource type. When an Argo CD API endpoint returns more, the excess is dropped and a warning is logged. |
There was a problem hiding this comment.
Capitalize the first word of the definition description:
| - `max_resources_per_cycle` (default `10000`): per-cycle cap, applied independently to each resource type. When an Argo CD API endpoint returns more, the excess is dropped and a warning is logged. | |
| - `max_resources_per_cycle` (default `10000`): Per-cycle cap, applied independently to each resource type. When an Argo CD API endpoint returns more, the excess is dropped and a warning is logged. |
| - `genresources_ttl_seconds` (default `21600`): time-to-live applied to each emitted resource. Resources expire `ttl_seconds` after the last observation. | ||
| - `genresources_collection_interval_seconds` (default `120`): minimum seconds between collection cycles. The collector polls the Argo CD API at most once per interval, independent of the metrics scrape frequency, to limit load on the Argo CD API server in large deployments. | ||
| - `max_resources_per_cycle` (default `10000`): per-cycle cap, applied independently to each resource type. When an Argo CD API endpoint returns more, the excess is dropped and a warning is logged. | ||
| - `extra_include_paths` (default `[]`): additional JSON paths appended to the built-in allow-list. Because only listed fields are shipped, this adds fields to the payload; it cannot remove baseline fields. Each path must resolve to a value or a list of values, not a whole object. A path that lands on an object causes that resource to be dropped. |
There was a problem hiding this comment.
Capitalize the first word of the description, and use allowlist (one word) per the Datadog word list:
| - `extra_include_paths` (default `[]`): additional JSON paths appended to the built-in allow-list. Because only listed fields are shipped, this adds fields to the payload; it cannot remove baseline fields. Each path must resolve to a value or a list of values, not a whole object. A path that lands on an object causes that resource to be dropped. | |
| - `extra_include_paths` (default `[]`): Additional JSON paths appended to the built-in allowlist. Because only listed fields are shipped, this adds fields to the payload; it cannot remove baseline fields. Each path must resolve to a value or a list of values, not a whole object. A path that lands on an object causes that resource to be dropped. |
|
|
||
| Behavioral notes: | ||
|
|
||
| - Only an explicit allow-list of fields is shipped for each resource type. Anything not on the list never leaves the Agent, including any field a future Argo CD release might add. Secrets such as Helm values, repository credentials, and cluster auth config are excluded by omission. Repository URLs, and any credentials embedded in free-text status messages, are stripped of `user:token@` userinfo before they ship or are used as keys. |
There was a problem hiding this comment.
Two fixes on this line:
allow-list→allowlist(one word, per the Datadog word list).- Remove the comma after "URLs" — "Repository URLs, and any credentials embedded in free-text status messages, are stripped..." wraps the second subject as if it were parenthetical. It should read "Repository URLs and any credentials embedded in free-text status messages are stripped...".
| - Only an explicit allow-list of fields is shipped for each resource type. Anything not on the list never leaves the Agent, including any field a future Argo CD release might add. Secrets such as Helm values, repository credentials, and cluster auth config are excluded by omission. Repository URLs, and any credentials embedded in free-text status messages, are stripped of `user:token@` userinfo before they ship or are used as keys. | ||
| - On every Agent restart with `collect_genresources` enabled, the collector re-emits every `Application`, `Cluster`, and `Repository` on its first cycle. The burst is bounded by `max_resources_per_cycle` (applied per type) and self-corrects on subsequent cycles. | ||
| - To limit redundant writes, a resource is re-submitted only when it changes (detected by `metadata.resourceVersion` for Applications, or a content hash for Clusters and Repositories). To keep entries from expiring, every resource is re-submitted at least once per half-TTL regardless of changes. | ||
| - Disabling `collect_genresources` does not immediately delete previously-emitted resources. Resources expire on their own via `expire_at` (default 6 hours after the last observation). |
There was a problem hiding this comment.
via is on the avoid list (use through/with), and previously-emitted shouldn't be hyphenated (adverbs ending in -ly are never hyphenated with the following participle):
| - Disabling `collect_genresources` does not immediately delete previously-emitted resources. Resources expire on their own via `expire_at` (default 6 hours after the last observation). | |
| - Disabling `collect_genresources` does not immediately delete previously emitted resources. Resources expire on their own through `expire_at` (default 6 hours after the last observation). |
|
@OliviaShoup I ended up removing the readme and hiding the config for now. This is meant as a beta feature that shouldn't be available yet. |
OliviaShoup
left a comment
There was a problem hiding this comment.
thanks for tagging me and letting me know about the readme!
|
No problem I'll keep a record of the above and implement them when I'm ready to ship the readme. Thanks! |
Review from OliviaShoup is dismissed. Related teams and files:
- documentation
- argocd/metadata.csv
Validation ReportAll 21 validations passed. Show details
|
What does this PR do?
ArgoCD adaption of entity submission.
As part a 3 part process:
PR1 to add the forwarder: DataDog/datadog-agent#51365
PR2 to add the helper functions in checks: #23905
PR3 to onboard ArgoCD as to submit entity metadata: This PR