Skip to content

fix: Add etag diff suppression to remaining resources#3311

Open
ei-grad wants to merge 6 commits into
integrations:mainfrom
ei-grad:fix-etag-drift-remaining-resources
Open

fix: Add etag diff suppression to remaining resources#3311
ei-grad wants to merge 6 commits into
integrations:mainfrom
ei-grad:fix-etag-drift-remaining-resources

Conversation

@ei-grad

@ei-grad ei-grad commented Mar 30, 2026

Copy link
Copy Markdown

Summary

PR #2840 added DiffSuppressFunc and DiffSuppressOnRefresh to etag fields in 7 resources to prevent spurious diffs on plan/apply. This PR applies the same fix to the remaining 22 resources that were missed:

  • github_actions_runner_group
  • github_branch_protection_v3
  • github_emu_group_mapping
  • github_enterprise_actions_runner_group
  • github_issue
  • github_membership
  • github_organization_project
  • github_organization_ruleset
  • github_organization_webhook
  • github_project_card
  • github_project_column
  • github_release
  • github_repository_autolink_reference
  • github_repository_deploy_key
  • github_repository_ruleset
  • github_team
  • github_team_membership
  • github_team_repository
  • github_team_sync_group_mapping
  • github_user_gpg_key
  • github_user_ssh_key
  • organization_block

The existing TestEtagSchemaConsistency unit test is extended to cover all 29 resources (7 existing + 22 new).

Fixes #3103
Fixes #3291

… fields

PR integrations#2840 added etag diff suppression to 7 resources but missed the
remaining 22. This causes spurious diffs on every plan/apply for
resources like github_team, github_membership, github_team_repository,
github_repository_deploy_key, and others.

Apply the same pattern (Optional + DiffSuppressFunc returning true +
DiffSuppressOnRefresh) to all remaining resource etag schema fields
and extend the unit test to cover all 29 resources.

Fixes integrations#3103
Fixes integrations#3291

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

👋 Hi! Thank you for this contribution! Just to let you know, our GitHub SDK team does a round of issue and PR reviews twice a week, every Monday and Friday! We have a process in place for prioritizing and responding to your input. Because you are a part of this community please feel free to comment, add to, or pick up any issues/PRs that are labeled with Status: Up for grabs. You & others like you are the reason all of this works! So thank you & happy coding! 🚀

@github-actions github-actions Bot added the Type: Bug Something isn't working as documented label Mar 30, 2026
},
"etag": {
Type: schema.TypeString,
Optional: true,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: setting Optional makes them user editable, we don't want that. Please remove from all resources

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — removed Optional: true from all 29 etag fields (including the 7 from #2840 that also had it) and updated the unit test accordingly.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, this is covered in original PR:

Lastly these fields have to be marked as optional=true - [SDK ref](https://github.com/integrations/terraform-provider-github/blob/16872b724254fdddc3441c713b087cb4d7005f83/vendor/github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema/schema.go#L841)

Computed-only fields: Computed: true, Optional: false → Can't have DiffSuppressFunc
Optional computed fields: Computed: true, Optional: true → Can have DiffSuppressFunc

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added tests that confirm this SDK constraint. TestEtagComputedOnlyRejectsSuppress shows InternalValidate rejects DiffSuppressFunc on Computed-only fields with "There is no config for computed-only field, nothing to compare." TestEtagSchemaConsistency now also runs InternalValidate on all 29 resources to ensure they pass.

Restored Optional: true on all etag fields (including the ones from #2840 that I removed in the previous push).

@ei-grad ei-grad Apr 1, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the standalone SDK validation tests — TestEtagSchemaConsistency now runs InternalValidate on its resources, which covers the same check.

ei-grad and others added 3 commits April 1, 2026 09:55
etag is server-computed and should not be user-editable. Remove
Optional: true from all etag schema fields (both the 22 new ones
and the 7 from PR integrations#2840) and update the unit test accordingly.

Co-Authored-By: Claude <noreply@anthropic.com>
…sFunc

The SDK explicitly rejects DiffSuppressFunc on Computed-only fields
(InternalValidate returns: "DiffSuppressFunc is for suppressing
differences between config and state representation. There is no
config for computed-only field, nothing to compare.").

Add tests demonstrating this SDK constraint, and verify all 29
resources pass InternalValidate with their current schema.

Co-Authored-By: Claude <noreply@anthropic.com>
…ssFunc

The terraform-plugin-sdk v2 InternalValidate rejects DiffSuppressFunc
on Computed-only fields with: "There is no config for computed-only
field, nothing to compare." Optional: true is required to use
DiffSuppressFunc, as demonstrated by the new tests.

Co-Authored-By: Claude <noreply@anthropic.com>
@ei-grad

ei-grad commented Apr 1, 2026

Copy link
Copy Markdown
Author

I don't like that it is copy-pasted everywhere, and that there is no dedicated test which performs InternalValidate for all resources (not just etag'ed-ones). But it feels like these should be handled in separate PRs.

…date in consistency test

Co-Authored-By: Claude <noreply@anthropic.com>
resourceName: resource,
},
}
if err := p.InternalValidate(); err != nil {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: this is meant for provider validation and not resource, what is the benefit of adding it?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To catch invalid resource schema generated by provider.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth be moved to a separate test.

@pdewilde

Copy link
Copy Markdown

It think this should also fix #3085 if you want to add it to the pr description

@mtb-xt

mtb-xt commented May 4, 2026

Copy link
Copy Markdown

Hey team — I wanted to report a real downstream side effect of this issue and explain how it affects us in practice.

In our CI/CD pipelines, we compare Terraform/OpenTofu planfiles to make sure that the plan approved during PR review is the same plan that gets applied later.

The intent is simple:

The thing we reviewed is the thing we deploy.

To enforce that, we use binary plan comparison through Cloud Posse Atmos:

atmos terraform plan-diff

In our case, this started failing even though Terraform/OpenTofu reported both plans as no-op:

No changes. Your infrastructure matches the configuration.

After converting both planfiles to JSON with:

tofu show -json

…the only meaningful differences were refreshed provider-managed etag values, plus the top-level plan timestamp.

Example:

- "etag": "W/\"9322e152a70cc7591e103fdfe6c5eadb40fbd97f0cde31b018e49ce21f237128\""
+ "etag": "W/\"58d5b901a79908580c54d5f50aa4d4d1e6eb8710448deff8d453c6c17956ca3e\""

Environment where we observed this:

  • Provider: integrations/github 6.12.1
  • OpenTofu: 1.11.5
  • Atmos: 1.213.0
  • Observed affected resource: github_repository_ruleset

This means a no-op plan is not reproducible. The provider is persisting volatile GitHub API transport/cache metadata into state/plan output, which makes otherwise identical plans compare as different.

That breaks workflows which rely on plan stability, including:

  • comparing PR-time plans with apply-time plans
  • verifying that reviewed infrastructure changes are exactly what gets deployed
  • CI drift/gating workflows based on plan equality

lifecycle.ignore_changes = [etag] does not help here, because this is provider-managed state, not user-controlled configuration.

This PR appears to address the same class of issue we are seeing, including for github_repository_ruleset.

Could a maintainer please either merge this PR, or provide clear direction on what still needs to be changed for it to be accepted? @deiga

Related downstream report with more context:

https://github.com/orgs/cloudposse/discussions/125

Thanks.

@mtb-xt

mtb-xt commented Jun 4, 2026

Copy link
Copy Markdown

Since this has been quiet for a while and the branch is now 64 commits behind main, I've rebased it (all commits keep @ei-grad's authorship, review feedback already applied) and opened #3474 to keep it moving — this bug breaks plan-comparison CI gates for us in production. @ei-grad if you'd rather take it back over, say the word and I'll close mine. @deiga both of your open review questions are answered in the new PR's description.

@ei-grad

ei-grad commented Jun 15, 2026

Copy link
Copy Markdown
Author

Clarification on the status of the in-state etag field

@deiga @stevehipwell — before this moves forward I'd like to align on the intended role of the per-resource etag attribute, since #3448 (now merged) introduced a parallel ETag mechanism and lists "Resolves #3103" — the same drift issue this PR targets.

How I read main today:

  • legacy_client defaults to true, so the legacy client is still active by default: the per-resource etag is sent as If-None-Match (via ctxEtag / github/transport.go) and written to state with d.Set("etag", resp.Header.Get("ETag")).
  • With legacy_client = false, conditional caching moves to the transport/bbolt layer (github-conditional-http-transport), so the per-resource etag is no longer needed for conditional requests — but it's still written to state by every resource's Read.

In both modes the volatile ETag lands in state, which is what produces the spurious no-op plan diffs. lifecycle.ignore_changes can't help since it's provider-managed.

One more data point relevant to the rate-limit tradeoff raised earlier: @stevehipwell noted on #3103 that the etag pattern only works with a fixed token — when the token changes the etag is invalid. We authenticate with github_app, where installation tokens rotate (~hourly), so under the legacy client the conditional requests are invalidated on every rotation and fall back to full 200 responses. That means app-auth users take on the state-drift cost with little of the rate-limit benefit that motivates keeping the field.

Questions:

  1. Is the in-state etag field still part of the intended design under the new client, or is it now vestigial and slated for removal?
  2. If it stays, is DiffSuppressFunc (this PR, matching the 7 resources from feat: Adds DiffSuppressFunc and DiffSuppressOnRefresh to resources that have etag properties to suppress etag-related diffs #2840) the agreed fix for the remaining 22 resources?
  3. If it's slated for removal under the new client, does feat: Add new gh client implementation #3448's "Resolves [MAINT]: Logic flaws with etag support #3103" already cover legacy_client = false users — leaving this PR as the fix for legacy-client users — or should both be addressed together?

I'm happy to adapt either way; I just want the direction confirmed so this doesn't stall again. For the record, the two earlier review threads here are resolved: Optional: true is required by the SDK for DiffSuppressFunc on a computed field (InternalValidate rejects it otherwise), and the InternalValidate call in TestEtagSchemaConsistency guards exactly that constraint.


A note on #3474 for anyone reading both threads: this PR had gone quiet for a while, that's true — but the only flag was "behind main", and with no conflicts that isn't a merge blocker in itself; this could have been merged as-is. I've since merged main here anyway to clear even the cosmetic flag. The identical changes now exist under two PR numbers. And the downstream plan-diff / plan-equality impact reported in this thread is a genuinely useful data point that tends to get tracked and triaged better as its own issue — linkable from the relevant PRs — rather than living inside a PR thread.


Prepared with LM assistance — Claude Code

@ei-grad

ei-grad commented Jun 15, 2026

Copy link
Copy Markdown
Author

Verification: reproduced the drift and confirmed this PR fixes it — without regressing the 304 optimization

I built two provider binaries from the same tree differing only by this PR's commits — baseline = the merge-base on main (no DiffSuppressFunc on these resources), treatment = this PR — and ran them via Terraform dev_overrides against a single-resource local workspace (a real, imported github_membership and github_team; import + plan only, no mutation). Tested under both github_app (rotating installation token) and a fixed PAT.

1. Reproduced the planfile churn (baseline). Seeding state with a stale etag (what a rotated-token refresh produces), terraform show -json on the baseline build yields a resource_drift entry and flips planned_values.etag to the live value:

resource_drift: github_membership.test  etag: "W/\"STALE…\"" -> "W/\"2cd97beb…\""

terraform plan (human output) reports “No changes” — the churn is invisible there and surfaces only in -json, exactly as described above. apply -refresh-only then persists the new etag.

2. This PR suppresses it. Same stale state, treatment build: 0 resource_drift entries and planned_values.etag unchanged → the planfile is byte-stable. apply -refresh-only keeps the prior etag.

3. The suppress does NOT disable conditional requests. With a fixed token the resource Read still returns 304 Not Modified and does not consume rate limit:

tf_resource_type=github_membership  tf_rpc=ReadResource
tf_http_res_status_code=304   X-Ratelimit-Used=6 (unchanged)

If-None-Match is built from d.Get("etag") independently of the schema-level suppress, so the rate-limit optimization stays intact for fixed-token users.

4. The stored etag still updates on real applies. A create→update of a resource moved the stored etag (299aaf4a…f0683293…) while the plan diff showed only the real field change. So the etag is refreshed whenever Terraform writes the resource (Create/Update end in a Read); only the refresh-phase drift is suppressed (DiffSuppressOnRefresh). The stored value stays correct after every apply.

5. github_app caveat. Under installation-token rotation the API returns a fresh etag on every read — I observed three distinct etags across three consecutive reads, never a 304. So for app-auth the conditional-request optimization is already non-functional regardless of this change; the in-state etag only ever produced churn, which this PR removes. (The flip side of @stevehipwell's “only works with a fixed token” note on #3103.)

Net: behavior matches the seven resources from #2840, fixed-token users keep the 304 rate-limit benefit, and app-auth users get the churn removed at no functional cost.


Prepared with LM assistance — Claude Code

@ei-grad ei-grad requested a review from deiga June 15, 2026 14:23
@stevehipwell

Copy link
Copy Markdown
Collaborator

@ei-grad I think we want etag suppression in all cases as we need to wait for a major version release to modify the API. Having had a quick look at your PR I think we want to add a named diff function (e.g. supressETag) to show clear intent here. I'm also unsure as to the value of the etag tests given that we're just testing the plugin API works as documented.

Long term I think the etag field needs removing from the schema.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Type: Bug Something isn't working as documented

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: address etag drift in resources [MAINT]: Logic flaws with etag support

5 participants