Skip to content

bundle/direct: record resource operations with DMS#5387

Open
shreyas-goenka wants to merge 5 commits into
mainfrom
shreyas-goenka/dms-version
Open

bundle/direct: record resource operations with DMS#5387
shreyas-goenka wants to merge 5 commits into
mainfrom
shreyas-goenka/dms-version

Conversation

@shreyas-goenka
Copy link
Copy Markdown
Contributor

@shreyas-goenka shreyas-goenka commented May 31, 2026

Summary

When experimental.record_deployment_history is set, each successfully applied resource operation in the direct engine is reported to the deployment metadata service (DMS) via CreateOperation, in addition to the local state file.

What changed

bundle/direct/oprecorder.goopRecorder interface + sdkRecorder:

type opRecorder interface {
	record(ctx context.Context, resourceKey string, action deployplan.ActionType, resourceID string, state any) error
}
  • Maps a deployplan action to its SDK OperationActionType. Non-mutating actions (Skip/Undefined) are rejected with an error rather than silently coerced.
  • The serialized config is carried in Operation.State so DMS can serve it as resource state; it is omitted for deletes (per the SDK contract).
  • Posts under the deployment version: deployments/{deployment_id}/versions/{version}.

bundle/direct/pkg.goDeploymentBundle.OpRec opRecorder, nil unless recording is enabled.

bundle/direct/bundle_apply.go — records after each successful create/update/recreate/resize/update_id/delete. Delete reads the resource ID before Destroy removes it from state. Migration mode records nothing.

bundle/phases/deploy.go — constructs the recorder in deployCore when b.Config.Experimental.RecordDeploymentHistory.

libs/testserver — handler for POST /api/2.0/bundle/{path...}/operations so recorded calls succeed under the mock server.

Acceptance tests

acceptance/bundle/deploy/record-deployment-history/ (direct engine, RecordRequests):

  • jobs/: create → update → delete; asserts action_type CREATE/UPDATE/DELETE, that serialized state is carried for create/update and omitted for delete, plus resource_key/status.
  • recreate/: changing a pipeline's immutable storage records RECREATE.

Design notes

  • Failure semantics: a failed CreateOperation fails the deploy, matching how local SaveState failures are treated — a state-backend write must not be silently lost.
  • Deployment version identity: the deployment ID and version come from the CreateVersion flow, which lands separately. Until then they are placeholders; record_deployment_history is experimental and off by default.

Test plan

  • go test ./bundle/direct/... ./bundle/phases/... ./libs/testserver/... green
  • go test ./acceptance -run TestAccept/bundle/deploy/record-deployment-history green
  • go vet + ./task lint-q clean
  • CI

@shreyas-goenka shreyas-goenka changed the title bundle/direct: record resource operations when DMS is enabled bundle/direct: call DMS CreateOperation API on each resource operation May 31, 2026
@eng-dev-ecosystem-bot
Copy link
Copy Markdown
Collaborator

eng-dev-ecosystem-bot commented May 31, 2026

Commit: 465bc4a

Run: 26764845978

@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/dms-version branch from 2fb2f59 to 2f36b2d Compare May 31, 2026 23:34
@shreyas-goenka shreyas-goenka changed the title bundle/direct: call DMS CreateOperation API on each resource operation bundle/direct: record resource operations with DMS May 31, 2026
@shreyas-goenka shreyas-goenka changed the base branch from shreyas-goenka/bundle-lock-abstraction to shreyas-goenka/cli-managed-state-flag May 31, 2026 23:34
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/dms-version branch from 2f36b2d to d5f35f3 Compare June 1, 2026 15:28
@shreyas-goenka shreyas-goenka changed the base branch from shreyas-goenka/cli-managed-state-flag to main June 1, 2026 15:28
When experimental.record_deployment_history is set, each successfully
applied resource operation is reported to the deployment metadata service
(DMS) via the CreateOperation API, in addition to the local state file.

- opRecorder interface + sdkRecorder: record(action, id, state) maps a
  deployplan action to its SDK operation type and posts it under the
  deployment version. The serialized config is carried in Operation.State
  so DMS can serve it as resource state; it is omitted for deletes.
- deployActionToSDK rejects non-mutating actions (Skip/Undefined) rather
  than coercing them, so an unmapped action surfaces as an error.
- DeploymentBundle.OpRec is nil unless recording is enabled; the apply
  loop records after each create/update/recreate/resize/update_id/delete.
- A failed recording fails the deploy, matching how local state writes
  (SaveState) are treated: a state-backend write must not be silently lost.

The deployment ID and version that identify the DMS version are sourced
from the CreateVersion flow, which lands separately; until then they are
placeholders (record_deployment_history is experimental and off by default).

Co-authored-by: Shreyas Goenka <shreyas.goenka@databricks.com>

// Deployment metadata service (DMS): record resource operations. The parent
// is "deployments/{id}/versions/{n}", so the path tail is captured wholesale.
server.Handle("POST", "/api/2.0/bundle/{path...}", func(req Request) any {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll implement this proper in a followup PR when we test the full read - write combined flows for DMS.

// Record the operation with DMS. Migration only mirrors existing
// state into the local store; there is no operation to report.
if !migrateMode {
if err := b.recordOperation(ctx, resourceKey, action, b.StateDB.GetResourceID(resourceKey), sv.Value); err != nil {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denik I know we discussed using an async queue here - but lets leave that as an optional followup since it's purely perf and not correctness.

…create

Local acceptance tests (direct engine, RecordRequests) that deploy with
experimental.record_deployment_history enabled and assert the recorded
CreateOperation requests:

- jobs/: create -> update -> delete (resource removed from config). Verifies
  action_type CREATE/UPDATE/DELETE, that the serialized state is carried for
  create/update and omitted for delete, and resource_key/status.
- recreate/: changing a pipeline's immutable 'storage' field records RECREATE.

Adds a testserver handler for POST /api/2.0/bundle/{path...}/operations so the
recorder's CreateOperation calls succeed under the mock server.

Co-authored-by: Shreyas Goenka <shreyas.goenka@databricks.com>
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/dms-version branch from 4a6d5fa to e8b9646 Compare June 1, 2026 17:16
Renaming a catalog changes its ID, exercising the UpdateWithID path. The
recorded operation asserts action_type OPERATION_ACTION_TYPE_UPDATE_WITH_ID
and that resource_id reflects the new ID after the rename.

Co-authored-by: Shreyas Goenka <shreyas.goenka@databricks.com>
Comment thread bundle/phases/deploy.go Outdated
if b.Config.Experimental != nil && b.Config.Experimental.RecordDeploymentHistory {
// TODO(DMS): source the deployment ID and version from the DMS
// CreateVersion response once the deployment-version flow is wired in.
b.DeploymentBundle.OpRec = direct.NewOperationRecorder(b.WorkspaceClient(ctx).Bundle, "", 0)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be a followup once the read path for DMS state is implemented.

…recording

Use "abcd" instead of "" so the recorded parent is the well-formed
deployments/abcd/versions/0 rather than a collapsed double-slash path.
Still a placeholder pending the DMS CreateVersion flow.

Co-authored-by: Shreyas Goenka <shreyas.goenka@databricks.com>
Changing num_workers on a running cluster (lifecycle.started) triggers the
Resize path. Asserts action_type OPERATION_ACTION_TYPE_RESIZE. Completes
coverage of all mutating actions: create, update, update_id, recreate,
resize, delete.

Co-authored-by: Shreyas Goenka <shreyas.goenka@databricks.com>
@shreyas-goenka shreyas-goenka requested a review from denik June 1, 2026 17:24
@shreyas-goenka shreyas-goenka marked this pull request as ready for review June 1, 2026 17:24
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

Approval status: pending

/acceptance/bundle/ - needs approval

22 files changed
Suggested: @denik
Also eligible: @pietern, @janniklasrose, @andrewnester, @lennartkats-db, @anton-107

/bundle/ - needs approval

5 files changed
Suggested: @denik
Also eligible: @pietern, @janniklasrose, @andrewnester, @lennartkats-db, @anton-107

General files (require maintainer)

Files: libs/testserver/handlers.go
Based on git history:

  • @denik -- recent work in bundle/direct/, bundle/phases/, libs/testserver/

Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants