Support --dry-run in add command #1201

Draft
sergiimk wants to merge 1 commit into master from feature/apply-command

sergiimk (Member) commented Apr 16, 2025

Related to: #900

This branch contains prep work for a kamu apply command that would diff the current and desired state of datasets and apply the necessary changes.

It's an important step toward making large pipelines easier to maintain, and part of our IaC efforts.

I approached this by treating the --dry-run flag, which lets you preview changes, as the crux of the problem.

I decided to implement this flag by completely separating the planning and execution stages in use cases.

To prototype this, I started by adding a --dry-run flag to the kamu add command:

  • I moved the ability to add multiple snapshots at once from AddCommand into CreateDatasetFromSnapshotUseCase, which also moves the complex dependency-based sorting into the use case
  • I added separate prepare() and apply() methods to the use case
  • prepare() returns a complete plan of what will be done - you can see the generated IDs, keys, content, and hashes of every metadata block that will be added to new datasets, etc.
  • kamu add --dry-run simply dumps this plan as YAML to the output
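
The split described above can be sketched roughly as follows. This is a simplified illustration in Python, not the code in this branch: the Snapshot, PlannedDataset, and Plan types, their fields, and the ID derivation are all hypothetical, and only the prepare() / apply() names come from the description.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Snapshot:
    # Hypothetical, heavily simplified stand-in for a dataset snapshot
    name: str
    inputs: list[str] = field(default_factory=list)


@dataclass
class PlannedDataset:
    name: str
    dataset_id: str  # decided during planning so it can be previewed


@dataclass
class Plan:
    datasets: list[PlannedDataset]


def prepare(snapshots: list[Snapshot]) -> Plan:
    """Planning stage: decides everything up front, changes nothing."""
    by_name = {s.name: s for s in snapshots}
    # Order datasets so that every input is created before its dependants;
    # inputs outside this batch are assumed to exist already.
    graph = {s.name: [i for i in s.inputs if i in by_name] for s in snapshots}
    order = TopologicalSorter(graph).static_order()
    return Plan([
        # Assumption: the ID is derived from the name purely for illustration
        PlannedDataset(n, hashlib.sha256(n.encode()).hexdigest()[:16])
        for n in order
    ])


def apply(plan: Plan, store: dict[str, str]) -> None:
    """Execution stage: carries out the plan exactly as prepared."""
    for ds in plan.datasets:
        store[ds.dataset_id] = ds.name


def add(snapshots: list[Snapshot], store: dict[str, str], dry_run: bool = False) -> Plan:
    plan = prepare(snapshots)
    if dry_run:
        # The real command dumps YAML; JSON keeps this sketch stdlib-only
        print(json.dumps(asdict(plan), indent=2))
    else:
        apply(plan, store)
    return plan
```

Because apply() only consumes what prepare() produced, the dry-run output and the real run cannot drift apart in this arrangement.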

I like this approach, but still have some doubts:

While I really like having a detailed plan output for --dry-run, it shows "what will be done", not "what is different", so we may need a separate kamu diff or kamu apply --diff command to show the differences between the current and desired states (e.g. diffs between readmes, SQL queries, or schemas).
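
For a sense of what such diff-oriented output could look like, here is a minimal sketch using Python's standard difflib module. The diff_field helper and the sample queries are made up for illustration; nothing like this exists in the branch.

```python
import difflib


def diff_field(name: str, current: str, desired: str) -> str:
    """Unified diff of one text field (readme, SQL query, schema...)."""
    lines = difflib.unified_diff(
        current.splitlines(),
        desired.splitlines(),
        fromfile=f"current/{name}",
        tofile=f"desired/{name}",
        lineterm="",
    )
    # An empty string means the two states are identical for this field
    return "\n".join(lines)


current_sql = "SELECT id, amount\nFROM transactions"
desired_sql = "SELECT id, amount, currency\nFROM transactions"
print(diff_field("query.sql", current_sql, desired_sql))
```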

The Kubernetes API essentially operates on state and diffs - you apply the manifest as the target state of the resource. So its --dry-run only shows whether a resource would be created or updated.

The way I implemented --dry-run here essentially shows a state transition plan that will be executed within one transaction, which is much more powerful. Kubernetes can't offer this because of its asynchronous operator model: k8s never knows upfront how operators will act on diffs, so it can't plan their actions ahead, and it may take multiple operators many steps and a long time to reconcile the current and desired states.

So I wonder if we will run into issues with --dry-run for more complex state transitions.

An alternative approach could be:

  • We focus on the state diffing part for a nice UX
  • We implement --dry-run simply as a transaction rollback - i.e. execution prints what it usually prints, but all progress is undone at the very end.
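
The rollback variant could look roughly like this. It is a sketch only, using Python's sqlite3 as a stand-in transactional store; the datasets table and the add_datasets function are hypothetical and do not reflect how kamu manages transactions.

```python
import sqlite3


def add_datasets(conn: sqlite3.Connection, names: list[str], dry_run: bool = False) -> None:
    """Runs the normal code path, then commits or rolls back at the end."""
    try:
        for name in names:
            conn.execute("INSERT INTO datasets (name) VALUES (?)", (name,))
            print(f"Added: {name}")  # same output as a real run
    except Exception:
        conn.rollback()
        raise
    if dry_run:
        conn.rollback()  # undo everything done above
    else:
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (name TEXT PRIMARY KEY)")
conn.commit()
add_datasets(conn, ["root", "derived"], dry_run=True)
```

Note that only changes made through the transaction are undone this way; any side effect outside it would need separate handling.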

sergiimk force-pushed the feature/apply-command branch from a6b8510 to 594b759 on April 17, 2025 23:06