Description
The databricks sync command's --include flag does not work as expected for selective file inclusion. Instead of uploading only the specified files, it either uploads all files or attempts to delete existing workspace files.
Expected Behavior
When using --include flags, only the explicitly included files should be synced to the workspace:
databricks sync . "$PROJECT_PATH" \
--include "libs/__init__.py" \
--include "libs/source_loader.py" \
--include "sources/__init__.py"
Expected: Only these 3 files should be uploaded, preserving the directory structure.
Actual Behavior
Three scenarios were tested with --dry-run:
Scenario 1: --include alone
databricks sync . "$PROJECT_PATH" --dry-run \
--include "libs/__init__.py" \
--include "libs/source_loader.py" \
--include "sources/__init__.py"
Result: "Initial Sync Complete"; all files in the directory are uploaded (the include patterns are ignored).
Scenario 2: --exclude "*" before --include
databricks sync . "$PROJECT_PATH" --dry-run \
--exclude "*" \
--include "libs/__init__.py" \
--include "libs/source_loader.py" \
--include "sources/__init__.py"
Result: "Action: DELETE"; the sync attempts to delete all existing files from the workspace.
Scenario 3: --include before --exclude "**/*"
databricks sync . "$PROJECT_PATH" --dry-run \
--include "libs/__init__.py" \
--include "libs/source_loader.py" \
--include "sources/__init__.py" \
--exclude "**/*"
Result: "Action: DELETE"; the sync attempts to delete all files from the workspace (same as Scenario 2).
Steps to Reproduce
- Create a directory with multiple files
- Run:
databricks sync . /Workspace/Users/<user>/test --dry-run --include "specific/file.py"
- Observe that all files are marked for upload, not just the included one
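The first step can be scripted as below. This is a minimal sketch of a throwaway fixture, assuming bash and a writable temp directory; apart from specific/file.py (taken from the command above), the file and directory names are hypothetical placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical fixture: a throwaway directory with several files,
# of which only specific/file.py is meant to be synced.
REPRO_DIR=$(mktemp -d)
mkdir -p "$REPRO_DIR/specific" "$REPRO_DIR/other"
echo 'print("included")' > "$REPRO_DIR/specific/file.py"
echo 'print("not included")' > "$REPRO_DIR/specific/other.py"
echo 'print("not included")' > "$REPRO_DIR/other/module.py"
echo "REPRO_DIR=$REPRO_DIR"
```

Run the dry-run command from the second step inside $REPRO_DIR. If --include acted as an allowlist, only specific/file.py would be listed for upload.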
Environment
- Databricks CLI version: (tested on latest)
- OS: macOS (darwin 24.6.0)
- Shell: bash
Workaround
Currently using a temporary directory approach:
TEMP_DIR=$(mktemp -d)
trap 'rm -rf "$TEMP_DIR"' EXIT  # remove the temp dir even if the sync fails
mkdir -p "$TEMP_DIR/libs"
cp libs/__init__.py libs/source_loader.py "$TEMP_DIR/libs/"
databricks sync "$TEMP_DIR" "$PROJECT_PATH" --full
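The workaround above hardcodes the libs directory. Below is a minimal sketch that generalizes the staging step to any list of relative paths while preserving their directory structure. It assumes bash, and stage_files is a hypothetical helper, not part of the Databricks CLI. It avoids cp --parents because the BSD cp that ships with macOS does not support that flag.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy each listed file (relative path) into DEST,
# keeping its path, e.g. sources/__init__.py -> DEST/sources/__init__.py.
# Uses mkdir -p + cp instead of `cp --parents`, which macOS cp lacks.
stage_files() {
  local dest=$1
  shift
  local f
  for f in "$@"; do
    mkdir -p "$dest/$(dirname "$f")"
    cp "$f" "$dest/$f"
  done
}
```

Usage, with the three files from Expected Behavior:
TEMP_DIR=$(mktemp -d)
stage_files "$TEMP_DIR" libs/__init__.py libs/source_loader.py sources/__init__.py
databricks sync "$TEMP_DIR" "$PROJECT_PATH" --full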
Related
This might be related to #4137 (Feature suggestion - rsync equivalent), which suggests that databricks sync could benefit from rsync-style selective file inclusion.
Suggested Fix
One of the following:
- Fix --include to work as a positive allowlist (only sync these files)
- Document the current behavior more clearly in databricks sync --help
- Add a new flag like --only-include that syncs only the specified files
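The allowlist semantics proposed in the first option can be sketched as a filter over the local file list. This is a minimal illustration, assuming bash; filter_allowlist is hypothetical and does not reflect how the CLI evaluates patterns internally. It uses bash [[ == ]] globbing, in which * also matches "/", so it differs from gitignore-style matching.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sketch of "positive allowlist" semantics: read relative
# paths on stdin and print only those matching at least one pattern.
# Note: in bash [[ == ]] globbing, * also matches "/" (unlike gitignore).
filter_allowlist() {
  local path pat
  # `|| [[ -n $path ]]` keeps a final line that lacks a trailing newline.
  while IFS= read -r path || [[ -n $path ]]; do
    for pat in "$@"; do
      if [[ $path == $pat ]]; then  # unquoted $pat => glob match
        printf '%s\n' "$path"
        break
      fi
    done
  done
}
```

For example, find . -type f | sed 's|^\./||' | filter_allowlist "libs/__init__.py" "libs/source_loader.py" "sources/__init__.py" prints only those three paths (if they exist); everything else is left out rather than added on top of the default set.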