[Compute] az vm cp: Add command for file transfer via storage bridge #32757

Ashutosh0x wants to merge 11 commits into Azure:dev from

Conversation
Implements a client-side bridge solution using Azure Storage to facilitate file transfers to and from Azure VMs, particularly useful for private network environments. Leverages existing 'run-command' capabilities.
❌ AzureCLI-FullTest

| rule | cmd_name | rule_message | suggest_message |
|---|---|---|---|
|  | vm cp | cmd vm cp added |  |

Thank you for your contribution! We will review the pull request and get back to you soon.

The git hooks are available for the azure-cli and azure-cli-extensions repos. They can help you run required checks before creating the PR. Please sync the latest code with the latest dev branch (for azure-cli) or main branch (for azure-cli-extensions).

```shell
pip install azdev --upgrade
azdev setup -c <your azure-cli repo path> -r <your azure-cli-extensions repo path>
```
Pull request overview
This PR implements a new az vm cp command to facilitate file transfers to and from Azure Virtual Machines using an Azure Storage blob container as an intermediary bridge. The solution addresses issue #5275 by providing a native Azure CLI command for file transfers that works through the Azure management plane, making it compatible with VMs in private networks.
Changes:
- Added `vm_cp` function implementing client-side bridge file transfer using storage blobs and `az vm run-command`
- Registered new `cp` command in the VM command group
- Added parameter definitions for source, destination, storage_account, and container_name
- Added help documentation with usage examples
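The `[rg:]vm:path` addressing format described above could be parsed along these lines. This is a minimal sketch, not the PR's actual `_parse_vm_file_path`; the splitting rules (and the single-letter check that keeps Windows drive paths local) are assumptions for illustration:

```python
def parse_vm_file_path(value):
    """Parse '[rg:]vm:path' into (rg, vm_name, path); return None for plain local paths.

    Assumption: a Windows drive letter such as 'C:\\temp' must not be mistaken
    for a VM spec, so single-letter prefixes are treated as local paths.
    """
    if value is None or ':' not in value:
        return None
    parts = value.split(':')
    if len(parts) == 2:
        vm_name, path = parts
        if len(vm_name) <= 1:  # e.g. 'C:\\temp\\f.txt' on Windows -> local path
            return None
        return (None, vm_name, path)
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    # More than two colons: rejoin the tail as the remote path
    return (parts[0], parts[1], ':'.join(parts[2:]))
```

A caller would then treat a `None` result as a local path and a tuple as a VM-side path, matching the upload/download branching in `vm_cp`.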
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 22 comments.
| File | Description |
|---|---|
| src/azure-cli/azure/cli/command_modules/vm/custom.py | Core implementation of vm_cp function with path parsing, storage setup, and VM run-command execution |
| src/azure-cli/azure/cli/command_modules/vm/commands.py | Command registration for 'az vm cp' |
| src/azure-cli/azure/cli/command_modules/vm/_params.py | Parameter definitions for the new command |
| src/azure-cli/azure/cli/command_modules/vm/_help.py | Help documentation and usage examples |
```python
blob_service_client = cf_blob_service(cmd.cli_ctx, {'account_name': sa_name, 'account_key': account_key})
container_client = blob_service_client.get_container_client(container_name)
try:
    container_client.create_container()
```
The bare except clause catches all exceptions silently, which can hide important errors like authentication failures, network issues, or permission problems. This makes debugging difficult for users. Change this to catch specific exceptions like ResourceExistsError and let other exceptions propagate with helpful error messages.
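The pattern the reviewer suggests looks roughly like this. It is sketched with a local stand-in exception class so it is self-contained; the real code would catch `azure.core.exceptions.ResourceExistsError`:

```python
class ResourceExistsError(Exception):
    """Stand-in for azure.core.exceptions.ResourceExistsError."""


def ensure_container(container_client):
    """Create the container, tolerating only the 'already exists' case."""
    try:
        container_client.create_container()
    except ResourceExistsError:
        pass  # benign: the bridge container is already there
    # Auth, network, and permission errors now propagate to the caller.


class FakeContainer:
    """Minimal test double for a blob container client."""
    def __init__(self, exists):
        self.exists = exists

    def create_container(self):
        if self.exists:
            raise ResourceExistsError()
```

With this shape, an authentication failure surfaces immediately instead of being silently swallowed and failing later at blob upload.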
```python
if is_linux:
    script = "curl -L -o '{}' '{}'".format(vm_path, blob_url)
    command_id = 'RunShellScript'
else:
```
Security vulnerability: The PowerShell script includes user-provided paths without proper escaping. The vm_path variable is directly interpolated into the Invoke-WebRequest command. If vm_path contains single quotes or other PowerShell special characters, it could enable command injection. Use proper PowerShell escaping before constructing the script.
Suggested change:

```python
else:
    escaped_blob_url = blob_url.replace("'", "''")
    escaped_vm_path = vm_path.replace("'", "''")
    script = "Invoke-WebRequest -Uri '{}' -OutFile '{}'".format(escaped_blob_url, escaped_vm_path)
```
```python
else:
    # DOWNLOAD: VM -> Local
    rg, vm_name, vm_path = source_vm
    if not rg:
```
Inconsistent whitespace: There's a leading space before the comment on line 6674 that breaks the indentation pattern. This should be removed to maintain consistent code formatting.
Suggested change:

```python
if not rg:
    # find VM RG
```
```python
if is_linux:
    script = "curl -X PUT -T '{}' -H 'x-ms-blob-type: BlockBlob' '{}'".format(vm_path, blob_url)
    command_id = 'RunShellScript'
else:
```
Security vulnerability: The PowerShell script includes user-provided vm_path without proper escaping. If vm_path contains single quotes or other PowerShell special characters, it could enable command injection. Use proper PowerShell escaping before constructing the script.
Suggested change:

```python
else:
    # Escape single quotes in vm_path for safe use inside a single-quoted PowerShell string
    escaped_vm_path = vm_path.replace("'", "''") if vm_path is not None else vm_path
    script = "$body = Get-Content -Path '{}' -Encoding Byte; Invoke-RestMethod -Uri '{}' -Method Put -Headers @{{'x-ms-blob-type'='BlockBlob'}} -Body $body".format(escaped_vm_path, blob_url)
```
```python
logger.info("Executing download script in VM...")
from .aaz.latest.vm.run_command import Invoke
Invoke(cli_ctx=cmd.cli_ctx)(command_args={
    'resource_group': rg,
    'vm_name': vm_name,
    'command_id': command_id,
    'script': [script]
})

# Cleanup
logger.info("Cleaning up bridge storage...")
```
Operational reliability issue: If an error occurs during VM script execution (lines 6659-6664 or 6703-6708), the cleanup code at lines 6668 or 6715 will not execute, leaving orphaned blobs in the storage account. This will accumulate storage costs over time. Consider wrapping the operation in a try-finally block to ensure cleanup always occurs, or implement a timeout-based cleanup mechanism.
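The try/finally shape the reviewer asks for could look like this. It is a sketch with a dummy blob client; the function and class names are illustrative, not the PR's actual code:

```python
import logging

logger = logging.getLogger(__name__)


def transfer_with_cleanup(blob_client, run_script):
    """Run the VM-side script, always deleting the bridge blob afterwards."""
    try:
        run_script()
    finally:
        # Executes even if run_script raises, so no orphaned blobs accumulate.
        logger.info("Cleaning up bridge storage...")
        try:
            blob_client.delete_blob()
        except Exception:  # cleanup is best-effort; don't mask the original error
            logger.warning("Failed to delete bridge blob; manual cleanup may be needed.")


class FakeBlob:
    """Minimal test double recording whether cleanup ran."""
    def __init__(self):
        self.deleted = False

    def delete_blob(self):
        self.deleted = True
```

Wrapping the cleanup call itself in a try/except keeps a cleanup failure from shadowing the run-command error that actually caused the transfer to fail.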
```yaml
short-summary: Copy files to and from a virtual machine.
long-summary: >
    This command uses an Azure Storage blob container as an intermediary bridge to transfer files.
    It requires 'az vm run-command' capability on the target VM.
```
Documentation issue: The long-summary should explain the requirements more clearly, specifically that curl must be available on Linux VMs and that the VM must be running and accessible via run-command. It should also mention that the command requires appropriate permissions on both the VM and the storage account.
Suggested change:

```yaml
It requires that the target VM is running and accessible via 'az vm run-command'. On Linux VMs, 'curl'
must be installed and available on the PATH. The identity used to run this command must have sufficient
permissions both on the VM (to execute run-command) and on the storage account/container (to read and write blobs).
```
```python
def vm_cp(cmd, source, destination, storage_account=None, container_name='azvmcp'):
    source_vm = _parse_vm_file_path(source)
    dest_vm = _parse_vm_file_path(destination)

    if source_vm and dest_vm:
        raise ValidationError("Both source and destination cannot be VM paths.")
    if not source_vm and not dest_vm:
        raise ValidationError("Either source or destination must be a VM path (format: [rg:]vm:path).")

    # 1. Prepare Storage Account
    if not storage_account:
        # Try to find a storage account in the VM's resource group
        rg = (source_vm[0] if source_vm else dest_vm[0])
        vm_name = (source_vm[1] if source_vm else dest_vm[1])

        if not rg:
            # Get RG of the VM
            client = _compute_client_factory(cmd.cli_ctx)
            vms = client.virtual_machines.list_all()
            vm = next((v for v in vms if v.name.lower() == vm_name.lower()), None)
            if not vm:
                raise ResourceNotFoundError("VM '{}' not found.".format(vm_name))
            # parse RG from ID
            rg = vm.id.split('/')[4]

        from azure.cli.command_modules.storage._client_factory import cf_sa
        sa_client = cf_sa(cmd.cli_ctx, None)
        accounts = list(sa_client.list())
        # Filter by RG if possible
        rg_accounts = [a for a in accounts if a.id.split('/')[4].lower() == rg.lower()]
        if rg_accounts:
            storage_account = rg_accounts[0].name
        elif accounts:
            storage_account = accounts[0].name
        else:
            raise RequiredArgumentMissingError("No storage account found in the subscription. Please provide one with --storage-account.")

    # Get account key
    from azure.cli.command_modules.storage._client_factory import cf_sa_for_keys
    sa_keys_client = cf_sa_for_keys(cmd.cli_ctx, None)
    # Check if storage_account is name or ID
    if '/' in storage_account:
        sa_rg = storage_account.split('/')[4]
        sa_name = storage_account.split('/')[-1]
    else:
        # Search for it
        sa_client = cf_sa(cmd.cli_ctx, None)
        accounts = list(sa_client.list())
        account = next((a for a in accounts if a.name.lower() == storage_account.lower()), None)
        if not account:
            raise ResourceNotFoundError("Storage account '{}' not found.".format(storage_account))
        sa_rg = account.id.split('/')[4]
        sa_name = account.name

    keys = sa_keys_client.list_keys(sa_rg, sa_name).keys
    account_key = keys[0].value

    # Ensure container exists
    from azure.cli.command_modules.storage._client_factory import cf_blob_service
    blob_service_client = cf_blob_service(cmd.cli_ctx, {'account_name': sa_name, 'account_key': account_key})
    container_client = blob_service_client.get_container_client(container_name)
    try:
        container_client.create_container()
    except Exception:  # Already exists or other error  # pylint: disable=broad-except
        pass

    blob_name = str(uuid.uuid4())

    blob_client = container_client.get_blob_client(blob_name)

    if dest_vm:
        # UPLOAD: Local -> VM
        rg, vm_name, vm_path = dest_vm
        if not rg:
            # find VM RG
            client = _compute_client_factory(cmd.cli_ctx)
            vms = client.virtual_machines.list_all()
            vm = next((v for v in vms if v.name.lower() == vm_name.lower()), None)
            rg = vm.id.split('/')[4]

        logger.info("Uploading local file to bridge storage...")
        upload_blob(cmd, blob_client, file_path=source)

        # Get SAS for VM to download
        sas_token = create_short_lived_blob_sas_v2(cmd, sa_name, container_name, blob_name, account_key=account_key)
        blob_url = "https://{}.blob.{}/{}/{}?{}".format(sa_name, cmd.cli_ctx.cloud.suffixes.storage_endpoint, container_name, blob_name, sas_token)

        # VM run-command to download
        # Check OS type
        vm_obj = _compute_client_factory(cmd.cli_ctx).virtual_machines.get(rg, vm_name)
        is_linux = vm_obj.storage_profile.os_disk.os_type.lower() == 'linux'

        if is_linux:
            script = "curl -L -o '{}' '{}'".format(vm_path, blob_url)
            command_id = 'RunShellScript'
        else:
            script = "Invoke-WebRequest -Uri '{}' -OutFile '{}'".format(blob_url, vm_path)
            command_id = 'RunPowerShellScript'

        logger.info("Executing download script in VM...")
        from .aaz.latest.vm.run_command import Invoke
        Invoke(cli_ctx=cmd.cli_ctx)(command_args={
            'resource_group': rg,
            'vm_name': vm_name,
            'command_id': command_id,
            'script': [script]
        })

        # Cleanup
        logger.info("Cleaning up bridge storage...")
        blob_client.delete_blob()

    else:
        # DOWNLOAD: VM -> Local
        rg, vm_name, vm_path = source_vm
        if not rg:
            # find VM RG
            client = _compute_client_factory(cmd.cli_ctx)
            vms = client.virtual_machines.list_all()
            vm = next((v for v in vms if v.name.lower() == vm_name.lower()), None)
            rg = vm.id.split('/')[4]

        # Get SAS with WRITE permission
        t_sas = cmd.get_models('_shared_access_signature#BlobSharedAccessSignature',
                               resource_type=ResourceType.DATA_STORAGE_BLOB)
        t_blob_permissions = cmd.get_models('_models#BlobSasPermissions', resource_type=ResourceType.DATA_STORAGE_BLOB)
        expiry = (datetime.utcnow() + timedelta(days=1)).strftime('%Y-%m-%dT%H:%M:%SZ')
        sas = t_sas(sa_name, account_key=account_key)
        sas_token = sas.generate_blob(container_name, blob_name,
                                      permission=t_blob_permissions(write=True),
                                      expiry=expiry, protocol='https')
        blob_url = "https://{}.blob.{}/{}/{}?{}".format(sa_name, cmd.cli_ctx.cloud.suffixes.storage_endpoint, container_name, blob_name, sas_token)

        vm_obj = _compute_client_factory(cmd.cli_ctx).virtual_machines.get(rg, vm_name)
        is_linux = vm_obj.storage_profile.os_disk.os_type.lower() == 'linux'

        if is_linux:
            script = "curl -X PUT -T '{}' -H 'x-ms-blob-type: BlockBlob' '{}'".format(vm_path, blob_url)
            command_id = 'RunShellScript'
        else:
            script = "$body = Get-Content -Path '{}' -Encoding Byte; Invoke-RestMethod -Uri '{}' -Method Put -Headers @{{'x-ms-blob-type'='BlockBlob'}} -Body $body".format(vm_path, blob_url)
            command_id = 'RunPowerShellScript'

        logger.info("Executing upload script in VM...")
        from .aaz.latest.vm.run_command import Invoke
        Invoke(cli_ctx=cmd.cli_ctx)(command_args={
            'resource_group': rg,
            'vm_name': vm_name,
            'command_id': command_id,
            'script': [script]
        })

        logger.info("Downloading from bridge storage to local...")
        download_blob(blob_client, file_path=destination)

        # Cleanup
        logger.info("Cleaning up bridge storage...")
        blob_client.delete_blob()
```
Missing test coverage: There are no tests for the new vm_cp command. The vm module has extensive test files (test_vm_commands.py has 13,881 lines), indicating that comprehensive testing is a convention in this codebase. The vm_cp command involves complex logic with multiple error paths and should have unit tests covering various scenarios including: path parsing edge cases, VM/storage account discovery, upload/download flows, error handling, and cleanup behavior.
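As a starting point, the up-front validation rules are cheap to cover. The sketch below mirrors only `vm_cp`'s argument checks with a stand-in error type (the real tests would exercise `vm_cp` itself with mocked Azure clients, and `validate_cp_args` is a hypothetical helper name):

```python
class ValidationError(Exception):
    """Stand-in for knack's ValidationError."""


def validate_cp_args(source_vm, dest_vm):
    """Mirror of vm_cp's up-front checks: exactly one side must be a VM path."""
    if source_vm and dest_vm:
        raise ValidationError("Both source and destination cannot be VM paths.")
    if not source_vm and not dest_vm:
        raise ValidationError("Either source or destination must be a VM path (format: [rg:]vm:path).")
```

A real test module would add recorded-session tests for the upload and download flows, plus failure-injection tests asserting that the bridge blob is deleted even when run-command fails.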
```python
else:
    # DOWNLOAD: VM -> Local
    rg, vm_name, vm_path = source_vm
    if not rg:
        # find VM RG
        client = _compute_client_factory(cmd.cli_ctx)
        vms = client.virtual_machines.list_all()
        vm = next((v for v in vms if v.name.lower() == vm_name.lower()), None)
```
Code duplication: The VM lookup logic (lines 6574-6580, 6633-6636, 6675-6678) is repeated three times with nearly identical code. This violates the DRY principle and makes maintenance harder. Extract this logic into a helper function that takes the VM name and optional resource group, and returns the VM object and resource group.
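The extraction the reviewer describes might look like the sketch below. The helper name is made up, and `LookupError` stands in for the module's `ResourceNotFoundError`; the client interface matches the `virtual_machines.list_all()` / `.get()` calls used in the PR:

```python
def find_vm_and_rg(client, vm_name, rg=None):
    """Resolve (vm, resource_group), discovering the RG by listing VMs when not given."""
    if rg:
        return client.virtual_machines.get(rg, vm_name), rg
    vms = client.virtual_machines.list_all()
    vm = next((v for v in vms if v.name.lower() == vm_name.lower()), None)
    if vm is None:
        raise LookupError("VM '{}' not found.".format(vm_name))
    return vm, vm.id.split('/')[4]  # RG is the 5th segment of the ARM resource ID
```

Each of the three call sites then collapses to `vm, rg = find_vm_and_rg(client, vm_name, rg)`, and the missing-VM error handling becomes consistent everywhere instead of only in the first copy.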
```python
upload_blob(cmd, blob_client, file_path=source)

# Get SAS for VM to download
sas_token = create_short_lived_blob_sas_v2(cmd, sa_name, container_name, blob_name, account_key=account_key)
```
Potential bug with cloud endpoints: The blob URL construction assumes the storage_endpoint suffix will not have a leading slash, but it's concatenated directly. If cmd.cli_ctx.cloud.suffixes.storage_endpoint includes unexpected formatting or is None, this will fail. Add validation or use a more robust URL construction method to handle different cloud environments correctly.
Suggested change:

```python
sas_token = create_short_lived_blob_sas_v2(cmd, sa_name, container_name, blob_name, account_key=account_key)
storage_suffix = getattr(cmd.cli_ctx.cloud.suffixes, 'storage_endpoint', None)
if not storage_suffix:
    raise CLIError("The storage endpoint suffix for the current cloud is not configured.")
# Normalize to avoid leading dots or slashes that would break the URL host
storage_suffix = str(storage_suffix).lstrip("./").strip()
blob_url = "https://{}.blob.{}/{}/{}?{}".format(sa_name, storage_suffix, container_name, blob_name, sas_token)
```
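An alternative to bare string concatenation is building the URL with the standard library's `urllib.parse`, which also percent-encodes the path segments. This is a sketch; the account and suffix values shown are illustrative:

```python
from urllib.parse import urlunsplit, quote


def build_blob_url(account, storage_suffix, container, blob, sas_token):
    """Construct https://{account}.blob.{suffix}/{container}/{blob}?{sas}."""
    if not storage_suffix:
        raise ValueError("Storage endpoint suffix is not configured for this cloud.")
    # Strip stray leading dots/slashes that would corrupt the host name
    host = "{}.blob.{}".format(account, storage_suffix.strip().lstrip("./"))
    path = "/{}/{}".format(quote(container), quote(blob))
    return urlunsplit(("https", host, path, sas_token, ""))
```

The same helper would serve both the upload and download branches, so the cloud-suffix validation lives in one place.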
```python
                       resource_type=ResourceType.DATA_STORAGE_BLOB)
t_blob_permissions = cmd.get_models('_models#BlobSasPermissions', resource_type=ResourceType.DATA_STORAGE_BLOB)
expiry = (datetime.utcnow() + timedelta(days=1)).strftime('%Y-%m-%dT%H:%M:%SZ')
sas = t_sas(sa_name, account_key=account_key)
```
Security concern: The SAS token expiry time is set to 1 day (lines 6686, and via create_short_lived_blob_sas_v2 at line 6642), which is excessive for a file transfer operation that typically completes in minutes. If the token is intercepted, it provides 24 hours of access to the blob. Reduce the expiry to a shorter duration (e.g., 1-2 hours) to minimize the security risk window while still providing enough time for the transfer to complete.
Suggested change:

```python
expiry = (datetime.utcnow() + timedelta(hours=2)).strftime('%Y-%m-%dT%H:%M:%SZ')
sas = t_sas(sa_name, account_key=account_key)
```
|
Hi @zhoxing-ms @evelyn-ys @yanzhudd @yonzhan, I've just pushed an update addressing the feedback regarding security and reliability. Highlights of the latest changes:

Looking forward to your thoughts!

Please fix CI issues

Hi @yonzhan, I've addressed the CI failure (duplicate string formatting) and incorporated the Copilot feedback:

The PR should be ready for review now. Thank you!

/azp run

Azure Pipelines successfully started running 3 pipeline(s).
az vm cp: Add command for file transfer via storage bridge
please fix the CI issues

Refinement and Fixes for

Final CI Fixes and Refinements for
Description
This PR implements a new 'az vm cp' command to facilitate file transfers to and from Azure Virtual Machines.
Problem
Azure CLI currently lacks a native, easy-to-use command for copying files to/from VMs, especially in private networks where direct SSH/SCP or RDP access is not available. This has been a long-standing feature request (Issue #5275).
Solution
The proposed 'az vm cp' command implements a 'client-side bridge' solution:

This approach works through the Azure management plane, making it highly compatible with VMs behind firewalls or in private virtual networks.

Examples
Fixes #5275