feat(azure_blob sink): Expand support for Azure authentication types #24729
jlaundry wants to merge 7 commits into vectordotdev:master
Signed-off-by: Jed Laundry <jlaundry@jlaundry.com>
let config = AzureBlobSinkConfig {
    auth: Some(client_secret_credential),
    tls: Some(TlsConfig {
        ca_file: Some(tls::TEST_PEM_CA_PATH.into()),
This doesn't seem to be working, and the test times out with:
test sinks::azure_blob::integration_tests::azure_blob_insert_lines_into_blob_with_oauth has been running for over 60 seconds
thread 'sinks::azure_blob::integration_tests::azure_blob_insert_lines_into_blob_with_oauth' (2334) panicked at src/sinks/azure_blob/integration_tests.rs:423:18:
Failed to create container: Error { context: CustomMessage(Custom { kind: Io, error: Error { context: CustomMessage(Custom { kind: Io, error: reqwest::Error { kind: Request, url: "https://azurite:14430/devstoreaccount1/logs?restype=container", source: hyper_util::client::legacy::Error(Connect, Ssl(Error { code: ErrorCode(1), cause: Some(Ssl(ErrorStack([Error { code: 167772294, library: "SSL routines", function: "tls_post_process_server_certificate", reason: "certificate verify failed", file: "ssl/statem/statem_clnt.c", line: 2124 }]))) }, X509VerifyResult { code: 19, error: "self-signed certificate in certificate chain" })) } }, "failed to execute `reqwest` request") } }, "retry policy expired and the request will no longer be retried") }
But, if I podman exec into the test container, it's definitely picked up the correct server certificate:
root@runner:/home/vector# curl -vv --cacert tests/data/ca/certs/ca.cert.pem https://azurite:14430/devstoreaccount1/logs?restype=container
07:38:38.369906 [0-0] * Host azurite:14430 was resolved.
07:38:38.370096 [0-0] * IPv6: (none)
07:38:38.370188 [0-0] * IPv4: 10.89.1.17
07:38:38.370284 [0-0] * [HTTPS-CONNECT] adding wanted h2
07:38:38.370377 [0-0] * [HTTPS-CONNECT] added
07:38:38.370474 [0-0] * [HTTPS-CONNECT] connect, init
07:38:38.370632 [0-0] * Trying 10.89.1.17:14430...
07:38:38.370885 [0-0] * [HTTPS-CONNECT] connect -> 0, done=0
07:38:38.370979 [0-0] * [HTTPS-CONNECT] Curl_conn_connect(block=0) -> 0, done=0
07:38:38.371079 [0-0] * [HTTPS-CONNECT] adjust_pollset -> 1 socks
07:38:38.376398 [0-0] * ALPN: curl offers h2,http/1.1
07:38:38.378054 [0-0] * TLSv1.3 (OUT), TLS handshake, Client hello (1):
07:38:38.380133 [0-0] * CAfile: tests/data/ca/certs/ca.cert.pem
07:38:38.380624 [0-0] * CApath: /etc/ssl/certs
07:38:38.381294 [0-0] * [HTTPS-CONNECT] connect -> 0, done=0
07:38:38.382070 [0-0] * [HTTPS-CONNECT] Curl_conn_connect(block=0) -> 0, done=0
07:38:38.382212 [0-0] * [HTTPS-CONNECT] adjust_pollset -> 1 socks
07:38:38.382373 [0-0] * TLSv1.3 (IN), TLS handshake, Server hello (2):
07:38:38.382797 [0-0] * TLSv1.3 (IN), TLS change cipher, Change cipher spec (1):
07:38:38.382955 [0-0] * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
07:38:38.383120 [0-0] * TLSv1.3 (IN), TLS handshake, Certificate (11):
07:38:38.383799 [0-0] * TLSv1.3 (IN), TLS handshake, CERT verify (15):
07:38:38.384018 [0-0] * TLSv1.3 (IN), TLS handshake, Finished (20):
07:38:38.384205 [0-0] * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
07:38:38.384375 [0-0] * TLSv1.3 (OUT), TLS handshake, Finished (20):
07:38:38.384575 [0-0] * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / x25519 / RSASSA-PSS
07:38:38.385706 [0-0] * ALPN: server accepted http/1.1
07:38:38.385847 [0-0] * Server certificate:
07:38:38.385995 [0-0] * subject: C=US; ST=New York; L=New York; O=Datadog; OU=Vector; CN=azurite
07:38:38.386131 [0-0] * start date: Feb 24 21:46:40 2026 GMT
07:38:38.386267 [0-0] * expire date: Feb 22 21:46:40 2036 GMT
07:38:38.386407 [0-0] * common name: azurite (matched)
07:38:38.386542 [0-0] * issuer: C=US; ST=New York; O=Datadog; OU=Vector; CN=Vector Intermediate Server CA
07:38:38.386700 [0-0] * SSL certificate verify ok.
07:38:38.386856 [0-0] * Certificate level 0: Public key type RSA (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
07:38:38.387008 [0-0] * Certificate level 1: Public key type RSA (4096/152 Bits/secBits), signed using sha256WithRSAEncryption
07:38:38.387150 [0-0] * Certificate level 2: Public key type RSA (4096/152 Bits/secBits), signed using sha256WithRSAEncryption
07:38:38.387300 [0-0] * [HTTPS-CONNECT] connect+handshake h2: 16ms, 1st data: 11ms
07:38:38.387440 [0-0] * [HTTPS-CONNECT] connect -> 0, done=1
07:38:38.387592 [0-0] * [HTTPS-CONNECT] Curl_conn_connect(block=0) -> 0, done=1
07:38:38.387838 [0-0] * Connected to azurite (10.89.1.17) port 14430
07:38:38.388227 [0-0] * using HTTP/1.x
07:38:38.388821 [0-0] > GET /devstoreaccount1/logs?restype=container HTTP/1.1
07:38:38.388821 [0-0] > Host: azurite:14430
07:38:38.388821 [0-0] > User-Agent: curl/8.14.1
07:38:38.388821 [0-0] > Accept: */*
07:38:38.388821 [0-0] >
07:38:38.391799 [0-0] * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
07:38:38.391887 [0-0] * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
07:38:38.391942 [0-0] * Request completely sent off
07:38:38.416250 [0-0] < HTTP/1.1 403 Server failed to authenticate the request. Make sure the value of the Authorization header is formed correctly including the signature.
07:38:38.416323 [0-0] < Server: Azurite-Blob/3.35.0
07:38:38.416380 [0-0] < x-ms-error-code: AuthorizationFailure
07:38:38.416436 [0-0] < x-ms-request-id: f8d0b824-0e36-4823-a730-c9c28fdb27a4
07:38:38.416493 [0-0] < content-type: application/xml
07:38:38.416549 [0-0] < Date: Wed, 25 Feb 2026 07:38:38 GMT
07:38:38.416624 [0-0] < Connection: keep-alive
07:38:38.416679 [0-0] < Keep-Alive: timeout=5
07:38:38.416743 [0-0] < Transfer-Encoding: chunked
07:38:38.416819 [0-0] <
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Error>
<Code>AuthorizationFailure</Code>
<Message>Server failed to authenticate the request. Make sure the value of the Authorization header is formed correctly including the signature.
RequestId:f8d0b824-0e36-4823-a730-c9c28fdb27a4
Time:2026-02-25T07:38:38.413Z</Message>
</Error>
Is it possible you have a mismatch between the common name in the certificate and the name of the endpoint?
Looking at your CSR, it appears that the common name in the certificate request is "azurite", not "localhost"
Signed-off-by: Jed Laundry <jlaundry@jlaundry.com>
#[configurable(metadata(
    docs::examples = "DefaultEndpointsProtocol=https;AccountName=mylogstorage;AccountKey=storageaccountkeybase64encoded;EndpointSuffix=core.windows.net"
))]
#[configurable(metadata(
I would consider making connection_string optional now that the auth field exists; otherwise customers have to supply a connection_string even if they're using auth.
If customers do use auth, then the only useful value in the connection_string is the account name. IMHO it would be better to split the account name out into a separate optional field, because fundamentally the account name is not sensitive.
Then, of course, add checks that auth and connection_string are not both provided; that one of auth or connection_string is provided; that storage_account is not provided when connection_string is; and that storage_account is provided when auth is.
Or this could possibly be handled with a more complicated structure, something like (just coding off the top of my head):
// Used as the type of the sink's `auth` field.
pub enum Authentication {
    ConnectionString(SensitiveString),
    TokenIdentity {
        storage_account: String,
        auth: AzureAuthentication,
    },
}
What I'm not sure about is whether it's possible to do this without introducing a breaking change, given that connection_string is currently a mandatory field.
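The four checks listed in this comment for the flat-field layout can be sketched as a single exhaustive match. This is a minimal sketch, not code from the PR: `AuthConfigError` and `validate_auth_fields` are hypothetical names, and the function only looks at whether each field was set.

```rust
/// Hypothetical error type for this sketch; not part of Vector.
#[derive(Debug, PartialEq, Eq)]
pub enum AuthConfigError {
    BothProvided,
    NeitherProvided,
    StorageAccountWithConnectionString,
    MissingStorageAccount,
}

/// Applies the four checks from the review comment.
/// Each argument records only whether the user set that field.
pub fn validate_auth_fields(
    has_connection_string: bool,
    has_auth: bool,
    has_storage_account: bool,
) -> Result<(), AuthConfigError> {
    match (has_connection_string, has_auth, has_storage_account) {
        // auth and connection_string are mutually exclusive.
        (true, true, _) => Err(AuthConfigError::BothProvided),
        // One of the two must be present.
        (false, false, _) => Err(AuthConfigError::NeitherProvided),
        // The connection string already carries the account name.
        (true, false, true) => Err(AuthConfigError::StorageAccountWithConnectionString),
        // Token-based auth needs the account name from somewhere.
        (false, true, false) => Err(AuthConfigError::MissingStorageAccount),
        (true, false, false) | (false, true, true) => Ok(()),
    }
}
```

Because the match is exhaustive over all eight combinations, the compiler would flag any case the checks forgot. Note that the "neither provided" rule follows this comment; the author's later reply treats a missing credential as an anonymous request instead.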
#[configurable(metadata(
    docs::examples = "BlobEndpoint=https://mylogstorage.blob.core.windows.net/;SharedAccessSignature=generatedsastoken"
))]
pub connection_string: SensitiveString,
I might consider leaving a note about the use of connection strings - they can be extremely challenging for customers to correctly manage, especially when they are memorialized in configuration files.
Something similar to:
/// ** SECURITY NOTE **
/// Connection strings contain sensitive information, such as access keys or SAS tokens, that can be used to gain unauthorized access to your Azure Blob Storage resources.
/// It is important to keep connection strings secure and not expose them in logs, error messages, or version control systems.
///
/// Numerous security breaches have occurred due to leaked connection strings,
/// so please take care to manage them securely. Consider using secret management tools to store and manage connection strings securely.
Feel free to tone down the language, but it's important that the risks associated with putting credentials in text files be explained to customers.
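One way to act on the "do not expose them in logs" part of the note is a wrapper type whose formatting output never contains the secret. The sketch below is illustrative only: `Redacted` is a hypothetical type, not Vector's `SensitiveString` (which the diff already uses for this field), and no claim is made here about how that type behaves.

```rust
use std::fmt;

/// Hypothetical secret-holding wrapper for this sketch.
pub struct Redacted(String);

impl Redacted {
    pub fn new(value: impl Into<String>) -> Self {
        Self(value.into())
    }

    /// Explicit accessor, so every use of the raw secret is easy to find.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

// Both Debug and Display print a fixed placeholder instead of the value,
// so `{:?}` on a config struct or an error message cannot leak the secret.
impl fmt::Debug for Redacted {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("**REDACTED**")
    }
}

impl fmt::Display for Redacted {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("**REDACTED**")
    }
}
```

This only protects the in-process paths; it does nothing for a connection string that is committed to version control in a config file, which is the risk the note is mainly about.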
#[derivative(Default)]
#[serde(deny_unknown_fields, untagged)]
pub enum AzureAuthentication {
    /// Use client credentials
FWIW, the Azure SDK team is working on defining a schema for expressing client credentials, I might suggest something like:
// The type of managed identity to use for authentication when using Managed Identity Credential authentication.
#[configurable_component]
#[derive(Clone, Debug, PartialEq)]
pub enum ManagedIdentityType {
/// System Assigned Managed Identity
///
/// Enabled directly on an Azure resource and cannot be shared across resources.
SystemAssigned,
/// User Assigned Managed Identity
///
/// A standalone Azure resource that can be assigned to one or more Azure resources.
ClientId,
/// User Assigned Managed Identity identified by Resource ID
///
/// A standalone Azure resource that can be assigned to one or
/// more Azure resources.
ResourceId,
/// User Assigned Managed Identity identified by Object ID
///
/// A standalone Azure resource that can be assigned to one or
/// more Azure resources.
ObjectId,
}
/// Use Developer Tools Credential for authentication.
///
/// This method is typically used for local development and testing,
/// allowing developers to authenticate using their Azure developer
/// tools credentials.
#[configurable]
DeveloperToolsCredential,
/// Use Azure CLI Credential for authentication.
///
/// This method allows authentication using the Azure CLI credentials,
/// which are commonly used for local development and testing.
#[configurable]
AzureCliCredential {
/// The tenant ID to use for authentication.
///
/// This is required when using Azure CLI Credential authentication.
#[configurable(metadata(docs::examples = "00000000-0000-0000-0000-000000000000"))]
tenant_id: String,
/// The subscription ID to use for authentication.
///
/// This is required when using Azure CLI Credential authentication.
#[configurable(metadata(docs::examples = "00000000-0000-0000-0000-000000000000"))]
subscription: String,
// /// Timeout for the Azure CLI process to complete.
// ///
// /// If the process does not complete within this duration,
// /// authentication will fail. This is optional and
// /// defaults to 60 seconds if not specified.
// process_timeout: Option<std::time::Duration>,
// /// Additionally allowed tenant IDs for authentication.
// ///
// /// This is optional and can be used to specify additional
// /// tenant IDs that are allowed for authentication when
// /// using Azure CLI Credential authentication.
// additionally_allowed_tenants: Option<Vec<String>>,
},
/// Use Azure Developer CLI Credential for authentication.
///
/// This method allows authentication using the Azure Developer CLI
/// credentials, which are commonly used for local development and testing.
#[configurable]
AzureDeveloperCliCredential {
/// Identifies the tenant the credential should authenticate in.
///
/// Defaults to the azd environment, which is the tenant of the selected Azure subscription.
#[configurable(metadata(docs::examples = "00000000-0000-0000-0000-000000000000"))]
tenant_id: Option<String>,
},
/// Authenticates an Azure Pipelines Service Connection.
#[configurable]
AzurePipelinesCredential {
/// The ID of the Azure Pipelines Service Connection to authenticate.
#[configurable(metadata(docs::examples = "00000000-0000-0000-0000-000000000000"))]
service_connection_id: String,
/// The tenant ID associated with the Azure Pipelines Service Connection.
#[configurable(metadata(docs::examples = "00000000-0000-0000-0000-000000000000"))]
tenant_id: String,
/// The subscription ID associated with the Azure Pipelines Service Connection.
#[configurable(metadata(docs::examples = "00000000-0000-0000-0000-000000000000"))]
subscription_id: String,
/// System Access Token
///
/// Security token for the running build. See
/// [Azure Pipelines documentation](https://learn.microsoft.com/azure/devops/pipelines/build/variables?view=azure-devops#systemaccesstoken)
/// for an example showing how to get this value.
system_access_token: SensitiveString,
},
/// Authenticates an Entra Workload Identity on Kubernetes.
#[configurable]
WorkloadIdentityCredential {
/// The tenant ID associated with the Entra Workload Identity.
tenant_id: String,
/// The client ID associated with the Entra Workload Identity.
client_id: String,
/// The subscription ID associated with the Entra Workload Identity.
subscription_id: String,
/// System Access Token
///
/// Security token for the running build. See
/// [Azure Pipelines documentation](https://learn.microsoft.com/azure/devops/pipelines/build/variables?view=azure-devops#systemaccesstoken)
/// for an example showing how to get this value.
system_access_token: SensitiveString,
},
/// Use Managed Identity Credential for authentication.
///
/// This method allows authentication using Azure Managed Identities,
/// which is commonly used for applications running in Azure environments.
#[configurable]
ManagedIdentityCredential {
/// The type of managed identity to use for authentication.
managed_identity_type: ManagedIdentityType,
/// The id of the user assigned managed identity to use for authentication.
managed_identity_id: Option<String>,
},
/// Use Client Assertion Credential for authentication.
#[configurable]
ClientAssertionCredential {
/// The tenant ID associated with the Entra Workload Identity.
tenant_id: String,
/// The client ID associated with the Entra Workload Identity.
client_id: String,
/// The subscription ID associated with the Entra Workload Identity.
subscription_id: String,
},
/// Use Client Certificate Credential to authenticate an application
/// with a certificate.
#[configurable]
ClientCertificateCredential {
/// The tenant ID associated with the Entra Workload Identity.
tenant_id: String,
/// The client ID associated with the Entra Workload Identity.
client_id: String,
/// Base64 encoded PKCS12 certificate with its RSA private key.
certificate: SensitiveString,
/// The password for the client certificate, if applicable.
certificate_password: Option<SensitiveString>,
},
/// Use Client Secret Credential for authentication.
///
/// This method allows authentication using a client secret, which is a string value
/// that serves as a password for the application.
#[configurable]
ClientSecretCredential {
/// The tenant ID associated with the Entra Workload Identity.
tenant_id: String,
/// The client ID associated with the Entra Workload Identity.
client_id: String,
/// The client secret value to authenticate with.
client_secret: SensitiveString,
},
// The following credential types are currently not supported by the Azure SDK for Rust, but may be added in the future:
//
// AzurePowerShellCredential,
// EnvironmentCredential,
// InteractiveBrowserCredential,
// VisualStudioCredential,
// VisualStudioCodeCredential,
// BrokerCredential,
// /// Use an API Key for authentication.
// ///
// /// Note: Do NOT put API keys in appsettings.json.
// /// Use environment variables or Key Vault secrets instead. See https://aka.ms/azsdk/config/secrets
// ApiKeyCredential(SensitiveString),

pub enum AzureAuthentication {
    /// Use client credentials
    #[derivative(Default)]
    ClientSecretCredential {
I'll be honest and say that I'm not 100% sure I'm happy with the idea of promoting ClientSecretCredential as the default credential type, given the myriad of mechanisms.
Talking to the identity architect for the Azure SDK, he (and I) would prefer that there be no default credential type.
If you have to have a default, for customers running in Azure, we would recommend using ManagedIdentityCredential with one of the user-assigned variants (ClientId, ResourceId, or ObjectId) as the default rather than ClientSecretCredential.
And yes, I know that this affects the azure_logs_ingestion sink as well.
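The user-assigned variants mentioned here pair a `ManagedIdentityType` with the optional `managed_identity_id` from the suggested schema. A minimal sketch of how that pairing could be checked follows; the rule that a system-assigned identity takes no ID while the user-assigned variants require one is an assumption of this sketch, and `check_managed_identity` is a hypothetical helper.

```rust
/// Mirrors the variant names from the suggested schema.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ManagedIdentityType {
    SystemAssigned,
    ClientId,
    ResourceId,
    ObjectId,
}

/// Hypothetical check: assumes SystemAssigned takes no ID and the
/// user-assigned variants need a non-blank one.
pub fn check_managed_identity(
    kind: ManagedIdentityType,
    id: Option<&str>,
) -> Result<(), String> {
    match (kind, id) {
        (ManagedIdentityType::SystemAssigned, None) => Ok(()),
        (ManagedIdentityType::SystemAssigned, Some(_)) => Err(
            "managed_identity_id must not be set for a system-assigned identity".to_string(),
        ),
        // Any user-assigned variant with a non-blank ID is accepted.
        (_, Some(id)) if !id.trim().is_empty() => Ok(()),
        // User-assigned variant with a missing or blank ID.
        (kind, _) => Err(format!("managed_identity_id is required for {kind:?}")),
    }
}
```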
> I'll be honest and say that I'm not 100% sure I'm happy with the idea of promoting ClientSecretCredential as the default credential type, given the myriad of mechanisms.
> Talking to the identity architect for the Azure SDK, he (and I) would prefer that there be no default credential type.
> If you have to have a default, for customers running in Azure, we would recommend using ManagedIdentityCredential with one of the user-assigned variants (ClientId, ResourceId, or ObjectId) as the default rather than ClientSecretCredential.
> And yes, I know that this affects the azure_logs_ingestion sink as well.
Agreed, the "Default" moniker is so that we have something to match on.
The default for azure_blob if no credential is supplied will be to try anonymous blob requests.
The default for azure_logs_ingestion is to error with "2026-02-25T18:42:27.866373Z ERROR vector::topology::builder: Configuration error. error=Sink "az": auth.azure_tenant_id is blank; either use auth.azure_credential_kind, or provide tenant ID, client ID, and secret. internal_log_rate_limit=false"
(sorry for the notification spam, I need more coffee)
I think the core comment here is that instead of ClientSecretCredential being the target when you specify [auth] with no authorization kind, it should probably be ManagedIdentityCredential (assuming that the core customers are running Vector in an azure VM (which seems likely if they're using blob storage)).
I'm not an expert in the customer experience with the vector tool, but from the point of an Azure SDK developer, my personal preference would be to see something like:
[auth]
connection_string = "string"
or
[auth]
auth_kind = "managed_identity_credential"
managed_identity_id = "<guid>"
managed_identity_type = "system_managed_identity"
and disallow the use of the Default moniker entirely. Let the absence of an [auth] section express a desire for anonymous connections for those services that support anonymous connections.
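The "no default, absence means anonymous" rule proposed in this comment maps naturally onto an `Option`. The sketch below assumes hypothetical names (`AuthKind`, `Access`, `resolve_access`) and a per-service flag for whether anonymous access is possible; none of these come from the PR.

```rust
/// Hypothetical set of explicitly selected credential kinds.
#[derive(Debug, PartialEq, Eq)]
pub enum AuthKind {
    ConnectionString,
    ManagedIdentity,
    ClientSecret,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Access {
    Anonymous,
    Authenticated(AuthKind),
}

/// `auth` is `None` when the config has no [auth] section at all.
/// There is deliberately no `Default` credential kind to fall back on.
pub fn resolve_access(
    auth: Option<AuthKind>,
    supports_anonymous: bool,
) -> Result<Access, String> {
    match auth {
        Some(kind) => Ok(Access::Authenticated(kind)),
        // A missing section is only meaningful for services that allow it.
        None if supports_anonymous => Ok(Access::Anonymous),
        None => Err("an [auth] section is required for this service".to_string()),
    }
}
```

With this shape, a blob sink could pass `supports_anonymous = true`, while a service that always needs a credential would pass `false` and surface a configuration error instead.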
> assuming that the core customers are running Vector in an azure VM
No - based on issues and PRs, Vector certainly isn't exclusively used on Azure VMs, and the preference seems to be for connection_string based auth, followed by client ID/secret, then Workload Identity, then Managed Identity.
> [auth]
> connection_string="string"
I see your point, but at this point it's probably still better to leave connection_string as a top-level config item, so as not to make this a breaking change. Additionally, it's only relevant to Blob and Event Hubs - Metrics, Log Ingestion, Data Explorer, etc. all require service principal auth, so we don't want to establish a convention of expecting it in the [auth] config space.
I was actually talking about the azure blob storage sink specifically. Vector is likely used in a lot of environments, but how many azure blob storage customers are running vector outside of an Azure VM?
But this isn't a critical issue to me. I'm more concerned about moving the account name out of the connection-string when [auth] is used.
In general, connection strings are considered "bad" and Microsoft is working extremely hard to move customers away from them in all circumstances (that's also why connection string authentication isn't supported in the azure-sdk-for-rust - they're inherently unsafe).
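For context on the account-name concern: when `[auth]` is used, a connection string such as `AccountName=mylogstorage` serves only as a carrier for the account name. A minimal sketch of pulling that one value out is shown here; `account_name` is a hypothetical helper, and exact-case matching of the `AccountName` key is an assumption of this sketch.

```rust
/// Returns the value of the `AccountName` key, if present and non-empty.
/// The string is treated as `;`-separated `key=value` pairs.
pub fn account_name(connection_string: &str) -> Option<&str> {
    connection_string
        .split(';')
        // `split_once` splits at the first `=`, so base64 padding in a
        // value such as an account key stays intact.
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| key.trim() == "AccountName")
        .map(|(_, value)| value.trim())
        .filter(|value| !value.is_empty())
}
```

A dedicated, non-sensitive `storage_account` field, as suggested earlier in this review, would make this parsing step unnecessary for token-based auth.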
pub(super) acknowledgements: AcknowledgementsConfig,

#[serde(default)]
#[configurable(derived)]
It might make sense for the tls option to be a #[cfg(test)] field, since it is essentially a test hook.
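A test-only field of the kind suggested here could look roughly like the following. This is a sketch with hypothetical names (`BlobSinkSettings`, `tls_ca_file`), not the sink's real config struct; it leaves out the serde and `configurable` derives, which would need their own handling.

```rust
#[derive(Debug)]
pub struct BlobSinkSettings {
    pub container_name: String,
    /// Exists only when the crate is compiled with `cfg(test)`.
    #[cfg(test)]
    pub tls_ca_file: Option<String>,
}

impl BlobSinkSettings {
    pub fn new(container_name: &str) -> Self {
        Self {
            container_name: container_name.to_string(),
            // The initializer must carry the same cfg as the field,
            // otherwise non-test builds fail to compile.
            #[cfg(test)]
            tls_ca_file: None,
        }
    }
}
```

One thing to verify before adopting this: every place that constructs or reads the field has to be compiled under the same `cfg`, including any integration tests that build the config.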
impl SinkConfig for AzureBlobSinkConfig {
    async fn build(&self, cx: SinkContext) -> Result<(VectorSink, Healthcheck)> {
        let client = azure_common::config::build_client(
            self.auth.clone(),
You might consider borrowing these parameters rather than cloning them. Especially since this is a synchronous function, so borrowing has less heap overhead.
The project convention is to clone, and from a quick search, the AWS sinks' &self.base.auth seem to be the only ones borrowing... @pront @thomasqueirozb was there a particular reason?
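To make the trade-off in this thread concrete, here is a small sketch of the two calling styles. `AuthSettings` and both functions are hypothetical stand-ins; the real `build_client` signature is not reproduced here.

```rust
/// Hypothetical stand-in for the sink's auth config.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct AuthSettings {
    pub tenant_id: String,
    pub client_id: String,
}

/// Owned parameter: callers that want to keep their config must clone,
/// which copies both `String` fields.
pub fn summary_owned(auth: Option<AuthSettings>) -> String {
    match auth {
        Some(a) => format!("tenant={} client={}", a.tenant_id, a.client_id),
        None => "anonymous".to_string(),
    }
}

/// Borrowed parameter: callers pass `config.auth.as_ref()`, no copy made.
pub fn summary_borrowed(auth: Option<&AuthSettings>) -> String {
    match auth {
        Some(a) => format!("tenant={} client={}", a.tenant_id, a.client_id),
        None => "anonymous".to_string(),
    }
}
```

A call site would use `summary_owned(config.auth.clone())` versus `summary_borrowed(config.auth.as_ref())`. For a function that runs once at sink build time the cost difference is small, so following the project convention is a reasonable choice either way.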
async fn azure_blob_build_config_with_client_id_and_secret() {
    let config: AzureBlobSinkConfig = toml::from_str::<AzureBlobSinkConfig>(
        r#"
        connection_string = "AccountName=mylogstorage"
Does it make sense to have tests for the other authentication types as well?
It does - once the integration test was working, I was planning on filling out the other test cases, including a copy+pasta of https://github.com/jlaundry/vector/blob/30e73b5120a32b78cbdbd0329164fd753c4134d6/src/sinks/azure_logs_ingestion/tests.rs#L125-L156
  sinks-aws_sns = ["aws-core", "dep:aws-sdk-sns"]
  sinks-axiom = ["sinks-http"]
- sinks-azure_blob = ["dep:azure_core", "dep:azure_storage_blob"]
+ sinks-azure_blob = ["dep:azure_core", "dep:azure_identity", "dep:azure_storage_blob"]
You might consider updating to the current (February) Azure SDK. The only iffy part of that is that it brings a dependency on reqwest 0.13 rather than 0.12.
Hey @jlaundry thanks for doing this! Would it be possible to add ClientCertificateCredential as well? We would certainly use it!
Summary
As mentioned in #24492 (comment), now that #22912 has landed, we can make the AzureAuthentication config generic, so that the other Azure authentication types can be re-supported by azure_blob (and eventually azure_data_explorer #24633, and azure_event_hub #24659).
This currently includes Azure CLI, Managed Identity, Workload Identity, as well as a special chained Managed Identity Client Assertion. I'm happy to add others that people believe they have a use-case for, I just didn't want to add code that was unlikely to be used.
Todo list
AzureAuthentication config type
block_in_place (int tests fail with thread 'sinks::azure_blob::test::azure_blob_build_config_with_client_id_and_secret' (1977) panicked at src/sinks/azure_common/config.rs:380:43: can call blocking only when running on the multi-threaded runtime)
Vector configuration
For example:
How did you test this PR?
Currently testing in my lab environment; I've got WIP for running the integration test suite, but it's failing to pick up the integration test CA (#24729 (review))
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changelog label to this PR.
References
Notes
@vectordotdev/vector to reach out to us regarding this PR.
pre-push hook, please see this template.
make fmt
make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
make test
git merge origin master and git push.
Cargo.lock), please run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.