feat: Credential provider support#4335
Conversation
|
cc: @mbutrovich @parthchandra @andygrove |
|
thanks @karuppayya thats actually a good moment for this PR. @snmvaughan FYI |
|
Thanks for putting this together @karuppayya! We're definitely thinking about the same problem right now. :) One thing I'd like to push on before we go further. A fair amount of this PR is distribution and lifecycle wiring (the per-catalog broadcast cache, threading the broadcast through My concern is scope. Comet's value is accelerating compute-intensive operations in native code, and I'd rather we not take on responsibility for reimplementing Spark's credential plumbing inside Comet when vendors are already equipped to handle it. #4309 takes the narrower position of owning only the JNI shape and the per-call context, and leaves caching, refresh, and distribution to the vendor implementation. Is there a scenario you have in mind that #4309's shape cannot cover? |
|
@mbutrovich Appreciate the feedback — agree this is worth pinning down before going further. The main scenario is REST-catalog deployments where vended credentials and refresh config originate in the driver's catalog context. 1. Iceberg already does exactly init+broadcast for 2. I think REST makes this broadcast load-bearing( and not optional i think). 3. Multi-catalog. This change keys broadcasts by catalog name, so per-catalog isolation works. #4309's SPI has no catalog identifier, so two catalogs sharing a vendor class silently collide on one cached instance — fixable though. 4. #4309 requires every vendor to ship Comet-Spark plumbing. A REST-catalog vendor would need to redo much of what this change does i guess Curious what you think. |
|
Thanks for the careful citations @karuppayya, those helped a lot. I'm trying to figure out whether the broadcast and refresh machinery in this PR is something Comet should own, or whether a vendor implementation under #4309's SPI could give you the same end result with less Comet-side surface area. I'd love your read on that. One scope thing I want to flag up front: we also need this for raw Parquet on S3, and any other native S3 reader Comet grows. Iceberg-vended creds are one important case, but vendors with custom signers or per-path STS for plain Parquet hit the same wall (Hadoop S3A signers don't surface the secret, On the REST-vended credentials point, the reason #4309 today can't reach On multi-catalog with shared FQCN, you're right that #4309 has the gap today. I think it's fixable inside the existing SPI shape by extending What I'm less sure should live in Comet is refresh and distribution. The Rust-side TTL cache, the per-plan single-catalog invariant that throws on multi-catalog plans, and the driver-side broadcast cache feel like things Iceberg's own public class IcebergRESTVendedS3Provider implements CometS3CredentialProvider {
private volatile VendedCredentialsProvider provider;
@Override
public void initialize(Map<String, String> catalogProperties) {
this.provider = VendedCredentialsProvider.create(catalogProperties);
}
@Override
public CometS3Credentials getCredentialsForPath(
String bucket, String path, CometS3AccessMode mode) {
AwsSessionCredentials c = (AwsSessionCredentials) provider.resolveCredentials();
long expiresAt = c.expirationTime().map(Instant::toEpochMilli).orElse(-1L);
return new CometS3Credentials(
c.accessKeyId(), c.secretAccessKey(), c.sessionToken(), expiresAt);
}
}Iceberg's Does this shape cover what you need, or is there a scenario I'm missing where the vendor model in #4309 (extended with the unfiltered properties + |
|
Wanted to flag that the conversation we had on this PR shaped a few changes I just added in #4309. None of these are Comet adopting #4335 wholesale, but each one came from a real concern you raised, and I want the trail to be visible. REST vended creds reach the SPI. Multi-catalog isolation under shared FQCN. Added Reference REST vended-creds provider. Added What I deliberately did not pull over. No Rust-side TTL cache, no driver-side broadcast cache for the credential provider, no plan walker that throws on multi-catalog plans. Caching policy stays with the vendor ( |
|
Just wanted to add my 2 bits to the credentials refreshing bit. The credentials providers are going to be executed on each executor and each executor will essentially request credentials at the same time. When running on a very large scale, this has been seen to sometimes overwhelm credentials backends leading to system-wide job failure. So caching the credentials at the executors makes sense, but it is generally better to refresh centrally and distribute the credentials. It makes sense for the engine to do the refresh. For instance, in Spark, Kerberos delegation tokens are managed by Spark centrally in DelegationTokenManager This does open the question of secure distribution of the credentials. Broadcast on an insecure channel will not do. The credentials distribution needs TLS. |
|
Thanks @mbutrovich for addressing my concerns. The Rust-side TTL is a JNI micro-optimization — if the switch between JVM and native isn't expensive, we can leave it out. @parthchandra — fair point on the herd. I don't think we should go with centralized refresh though:
Happy to close this PR in favor of #4309 and continue review on the other — we need this change soon. 🙂 |
|
@mbutrovich I will test this change today. |
Which issue does this PR close?
Closes #4332
Rationale for this change
Comet's native Iceberg scan only supports static AWS credentials captured at plan time, so jobs running longer than the session-token lifetime will fail mid-scan.
Iceberg-rust already exposes a pluggability hook — Comet just wasn't using it.
This PR adds a CometCredentialProvider that the native scan calls back into via JNI when a refresh is needed.
What changes are included in this PR?
How are these changes tested?