aws: msk_iam: pass TLS context to credential provider chain#11912
aws: msk_iam: pass TLS context to credential provider chain#11912jamesdangercarpenter wants to merge 1 commit into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR adds creation and proper cleanup of per-request TLS contexts ( ChangesMSK IAM TLS Integration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Infer (1.2.0)src/aws/flb_aws_msk_iam.csrc/aws/flb_aws_msk_iam.c:20:10: fatal error: 'fluent-bit/flb_info.h' file not found ... [truncated 736 characters] ... fer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/lib/clang/18/include" Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5f5a1d661b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/aws/flb_aws_msk_iam.c (1)
248-267:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winResource leak:
cred_tlsnot destroyed on early return paths.The newly-introduced
cred_tls(line 229) is not cleaned up when these three error paths return directly. All should usegoto errorto ensure proper cleanup.🐛 Proposed fix
if (temp_provider->provider_vtable->init(temp_provider) != 0) { flb_error("[aws_msk_iam] build_msk_iam_payload: failed to initialize AWS credentials provider"); flb_aws_provider_destroy(temp_provider); - return NULL; + temp_provider = NULL; /* Prevent double-free in error handler */ + goto error; } /* Get credentials */ creds = temp_provider->provider_vtable->get_credentials(temp_provider); if (!creds) { flb_error("[aws_msk_iam] build_msk_iam_payload: failed to get credentials"); - flb_aws_provider_destroy(temp_provider); - return NULL; + goto error; } if (!creds->access_key_id || !creds->secret_access_key) { flb_error("[aws_msk_iam] build_msk_iam_payload: incomplete credentials"); - flb_aws_credentials_destroy(creds); - flb_aws_provider_destroy(temp_provider); - return NULL; + goto error; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/aws/flb_aws_msk_iam.c` around lines 248 - 267, In build_msk_iam_payload, the early returns after temp_provider->provider_vtable->init, temp_provider->provider_vtable->get_credentials, and the incomplete-credentials check leak cred_tls; replace those direct returns with a jump to the existing error cleanup block (use goto error) so that cred_tls is destroyed along with flb_aws_credentials_destroy(creds) and flb_aws_provider_destroy(temp_provider); ensure the error path explicitly frees cred_tls and leaves other cleanup calls intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/aws/flb_aws_msk_iam.c`:
- Around line 248-267: In build_msk_iam_payload, the early returns after
temp_provider->provider_vtable->init,
temp_provider->provider_vtable->get_credentials, and the incomplete-credentials
check leak cred_tls; replace those direct returns with a jump to the existing
error cleanup block (use goto error) so that cred_tls is destroyed along with
flb_aws_credentials_destroy(creds) and flb_aws_provider_destroy(temp_provider);
ensure the error path explicitly frees cred_tls and leaves other cleanup calls
intact.
The MSK IAM OAUTHBEARER callback created its AWS credential provider chain
with a NULL TLS context:
flb_standard_chain_provider_create(config->flb_config, NULL, ...)
The provider chain reaches AWS endpoints over HTTPS - in particular the STS
AssumeRoleWithWebIdentity call used by IRSA (IAM Roles for Service Accounts)
on EKS. With no TLS context that call cannot connect, and the chain silently
falls back to the EC2 instance metadata service, picking up the node role
instead of the pod's IRSA role. The MSK token is then signed with the wrong
principal and the broker rejects it with "SASL authentication error: Access
denied", making IRSA unusable with native aws_msk_iam.
Create a dedicated TLS instance (verifying peers against the system trust
store, as the other AWS plugins do) for each credential provider and tear it
down together with the provider. A separate instance is used at each
provider-creation site since TLS instances cannot be shared between
providers.
Fixes fluent#11255
Signed-off-by: James Carpenter <james@james-carpenter.net>
5f5a1d6 to
eadd8a7
Compare
Summary
Native
aws_msk_iamauthentication is currently unusable with IRSA (IAM Roles for Service Accounts) on EKS.The OAUTHBEARER token-refresh callback builds its AWS credential provider chain with a
NULLTLS context:The provider chain reaches AWS endpoints over HTTPS — in particular the STS
AssumeRoleWithWebIdentitycall that IRSA depends on. With no TLS context that call cannot connect, and the chain silently falls back to the EC2 instance metadata service, picking up the node instance role instead of the pod's IRSA role. The MSK token is then signed with the wrong principal and the broker rejects it:Fix
Create a dedicated TLS instance for each credential provider — verifying peers against the system trust store, exactly as the other AWS output plugins (
out_cloudwatch_logs,out_s3,out_kinesis_*) do — and tear it down together with the provider. A separate instance is used at each provider-creation site, since TLS instances cannot be shared between providers (see the note inflb_aws_credentials.h).This is a focused fix for the credential-fetch TLS bug only; it does not change the configuration surface or token-signing logic.
Testing
AssumeRoleWithWebIdentitypath is used (pod role, not node role) and authentication succeeds.Fixes #11255
Summary by CodeRabbit