Bug Report: Netty toPublisher() Can Suppress Channel-Inactive Error Before Body Subscriber Terminal Signal
Suggested issue title:
Netty async streaming response may miss terminal signal with AsyncResponseTransformer.toPublisher()
Describe the bug
AWS SDK for Java v2 2.44.4 appears to allow a missing terminal signal for async streaming responses that use NettyNioAsyncHttpClient, for example:
CompletableFuture<ResponsePublisher<GetObjectResponse>> responseFuture =
    s3AsyncClient.getObject(request, AsyncResponseTransformer.toPublisher());
AsyncResponseTransformer.toPublisher() completes the client-returned future when the response object is available and the response body begins streaming. After that, the SDK user subscribes to the returned ResponsePublisher and waits for the body subscriber to receive either Subscriber.onComplete() or Subscriber.onError(Throwable).
The suspected issue is that the Netty response path can mark the HTTP stream as complete before the SDK has finalized the caller-visible response. If the channel becomes inactive in that gap, the current ResponseHandler.channelInactive() logic can suppress the fallback error because STREAMING_COMPLETE_KEY=true, even though RESPONSE_COMPLETE_KEY has not been set to true.
The relevant state is:
STREAMING_COMPLETE_KEY == true
RESPONSE_COMPLETE_KEY != true
In that state, Netty has observed LastHttpContent, but the SDK has not necessarily delivered the body subscriber terminal signal or finalized the response.
This is a potential SDK-user-visible hang: a user who has already subscribed to the returned ResponsePublisher may wait indefinitely for onComplete() or onError(Throwable).
Regression Issue
I do not know whether this is a regression.
Expected Behavior
When using AsyncResponseTransformer.toPublisher(), once the user subscribes to the returned ResponsePublisher, the body subscriber should eventually receive exactly one terminal signal:
Subscriber.onComplete()
or:
Subscriber.onError(Throwable)
If the channel becomes inactive before the SDK has finalized the caller-visible response, the SDK should surface an error instead of silently suppressing it.
More specifically, STREAMING_COMPLETE_KEY=true should not be treated as sufficient proof that the caller-visible response has completed. The safer completion condition for suppressing a channel-inactive fallback error is RESPONSE_COMPLETE_KEY=true.
Current Behavior
The checked upstream code was:
Repository: aws/aws-sdk-java-v2
Release: 2.44.4
Published: 2026-05-07
Commit: 2619c3d643ba0444a9ea9d9234cf119fa4968931
The Netty client uses two different channel attributes:
STREAMING_COMPLETE_KEY = Netty observed LastHttpContent
RESPONSE_COMPLETE_KEY = SDK response was finalized
HttpStreamsHandler.handleReadHttpContent sets STREAMING_COMPLETE_KEY=true as soon as LastHttpContent is observed:
boolean lastHttpContent = content instanceof LastHttpContent;
if (lastHttpContent) {
    ctx.channel().attr(STREAMING_COMPLETE_KEY).set(true);
}
File:
http-clients/netty-nio-client/src/main/java/software/amazon/awssdk/http/nio/netty/internal/nrs/HttpStreamsHandler.java
The user-visible body completion happens later through ResponseHandler.PublisherAdapter.onComplete():
runAndLogError(channelContext.channel(),
               () -> String.format("Subscriber %s threw an exception in onComplete.", subscriber),
               subscriber::onComplete);
and only after that does the SDK finalize the response:
finalizeResponse(requestContext, channelContext);
finalizeResponse sets RESPONSE_COMPLETE_KEY=true and completes the HTTP execute future:
channelContext.channel().attr(RESPONSE_COMPLETE_KEY).set(true);
executeFuture(channelContext).complete(null);
File:
http-clients/netty-nio-client/src/main/java/software/amazon/awssdk/http/nio/netty/internal/ResponseHandler.java
However, ResponseHandler.notifyIfResponseNotCompleted() suppresses the channel-inactive fallback error if either RESPONSE_COMPLETE_KEY or STREAMING_COMPLETE_KEY is true:
Boolean responseCompleted = handlerCtx.channel().attr(RESPONSE_COMPLETE_KEY).get();
Boolean isStreamingComplete = handlerCtx.channel().attr(STREAMING_COMPLETE_KEY).get();
if (!Boolean.TRUE.equals(responseCompleted) && !Boolean.TRUE.equals(isStreamingComplete)) {
    IOException err = new IOException(NettyUtils.closedChannelMessage(handlerCtx.channel()));
    runAndLogError(handlerCtx.channel(), () -> "Fail to execute SdkAsyncHttpResponseHandler#onError",
                   () -> requestCtx.handler().onError(err));
    executeFuture(handlerCtx).completeExceptionally(err);
    runAndLogError(handlerCtx.channel(), () -> "Could not release channel", () -> closeAndRelease(handlerCtx));
}
This means the following state suppresses the fallback error:
RESPONSE_COMPLETE_KEY != true
STREAMING_COMPLETE_KEY == true
That condition is weaker than caller-visible completion. It only proves that Netty saw the final HTTP content; it does not prove that the returned ResponsePublisher delivered onComplete() or onError(Throwable) to the SDK user.
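The difference between the quoted guard and a stricter one can be shown with a plain-Java model. This is only a sketch: the class and method names are illustrative and not part of the SDK, and the two Boolean parameters stand in for the values read from RESPONSE_COMPLETE_KEY and STREAMING_COMPLETE_KEY.

```java
// Plain-Java model of the channel-inactive guard. Names are hypothetical, not SDK code.
final class ChannelInactiveGuardSketch {

    // Mirrors the condition quoted above: the fallback error fires only when
    // neither attribute is TRUE.
    static boolean currentGuardFires(Boolean responseCompleted, Boolean streamingComplete) {
        return !Boolean.TRUE.equals(responseCompleted) && !Boolean.TRUE.equals(streamingComplete);
    }

    // Stricter variant discussed in this report: only RESPONSE_COMPLETE_KEY
    // suppresses the fallback error; streamingComplete is deliberately ignored.
    static boolean strictGuardFires(Boolean responseCompleted, Boolean streamingComplete) {
        return !Boolean.TRUE.equals(responseCompleted);
    }

    public static void main(String[] args) {
        Boolean[] values = {null, Boolean.FALSE, Boolean.TRUE};
        for (Boolean response : values) {
            for (Boolean streaming : values) {
                System.out.printf("RESPONSE_COMPLETE_KEY=%s STREAMING_COMPLETE_KEY=%s current=%b strict=%b%n",
                                  response, streaming,
                                  currentGuardFires(response, streaming),
                                  strictGuardFires(response, streaming));
            }
        }
    }
}
```

The two variants disagree only on rows where STREAMING_COMPLETE_KEY is true and RESPONSE_COMPLETE_KEY is not, which is the state this report is about.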
Reproduction Steps
I do not have a deterministic standalone network reproduction. This report is based on the SDK source state machine, where the problematic state is possible by inspection.
Minimal SDK-user usage pattern:
S3AsyncClient s3 = S3AsyncClient.builder()
                                .httpClientBuilder(NettyNioAsyncHttpClient.builder())
                                .build();
CompletableFuture<ResponsePublisher<GetObjectResponse>> responseFuture =
    s3.getObject(getObjectRequest, AsyncResponseTransformer.toPublisher());
ResponsePublisher<GetObjectResponse> responsePublisher = responseFuture.join();
CompletableFuture<Void> bodyDone = new CompletableFuture<>();
responsePublisher.subscribe(new Subscriber<ByteBuffer>() {
    private Subscription subscription;
    @Override
    public void onSubscribe(Subscription subscription) {
        this.subscription = subscription;
        subscription.request(Long.MAX_VALUE);
    }
    @Override
    public void onNext(ByteBuffer byteBuffer) {
        // Consume body bytes.
    }
    @Override
    public void onError(Throwable throwable) {
        bodyDone.completeExceptionally(throwable);
    }
    @Override
    public void onComplete() {
        bodyDone.complete(null);
    }
});
bodyDone.join();
Logic-level reproduction path:
1. Use an async streaming response with NettyNioAsyncHttpClient and AsyncResponseTransformer.toPublisher().
2. The SDK completes the getObject future when the response body begins streaming.
3. The user subscribes to the returned ResponsePublisher.
4. Netty receives LastHttpContent.
5. HttpStreamsHandler sets STREAMING_COMPLETE_KEY=true.
6. Before ResponseHandler.PublisherAdapter.onComplete() reaches the user's subscriber.onComplete() and before finalizeResponse() sets RESPONSE_COMPLETE_KEY=true, the channel becomes inactive.
7. ResponseHandler.channelInactive() calls notifyIfResponseNotCompleted().
8. notifyIfResponseNotCompleted() sees STREAMING_COMPLETE_KEY=true while RESPONSE_COMPLETE_KEY is not true (unset or false).
9. The fallback IOException is suppressed, so the user-visible body subscriber may receive no terminal signal.
The normal path should complete correctly:
LastHttpContent
-> STREAMING_COMPLETE_KEY=true
-> HandlerPublisher.complete()
-> PublisherAdapter.onComplete()
-> user subscriber.onComplete()
-> finalizeResponse()
-> RESPONSE_COMPLETE_KEY=true
The concern is the gap between STREAMING_COMPLETE_KEY=true and RESPONSE_COMPLETE_KEY=true.
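The two orderings can be walked through with a toy model. This is a sketch rather than SDK code: every name is hypothetical, the two boolean fields stand in for the channel attributes, and the model assumes both that a fallback error would reach the body subscriber and that no later onComplete is delivered once the channel is inactive.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two flags and the terminal signals seen by the body subscriber.
// Every name here is hypothetical; this is not SDK code.
final class TerminalSignalGapSketch {
    private boolean streamingComplete;  // stands in for STREAMING_COMPLETE_KEY
    private boolean responseComplete;   // stands in for RESPONSE_COMPLETE_KEY
    private final boolean strictGuard;  // true = suppress only when responseComplete
    final List<String> subscriberSignals = new ArrayList<>();

    TerminalSignalGapSketch(boolean strictGuard) {
        this.strictGuard = strictGuard;
    }

    void lastHttpContentObserved() {
        streamingComplete = true;
    }

    void publisherOnComplete() {
        subscriberSignals.add("onComplete");
        responseComplete = true;        // models finalizeResponse()
    }

    // Assumption: a fallback error raised here would reach the body subscriber.
    void channelInactive() {
        boolean suppressed = responseComplete || (!strictGuard && streamingComplete);
        if (!suppressed) {
            subscriberSignals.add("onError");
        }
    }

    static List<String> normalPath(boolean strictGuard) {
        TerminalSignalGapSketch model = new TerminalSignalGapSketch(strictGuard);
        model.lastHttpContentObserved();
        model.publisherOnComplete();
        model.channelInactive();
        return model.subscriberSignals;
    }

    // Assumption: once the channel is inactive, no later onComplete is delivered.
    static List<String> gapPath(boolean strictGuard) {
        TerminalSignalGapSketch model = new TerminalSignalGapSketch(strictGuard);
        model.lastHttpContentObserved();
        model.channelInactive();
        return model.subscriberSignals;
    }

    public static void main(String[] args) {
        System.out.println("normal, current guard: " + normalPath(false)); // [onComplete]
        System.out.println("gap, current guard:    " + gapPath(false));    // []
        System.out.println("gap, strict guard:     " + gapPath(true));     // [onError]
    }
}
```

Under these assumptions, only the gap ordering with the current guard leaves the subscriber without a terminal signal.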
Possible Solution
ResponseHandler.notifyIfResponseNotCompleted() should not treat STREAMING_COMPLETE_KEY=true as sufficient evidence that the caller-visible response has completed.
Safer behavior would be to surface the channel-inactive error whenever:
RESPONSE_COMPLETE_KEY != true
At minimum, the case below should not be silent:
STREAMING_COMPLETE_KEY == true
RESPONSE_COMPLETE_KEY != true
A regression test could explicitly set or drive this state and call channelInactive(), then assert that the SDK surfaces an error to the response handler, the execute future, or the user-visible body subscriber instead of suppressing the error.
Additional Information/Context
This report does not claim that LastHttpContent normally prevents subscriber.onComplete(). In the normal path, the SDK should call subscriber.onComplete() and then finalizeResponse().
The issue is that the current channel-inactive guard uses a lower-level Netty stream marker as if it were equivalent to caller-visible completion. These are not equivalent:
STREAMING_COMPLETE_KEY=true means Netty observed LastHttpContent.
RESPONSE_COMPLETE_KEY=true means ResponseHandler.finalizeResponse() ran.
For toPublisher(), caller-visible body completion is the body subscriber receiving onComplete() or onError(Throwable), not the initial CompletableFuture<ResponsePublisher<GetObjectResponse>> completing.
The ResponsePublisher timeout does not fix this case. It only cancels the underlying publisher when the SDK user does not subscribe promptly:
subscribed = true;
if (timeoutTask != null) {
    timeoutTask.cancel(false);
}
publisher.subscribe(subscriber);
Once the user has subscribed, this timeout is cancelled and does not protect against a missing post-subscribe terminal signal.
The API call and API call attempt timers also do not close this gap by default. The async pipeline attaches those timers to the SDK operation future, and AsyncResponseTransformer.toPublisher() completes that future when the body publisher begins streaming. Once the getObject(..., toPublisher()) future is complete, those operation-level timers are cancelled even though the body subscriber may continue waiting for its terminal signal.
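Because neither timeout covers the post-subscribe phase, a caller who wants a bound today has to add one. The sketch below is an assumption-laden illustration, not an SDK feature: it uses CompletableFuture.orTimeout (Java 9+) on a future such as the bodyDone future from the usage pattern above, and the class and method names are hypothetical. A fixed overall deadline may be too coarse for large bodies, and a real caller would probably also cancel the Subscription when the deadline fires.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Caller-side watchdog sketch; names are hypothetical, not part of the SDK.
final class BodyWatchdogSketch {

    // Waits for the body-completion future and reports whether the wait ended
    // because the deadline passed without any terminal signal.
    // Note: orTimeout (Java 9+) completes the same future exceptionally with
    // a TimeoutException if it is still pending when the deadline expires.
    static boolean timesOut(CompletableFuture<Void> bodyDone, long timeout, TimeUnit unit) {
        try {
            bodyDone.orTimeout(timeout, unit).join();
            return false;
        } catch (CompletionException e) {
            // join() wraps the failure; the original cause is preserved.
            return e.getCause() instanceof TimeoutException;
        }
    }

    public static void main(String[] args) {
        // A future that never receives onComplete() or onError(Throwable).
        CompletableFuture<Void> neverSignalled = new CompletableFuture<>();
        System.out.println("timed out: " + timesOut(neverSignalled, 100, TimeUnit.MILLISECONDS));
    }
}
```

This only turns an indefinite wait into a bounded one for the caller; it does not address the suppressed error in the Netty client.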
I also did not find a targeted regression test for:
STREAMING_COMPLETE_KEY == true
RESPONSE_COMPLETE_KEY != true
channelInactive()
AWS Java SDK version used
2.44.4
JDK version used
Not environment-specific. Found by source inspection of aws/aws-sdk-java-v2.
Operating System and version
Not environment-specific. Found by source inspection of aws/aws-sdk-java-v2.