Netty async streaming response may miss terminal signal with AsyncResponseTransformer.toPublisher() #6949

@superhx

Bug Report: Netty toPublisher() Can Suppress Channel-Inactive Error Before Body Subscriber Terminal Signal

Describe the bug

AWS SDK for Java v2 2.44.4 appears to have a logic-level risk of a missing terminal signal for async streaming responses when using NettyNioAsyncHttpClient, for example:

CompletableFuture<ResponsePublisher<GetObjectResponse>> responseFuture =
    s3AsyncClient.getObject(request, AsyncResponseTransformer.toPublisher());

AsyncResponseTransformer.toPublisher() completes the client-returned future when the response object is available and the response body begins streaming. After that, the SDK user subscribes to the returned ResponsePublisher and waits for the body subscriber to receive either Subscriber.onComplete() or Subscriber.onError(Throwable).

The suspected issue is that the Netty response path can mark the HTTP stream as complete before the SDK has finalized the caller-visible response. If the channel becomes inactive in that gap, the current ResponseHandler.channelInactive() logic can suppress the fallback error because STREAMING_COMPLETE_KEY=true, even though RESPONSE_COMPLETE_KEY has not been set to true.

The relevant state is:

STREAMING_COMPLETE_KEY == true
RESPONSE_COMPLETE_KEY != true

In that state, Netty has observed LastHttpContent, but the SDK has not necessarily delivered the body subscriber terminal signal or finalized the response.

This is a potential user-visible hang: a user who has already subscribed to the returned ResponsePublisher may wait indefinitely for onComplete() or onError(Throwable).

Regression Issue

  • Select this option if this issue appears to be a regression.

I do not know whether this is a regression.

Expected Behavior

When using AsyncResponseTransformer.toPublisher(), once the user subscribes to the returned ResponsePublisher, the body subscriber should eventually receive one terminal signal:

Subscriber.onComplete()

or:

Subscriber.onError(Throwable)

If the channel becomes inactive before the SDK has finalized the caller-visible response, the SDK should surface an error instead of silently suppressing it.

More specifically, STREAMING_COMPLETE_KEY=true should not be treated as sufficient proof that the caller-visible response has completed. The safer completion condition for suppressing a channel-inactive fallback error is RESPONSE_COMPLETE_KEY=true.

Current Behavior

The upstream code inspected was:

Repository: aws/aws-sdk-java-v2
Release:    2.44.4
Published:  2026-05-07
Commit:     2619c3d643ba0444a9ea9d9234cf119fa4968931

The Netty client uses two different channel attributes:

STREAMING_COMPLETE_KEY = Netty observed LastHttpContent
RESPONSE_COMPLETE_KEY = SDK response was finalized

HttpStreamsHandler.handleReadHttpContent sets STREAMING_COMPLETE_KEY=true as soon as LastHttpContent is observed:

boolean lastHttpContent = content instanceof LastHttpContent;
if (lastHttpContent) {
    ctx.channel().attr(STREAMING_COMPLETE_KEY).set(true);
}

File:

http-clients/netty-nio-client/src/main/java/software/amazon/awssdk/http/nio/netty/internal/nrs/HttpStreamsHandler.java

The user-visible body completion happens later through ResponseHandler.PublisherAdapter.onComplete():

runAndLogError(channelContext.channel(),
               () -> String.format("Subscriber %s threw an exception in onComplete.", subscriber),
               subscriber::onComplete);

and only after that does the SDK finalize the response:

finalizeResponse(requestContext, channelContext);

finalizeResponse sets RESPONSE_COMPLETE_KEY=true and completes the HTTP execute future:

channelContext.channel().attr(RESPONSE_COMPLETE_KEY).set(true);
executeFuture(channelContext).complete(null);

File:

http-clients/netty-nio-client/src/main/java/software/amazon/awssdk/http/nio/netty/internal/ResponseHandler.java

However, ResponseHandler.notifyIfResponseNotCompleted() suppresses the channel-inactive fallback error if either RESPONSE_COMPLETE_KEY or STREAMING_COMPLETE_KEY is true:

Boolean responseCompleted = handlerCtx.channel().attr(RESPONSE_COMPLETE_KEY).get();
Boolean isStreamingComplete = handlerCtx.channel().attr(STREAMING_COMPLETE_KEY).get();

if (!Boolean.TRUE.equals(responseCompleted) && !Boolean.TRUE.equals(isStreamingComplete)) {
    IOException err = new IOException(NettyUtils.closedChannelMessage(handlerCtx.channel()));
    runAndLogError(handlerCtx.channel(), () -> "Fail to execute SdkAsyncHttpResponseHandler#onError",
                   () -> requestCtx.handler().onError(err));
    executeFuture(handlerCtx).completeExceptionally(err);
    runAndLogError(handlerCtx.channel(), () -> "Could not release channel", () -> closeAndRelease(handlerCtx));
}

This means the following state suppresses the fallback error:

RESPONSE_COMPLETE_KEY != true
STREAMING_COMPLETE_KEY == true

That condition is weaker than caller-visible completion. It only proves that Netty saw the final HTTP content; it does not prove that the returned ResponsePublisher delivered onComplete() or onError(Throwable) to the SDK user.
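
To make the difference concrete, here is a minimal, self-contained sketch that compares the guard quoted above with a stricter one. It is a model, not SDK source: the class GuardComparison and both method names are hypothetical, and the two Boolean parameters stand in for the channel attributes (null meaning the attribute was never set).

```java
// Simplified model of the channel-inactive guard; NOT SDK source.
// The Boolean parameters stand in for the channel attributes, and
// null means "attribute never set".
final class GuardComparison {

    // Mirrors the quoted condition: the fallback error fires only when
    // NEITHER attribute is true.
    static boolean currentGuardFires(Boolean responseCompleted, Boolean streamingComplete) {
        return !Boolean.TRUE.equals(responseCompleted) && !Boolean.TRUE.equals(streamingComplete);
    }

    // Hypothetical stricter guard: only RESPONSE_COMPLETE_KEY suppresses the error.
    static boolean proposedGuardFires(Boolean responseCompleted) {
        return !Boolean.TRUE.equals(responseCompleted);
    }

    public static void main(String[] args) {
        Boolean[][] states = {
            {null, null},                 // nothing observed yet
            {null, Boolean.TRUE},         // LastHttpContent seen, response not finalized
            {Boolean.TRUE, Boolean.TRUE}, // response finalized
        };
        for (Boolean[] s : states) {
            System.out.printf("response=%s streaming=%s current=%b proposed=%b%n",
                              s[0], s[1],
                              currentGuardFires(s[0], s[1]),
                              proposedGuardFires(s[0]));
        }
    }
}
```

The two guards disagree only in the middle row, which is the state this report is about.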

Reproduction Steps

I do not have a deterministic standalone network reproduction. This report is based on inspection of the SDK source state machine, which shows that the problematic state is reachable.

Minimal SDK-user usage pattern:

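import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;
import software.amazon.awssdk.core.async.AsyncResponseTransformer;
import software.amazon.awssdk.core.async.ResponsePublisher;
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

// getObjectRequest is assumed to be a GetObjectRequest built elsewhere.
S3AsyncClient s3 = S3AsyncClient.builder()
                                .httpClientBuilder(NettyNioAsyncHttpClient.builder())
                                .build();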

CompletableFuture<ResponsePublisher<GetObjectResponse>> responseFuture =
    s3.getObject(getObjectRequest, AsyncResponseTransformer.toPublisher());

ResponsePublisher<GetObjectResponse> responsePublisher = responseFuture.join();

CompletableFuture<Void> bodyDone = new CompletableFuture<>();
responsePublisher.subscribe(new Subscriber<ByteBuffer>() {
    private Subscription subscription;

    @Override
    public void onSubscribe(Subscription subscription) {
        this.subscription = subscription;
        subscription.request(Long.MAX_VALUE);
    }

    @Override
    public void onNext(ByteBuffer byteBuffer) {
        // Consume body bytes.
    }

    @Override
    public void onError(Throwable throwable) {
        bodyDone.completeExceptionally(throwable);
    }

    @Override
    public void onComplete() {
        bodyDone.complete(null);
    }
});

bodyDone.join();

Logic-level reproduction path:

1. Use an async streaming response with NettyNioAsyncHttpClient and AsyncResponseTransformer.toPublisher().
2. The SDK completes the getObject future when the response body begins streaming.
3. The user subscribes to the returned ResponsePublisher.
4. Netty receives LastHttpContent.
5. HttpStreamsHandler sets STREAMING_COMPLETE_KEY=true.
6. Before ResponseHandler.PublisherAdapter.onComplete() reaches user subscriber.onComplete()
   and before finalizeResponse() sets RESPONSE_COMPLETE_KEY=true, the channel becomes inactive.
7. ResponseHandler.channelInactive() calls notifyIfResponseNotCompleted().
8. notifyIfResponseNotCompleted() sees STREAMING_COMPLETE_KEY=true while RESPONSE_COMPLETE_KEY is not true (unset or false).
9. The fallback IOException is suppressed, so the user-visible body subscriber may receive no terminal signal.

The normal path should complete correctly:

LastHttpContent
  -> STREAMING_COMPLETE_KEY=true
  -> HandlerPublisher.complete()
  -> PublisherAdapter.onComplete()
  -> user subscriber.onComplete()
  -> finalizeResponse()
  -> RESPONSE_COMPLETE_KEY=true

The concern is the gap between STREAMING_COMPLETE_KEY=true and RESPONSE_COMPLETE_KEY=true.

Possible Solution

ResponseHandler.notifyIfResponseNotCompleted() should not treat STREAMING_COMPLETE_KEY=true as sufficient evidence that the caller-visible response has completed.

Safer behavior would be to surface the channel-inactive error whenever:

RESPONSE_COMPLETE_KEY != true

At minimum, the case below should not be silent:

STREAMING_COMPLETE_KEY == true
RESPONSE_COMPLETE_KEY != true

A regression test could explicitly set or drive this state and call channelInactive(), then assert that the SDK surfaces an error to the response handler, the execute future, or the user-visible body subscriber instead of suppressing the error.
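
As a rough illustration of the shape such a test could take, the sketch below drives a plain-Java model through the gap state and checks whether any terminal signal surfaces. Everything in it is hypothetical (ChannelInactiveModel, its fields and methods); it does not use ResponseHandler or Netty, and a real test would need to drive the actual handler and channel attributes instead.

```java
import java.io.IOException;
import java.util.function.BiPredicate;

// Hypothetical stand-in for the handler state; none of these names are SDK classes.
final class ChannelInactiveModel {
    // Guard as quoted in this report: notify only if neither flag is true.
    static final BiPredicate<Boolean, Boolean> CURRENT_GUARD =
        (response, streaming) -> !Boolean.TRUE.equals(response) && !Boolean.TRUE.equals(streaming);
    // Suggested guard: notify whenever the response was not finalized.
    static final BiPredicate<Boolean, Boolean> PROPOSED_GUARD =
        (response, streaming) -> !Boolean.TRUE.equals(response);

    Boolean streamingComplete;   // models STREAMING_COMPLETE_KEY
    Boolean responseComplete;    // models RESPONSE_COMPLETE_KEY
    boolean subscriberCompleted; // models subscriber.onComplete()
    Throwable surfacedError;     // models the fallback error reaching the caller

    private final BiPredicate<Boolean, Boolean> shouldNotify;

    ChannelInactiveModel(BiPredicate<Boolean, Boolean> shouldNotify) {
        this.shouldNotify = shouldNotify;
    }

    // Netty observes LastHttpContent.
    void onLastHttpContent() {
        streamingComplete = true;
    }

    // Normal path: subscriber terminal signal, then finalizeResponse().
    void onPublisherComplete() {
        subscriberCompleted = true;
        responseComplete = true;
    }

    // channelInactive() -> notifyIfResponseNotCompleted()
    void channelInactive() {
        if (shouldNotify.test(responseComplete, streamingComplete)) {
            surfacedError = new IOException("channel closed before response completed");
        }
    }

    boolean hasTerminalSignal() {
        return subscriberCompleted || surfacedError != null;
    }

    // Drives the gap: LastHttpContent seen, then the channel drops before onComplete.
    static boolean terminalSignalAfterGap(BiPredicate<Boolean, Boolean> guard) {
        ChannelInactiveModel model = new ChannelInactiveModel(guard);
        model.onLastHttpContent();
        model.channelInactive();
        return model.hasTerminalSignal();
    }

    public static void main(String[] args) {
        System.out.println("current guard surfaces a terminal signal: "
                           + terminalSignalAfterGap(CURRENT_GUARD));
        System.out.println("proposed guard surfaces a terminal signal: "
                           + terminalSignalAfterGap(PROPOSED_GUARD));
    }
}
```

In this model the gap sequence produces no terminal signal under the current guard and an IOException under the proposed one, while the normal path (onLastHttpContent, onPublisherComplete, channelInactive) stays error-free under both.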

Additional Information/Context

This report does not claim that LastHttpContent normally prevents subscriber.onComplete(). In the normal path, the SDK should call subscriber.onComplete() and then finalizeResponse().

The issue is that the current channel-inactive guard uses a lower-level Netty stream marker as if it were equivalent to caller-visible completion. These are not equivalent:

STREAMING_COMPLETE_KEY=true
  means Netty observed LastHttpContent.

RESPONSE_COMPLETE_KEY=true
  means ResponseHandler.finalizeResponse() ran.

For toPublisher(), caller-visible body completion is the body subscriber receiving onComplete() or onError(Throwable), not the initial CompletableFuture<ResponsePublisher<GetObjectResponse>> completing.

The ResponsePublisher timeout does not fix this case. It only cancels the underlying publisher when the SDK user does not subscribe promptly:

subscribed = true;
if (timeoutTask != null) {
    timeoutTask.cancel(false);
}

publisher.subscribe(subscriber);

Once the user has subscribed, this timeout is cancelled and does not protect against a missing post-subscribe terminal signal.

The API call and API call attempt timers also do not close this gap by default. The async pipeline attaches those timers to the SDK operation future, and AsyncResponseTransformer.toPublisher() completes that future when the body publisher begins streaming. Once the getObject(..., toPublisher()) future is complete, those operation-level timers are cancelled even though the body subscriber may continue waiting for its terminal signal.
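
Until the guard is changed, a caller could bound the wait on its own side. The sketch below is one possible approach, not an SDK feature: it applies CompletableFuture.orTimeout (Java 9+) to a future playing the role of bodyDone from the usage pattern above. The class BodyDeadline and the method timesOut are hypothetical names, and the never-completed future simulates a body subscriber that receives no terminal signal.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical caller-side deadline on the body future; not part of the SDK.
final class BodyDeadline {

    // Waits for bodyDone, but fails it with a TimeoutException if no
    // terminal signal arrives in time. Returns true if the deadline fired.
    static boolean timesOut(CompletableFuture<Void> bodyDone, long timeout, TimeUnit unit) {
        try {
            bodyDone.orTimeout(timeout, unit).join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof TimeoutException;
        }
    }

    public static void main(String[] args) {
        // A future that is never completed stands in for a subscriber that
        // never receives onComplete() or onError(Throwable).
        CompletableFuture<Void> bodyDone = new CompletableFuture<>();
        System.out.println("timed out: " + timesOut(bodyDone, 200, TimeUnit.MILLISECONDS));
    }
}
```

This is a whole-body deadline rather than an idle timeout, and it does not cancel the Subscription by itself; a caller would still need to call cancel() on the stored subscription when the deadline fires.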

I also did not find a targeted regression test for:

STREAMING_COMPLETE_KEY == true
RESPONSE_COMPLETE_KEY != true
channelInactive()

AWS Java SDK version used

2.44.4

JDK version used

Not environment-specific. Found by source inspection of aws/aws-sdk-java-v2.

Operating System and version

Not environment-specific. Found by source inspection of aws/aws-sdk-java-v2.
