JAVA-6071
This ticket involves two separate bugs in the test case `testBulkWriteHandlesWriteErrorsAcrossBatches` for the reactive driver, specifically when the `ordered` field is `false`.
### `IllegalStateException` on Mono timeout
The first part of the failed test is:
[2026/01/21 16:07:57.619] FAILURE: org.opentest4j.AssertionFailedError: Unexpected exception type thrown, expected: <com.mongodb.ClientBulkWriteException> but was: <java.lang.IllegalStateException> (org.opentest4j.AssertionFailedError)
Here is the link to the build
Here is the log that caught my attention:
[2026/01/21 16:03:37.770] 00:03:37.750 [cluster-ClusterId{value='697169596f43721a0d1b8470', description='null'}-localhost:27017] INFO org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=STANDALONE, cryptd=false, state=CONNECTED, ok=true, minWireVersion=0, maxWireVersion=25, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=3744986, minRoundTripTimeNanos=0}
[2026/01/21 16:05:39.870] 6. MongoClient.bulkWrite handles individual WriteErrors across batches
As you can see, the time difference between the last two logs is exactly two minutes, and according to our settings, the sync driver blocks the Mono for two minutes.

So the timeout happened and `Mono.block` threw an `IllegalStateException`.

This PR does not solve this issue; the `IllegalStateException` is expected, unless we want to override the timeout for this test case only, as it deals with a huge number of documents per batch.
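For readers unfamiliar with Reactor's blocking semantics, here is a minimal stand-alone sketch of the failure mode: `Mono.block(Duration)` surfaces a timeout as an `IllegalStateException` rather than a checked `TimeoutException`. The sketch below mimics that behavior with a plain `CompletableFuture` (the helper name and the exact message are illustrative, not the driver's code):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockTimeoutDemo {
    // Mimics Mono.block(Duration): a blocking read that converts a timeout
    // into an IllegalStateException instead of a checked TimeoutException.
    static <T> T blockWithTimeout(CompletableFuture<T> publisher, long timeoutMs) {
        try {
            return publisher.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Reactor's Mono.block(Duration) throws IllegalStateException
            // ("Timeout on blocking read ...") in this situation.
            throw new IllegalStateException("Timeout on blocking read for " + timeoutMs + " ms");
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A publisher that never completes, standing in for the slow bulk write.
        CompletableFuture<String> neverCompletes = new CompletableFuture<>();
        try {
            blockWithTimeout(neverCompletes, 100);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

This is why the test sees `IllegalStateException` instead of the expected `ClientBulkWriteException`: the blocking adapter's timeout fires before the bulk write can fail with its own error.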
### Connection leak
The second assertion that failed on this test case is the connection leak:
[2026/01/21 16:07:57.619] The connection pool listener reports '1' open connections.
[2026/01/21 16:07:57.619] at com.mongodb.assertions.Assertions.assertTrue(Assertions.java:190)
[2026/01/21 16:07:57.619] at com.mongodb.reactivestreams.client.syncadapter.SyncMongoClient$ConnectionPoolCounter.assertConnectionsClosed(SyncMongoClient.java:373)
[2026/01/21 16:07:57.619] at com.mongodb.reactivestreams.client.syncadapter.SyncMongoClient.close(SyncMongoClient.java:305)
[2026/01/21 16:07:57.619] at com.mongodb.client.CrudProseTest.testBulkWriteHandlesWriteErrorsAcrossBatches(CrudProseTest.java:246)
[2026/01/21 16:07:57.619] ... 39 more
Before looking at the leak, I checked what the test case is actually doing and noticed that it tries to insert documents in batches of 10K invalid (duplicate ids) docs per batch.
The assertion for the connection leak happens here.
The code runs a loop that fails after 2 seconds if not all connections are closed. In my local environment I noticed that 2 seconds is not enough for such a huge number of documents to be processed, so I introduced a new method that can propagate and override this timeout. After running this test case 100 times locally, I no longer saw the connection leak.
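The shape of that loop, with the timeout lifted into a parameter, can be sketched as follows. This is a simplified stand-in (the counter, method name, and backoff interval are illustrative, not the actual `SyncMongoClient` internals):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionCloseWait {
    // Sketch of the assertion loop: poll the open-connection counter and fail
    // only after a caller-supplied timeout, instead of a hard-coded 2 seconds.
    static void assertConnectionsClosed(AtomicInteger openConnections, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (openConnections.get() > 0) {
            if (System.nanoTime() > deadline) {
                throw new AssertionError("The connection pool listener reports '"
                        + openConnections.get() + "' open connections.");
            }
            Thread.sleep(10); // back off before re-checking the pool listener
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger open = new AtomicInteger(1);
        // Simulate an asynchronous close that only finishes after 200 ms,
        // i.e. slower than an aggressive timeout would tolerate.
        new Thread(() -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
            open.decrementAndGet();
        }).start();
        assertConnectionsClosed(open, 2_000); // passes with the longer timeout
        System.out.println("all connections closed");
    }
}
```

With a timeout shorter than the simulated close (say 100 ms), the same loop would throw the `AssertionError` seen in the CI failure, which is why making the timeout overridable per test helps here.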
To test it locally:

- change the Mono timeout from 2 minutes to 2 seconds so that you can simulate the `IllegalStateException`
- add `@RepeatedTest` to this test case; in my case, I hardcoded `ordered=false` instead of relying on the parameterized test so that I could use `@RepeatedTest`
### NullPointerException
The last part of this failed test is a `NullPointerException` here:
[2026/01/21 16:07:57.619] Exception in thread "Thread-29" java.lang.NullPointerException: Cannot invoke "java.nio.ByteBuffer.hasRemaining()" because "this.buf" is null
[2026/01/21 16:07:57.619] at org.bson.ByteBufNIO.hasRemaining(ByteBufNIO.java:91)
The exception makes sense; `hasRemaining()` throws an NPE because the underlying buffer becomes null once released. While I couldn't reproduce this after 100 local test runs, I added a guard for `ByteBufNIO`. By using `asNIO()` first (which acts as a direct getter for `ByteBufNIO` but avoids the buffer copying found in Netty and other implementations), we can safely check whether the underlying buffer is null before calling `hasRemaining`; for other implementations we still rely on `hasRemaining`.

I don't like this approach because of the abstraction leak, and I'm open to any suggestions.
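To make the guard concrete, here is a minimal self-contained sketch of the idea. `SimpleByteBuf` is a toy stand-in for `ByteBufNIO` (the class and the release mechanics are illustrative): the wrapped `ByteBuffer` is nulled on release, so calling `hasRemaining()` afterwards NPEs, while consulting the direct getter first does not:

```java
import java.nio.ByteBuffer;

public class SafeHasRemainingDemo {
    // Toy stand-in for ByteBufNIO: the wrapped ByteBuffer is set to null
    // when the buffer is released (names and mechanics are illustrative).
    static final class SimpleByteBuf {
        private ByteBuffer buf;

        SimpleByteBuf(ByteBuffer buf) { this.buf = buf; }

        ByteBuffer asNIO() { return buf; }            // direct getter, no copy

        boolean hasRemaining() { return buf.hasRemaining(); } // NPEs after release

        void release() { buf = null; }                // models the release path
    }

    // Guarded check: look at asNIO() first so a released buffer reads as
    // "nothing remaining" instead of throwing NullPointerException.
    static boolean safeHasRemaining(SimpleByteBuf byteBuf) {
        ByteBuffer nio = byteBuf.asNIO();
        return nio != null && nio.hasRemaining();
    }

    public static void main(String[] args) {
        SimpleByteBuf buf = new SimpleByteBuf(ByteBuffer.allocate(4));
        System.out.println(safeHasRemaining(buf)); // true: 4 bytes remaining
        buf.release();
        System.out.println(safeHasRemaining(buf)); // false: guarded, no NPE
    }
}
```

As noted above, this only masks the symptom; if the buffer is being released while still in use, the real fix is in the ownership/lifecycle of the buffer, not in the null check.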
I think this is symptomatic of the higher resource being released too early issue, so as much as this prevents the NPE it doesn't solve the root cause.
I think this should be addressed by #1873, #1874 & #1876 and a future PR for Session & connection monitor that will be built upon those changes.
hey @rozza, after looking at your listed fixes I didn't understand whether they will fix the NPE in `ByteBufNIO`, but we can give it a try. I can remove my NPE fix but still keep the longer timeout for the connection pool to close; let me know if that works for you
@strogiyotec apologies, my changes will help with the byte buffer management from `InternalStreamConnection`, which uses the `Stream`. However, you're correct that it is not obvious whether that will help with these internals of `AsynchronousChannelStream` and `pipeOneBuffer`.
I'd like to understand more about the failure scenario:
- Why does `byteBuffer.hasRemaining()` NPE?
- Why won't calling `byteBuffer.asNIO()` also NPE?
Looks like when a `ByteBufNIO` is fully released, the underlying `buf` is set to null, which then goes on to cause the NPE. What I'm not sure of is the source of the release.
Is there any way to replicate the error locally? I have two questions about the root cause:
- Is it a resource counting issue that will be fixed by the `InternalStream` and `ByteBuf` improvement PRs?
- Is it a race condition, where the `AsynchronousChannelStream` is closed while a `pipeOneBuffer` is in flight? I have a feeling we'd have seen it more often if that was the case. Should there be a check for `isClosed()` before checking the buffer?