jdbc-v2: swallow chunked-stream drain errors on ResultSet.close (fixes #2361) by fm4v · Pull Request #2857 · ClickHouse/clickhouse-java

fm4v · 2026-05-20T08:26:14Z

Summary

Fixes #2361. ResultSetImpl.close() no longer rethrows the `ConnectionClosedException: Premature end of chunk coded message body: closing chunk expected` raised on the chunked-response drain path. Iteration-time failures still surface through next(); this change only removes the close-time cleanup noise.

Why

When the server tears down the response connection mid-stream (canonically because send_timeout fires on the writer side and the terminating zero-length chunk is never written), Apache HC's ChunkedInputStream.close() drains remaining bytes and trips Premature end of chunk coded message body. With compress=true the close path goes through FramedLZ4CompressorInputStream.close() → BoundedInputStream.close() → ChunkedInputStream.close() and the same drain failure surfaces.
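The failure mode can be illustrated with a toy decoder for HTTP/1.1 chunked bodies. This is a minimal sketch, not Apache HC's implementation: the class `ToyChunkedDecoder` and its methods are hypothetical names, and trailers are ignored. It shows why a body that ends without the zero-length terminator chunk cannot be treated as a clean end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical toy decoder for HTTP/1.1 chunked bodies; NOT Apache HC's code.
class ToyChunkedDecoder {

    // Reads one CRLF-terminated line; returns null if the stream ends first.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                continue;
            }
            if (b == '\n') {
                return sb.toString();
            }
            sb.append((char) b);
        }
        return null;
    }

    // Decodes the whole body, failing if the zero-length terminator never arrives.
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            if (sizeLine == null) {
                throw new IOException("premature end of chunked body: closing chunk expected");
            }
            int size = Integer.parseInt(sizeLine.trim(), 16);
            if (size == 0) {
                readLine(in); // blank line after the terminator (no trailers in this sketch)
                return out.toByteArray();
            }
            byte[] chunk = in.readNBytes(size);
            if (chunk.length < size) {
                throw new IOException("premature end of chunked body: truncated chunk");
            }
            out.write(chunk);
            readLine(in); // CRLF that follows the chunk data
        }
    }

    // Convenience overload for ASCII wire text.
    static byte[] decode(String wire) throws IOException {
        return decode(new ByteArrayInputStream(wire.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

In this sketch, a drain-on-close corresponds to calling `decode` on whatever bytes remain: if the server never wrote the `0` chunk, the call fails even though every data chunk was already delivered.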

ResultSetImpl.close() previously rethrew this as a SQLException, which breaks well-behaved try-with-resources callers: they have already finished iterating, they cannot affect the server-side socket race, and the bug is surfacing in cleanup — they have no recourse other than wrapping every rs.close() (often implicit) in a try/catch. The reporter in #2361 ended up implementing client-side retry for the symptom; sqlancer / SQL fuzzers hit the same pattern.

If iteration genuinely fails (server closed mid-next()), that exception still surfaces — through next(), not through close(). The bare-drain-on-close case is informational only.
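The effect on a try-with-resources caller can be sketched with a stand-in resource. `CloseNoiseDemo` and `FlakyResource` are hypothetical names used only for illustration; the point is that the `catch` block runs even though the body finished iterating normally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a result set whose close() fails after a clean read.
class CloseNoiseDemo {

    static class FlakyResource implements AutoCloseable {
        private int remaining = 3;

        boolean next() {
            return remaining-- > 0;
        }

        @Override
        public void close() throws Exception {
            // Simulates the drain failure that happens only during cleanup.
            throw new Exception("Premature end of chunk coded message body: closing chunk expected");
        }
    }

    // Returns a short trace of what the caller observed.
    static List<String> run() {
        List<String> events = new ArrayList<>();
        try (FlakyResource rs = new FlakyResource()) {
            int rows = 0;
            while (rs.next()) {
                rows++;
            }
            events.add("iterated rows=" + rows);
        } catch (Exception e) {
            // Reached even though the body above completed normally.
            events.add("caught from close: " + e.getMessage());
        }
        return events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```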

What changed

jdbc-v2/src/main/java/com/clickhouse/jdbc/ResultSetImpl.java: gate the existing e = re; assignments in close() behind a new isStreamDrainException classifier. Drain-class exceptions are logged at debug and swallowed; all other close-time failures still propagate as before. Logic shape preserved.
jdbc-v2/src/test/java/com/clickhouse/jdbc/ResultSetImplCloseTest.java: six unit tests covering the classifier (bare message match, closing chunk expected variant, nested cause chain, class-name match without message, unrelated exceptions, null input). All pass.
examples/jdbc/src/main/java/com/clickhouse/examples/jdbc/Issue2361Repro.java: a self-contained main that reproduces the bug against a real ClickHouse server. Trigger recipe is in the class Javadoc: compression on, server send_timeout=1, buffered chunked response, slow client read. On a freshly-started server it trips at ~100% rate.
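A rough sketch of what such a classifier could look like follows. Only the method name and the first message check appear in the diff quoted in this PR; the second message check, the simple-name check, the depth bound, and the local `ConnectionClosedException` stand-in are assumptions made so the example compiles on its own.

```java
import java.io.IOException;

// Sketch of a message-and-name classifier; everything past the quoted diff hunk is assumed.
class DrainClassifierSketch {

    // Local stand-in so the sketch compiles without Apache HC on the classpath.
    static class ConnectionClosedException extends IOException {
        ConnectionClosedException(String message) {
            super(message);
        }
    }

    static boolean isStreamDrainException(Throwable t) {
        int depth = 0; // bound the walk in case of a cyclic cause chain (added for the sketch)
        while (t != null && depth++ < 32) {
            String msg = t.getMessage();
            if (msg != null && (msg.contains("Premature end of chunk")
                    || msg.contains("closing chunk expected"))) {
                return true;
            }
            if ("ConnectionClosedException".equals(t.getClass().getSimpleName())) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }
}
```

The checks map onto the listed test shapes: message match, nested cause, class name without a message, an unrelated exception, and null input.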

Reproducer

java -DchUrl=jdbc:ch://localhost:8123 \
     com.clickhouse.examples.jdbc.Issue2361Repro 3

Before this PR (jdbc-v2 0.9.8 main):

iter 0: trips=1 elapsed=126840ms
iter 1: trips=2 elapsed=131245ms
...
FINAL: trips=2 / 3 iterations (67%)

After this PR:

iter 0: trips=0 elapsed=...
iter 1: trips=0 elapsed=...
FINAL: trips=0 / 3 iterations (0%)

Test plan

mvn -pl jdbc-v2 test -Dtest=ResultSetImplCloseTest — 6/6 pass
Standalone reproducer demonstrates 100% → 0% trip rate before vs after
Maintainer review for whether to also extend isStreamDrainException to other transient-cleanup exception classes (e.g. SSL SSLException on close, SocketException: Connection reset on close)

Correctness — is the data the application consumed complete?

Yes, in the failure scenario this PR addresses. The relevant invariants are in upstream code already, not added here:

AbstractBinaryFormatReader.readRecord(...) catches EOFException only when firstColumn=true (between rows) and treats it as end-of-stream. EOF mid-row is rethrown as IOException(recordReadExceptionMsg(...), e), which surfaces through next() — not through close().
ChunkedInputStream.read(...) in Apache HC returns -1 only when it has parsed the terminator zero-length chunk. If the TCP connection drops mid-chunk-header it throws ConnectionClosedException: Premature end of chunk coded message body: closing chunk expected; if the chunk header is corrupt it throws MalformedChunkCodingException. Truncation never quietly looks like clean EOF to the upper layer.
Therefore the failure path covered by this PR corresponds to: the lz4-framed stream's end-of-stream marker was seen, the reader fully consumed the payload, next() legitimately returned false, and only then did the chunked-stream drain in close() hit the missing terminator chunk. The application has the correct row set; the close-time error is an HTTP-framing artefact, not a data-integrity one.

Other truncation cases (mid-row, mid-frame-header, malformed chunk) all surface through next() before close() is reached — those paths are unchanged by this PR. The isStreamDrainException classifier is intentionally narrow (Premature end of chunk / closing chunk expected message, or ConnectionClosedException class match) so unrelated close-time failures still propagate as before.
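The row-boundary invariant can be sketched with a toy two-column reader. This is not the real AbstractBinaryFormatReader: `RowBoundaryEofSketch`, its fixed row layout (an 8-byte value followed by a 4-byte value), and the error text are assumptions chosen to keep the example small. End of stream is accepted only exactly between rows; anywhere else it becomes an IOException.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical two-column reader (8-byte + 4-byte values); not the real reader.
class RowBoundaryEofSketch {

    // Returns the number of complete rows; throws if the stream ends inside a row.
    static int countRows(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int rows = 0;
        while (true) {
            int first = in.read();
            if (first == -1) {
                return rows; // EOF exactly at a row boundary: clean end of stream
            }
            try {
                in.readFully(new byte[7]); // rest of column 0 (8-byte value)
                in.readInt();              // column 1 (4-byte value)
                rows++;
            } catch (EOFException e) {
                // EOF after a row has started is a truncation, never a clean end.
                throw new IOException("Failed to read row " + (rows + 1) + ": stream ended mid-row", e);
            }
        }
    }
}
```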

Fixes ClickHouse#2361.

When the server tears down the response connection mid-stream (canonically because send_timeout fires on the writer side and the terminating zero-length chunk is never written), Apache HC's ChunkedInputStream.close() drains remaining bytes and trips `ConnectionClosedException: Premature end of chunk coded message body: closing chunk expected`. With `compress=true` the close path goes through `FramedLZ4CompressorInputStream.close()` → `BoundedInputStream.close()` → `ChunkedInputStream.close()` and the same drain failure surfaces.

`ResultSetImpl.close()` previously rethrew this as a `SQLException`, which breaks well-behaved try-with-resources callers: they have already finished iterating, they cannot affect the server-side socket race, and the bug is surfacing in *cleanup*, so they have no recourse other than wrapping every `rs.close()` (often implicit) in a try/catch. The reporter in ClickHouse#2361 ended up implementing client-side retry for the symptom; sqlancer / SQL fuzzers hit the same.

Downgrade chunked-drain failures to log-and-swallow. If iteration genuinely failed (server closed mid-`next()`), that exception still surfaces through `next()`, not through `close()`. The bare-drain-fail case is informational only. Covers both bare `Premature end of chunk` messages and `ConnectionClosedException` class-name match for forward-compat with Apache HC rephrasings.

Augments the previous commit with:
- Unit tests for `ResultSetImpl.isStreamDrainException` covering the six exception shapes the classifier needs to recognise (bare message, `closing chunk expected` variant, nested cause chain, class-name match without message, unrelated exceptions, null input). All six pass.
- `examples/jdbc/.../Issue2361Repro.java`: a self-contained main that reproduces the bug against a real ClickHouse server. Recipe is documented in the class Javadoc: compression on, server `send_timeout=1`, buffered chunked response, slow client read. On a freshly-started server it trips at ~100% rate. After the fix in `ResultSetImpl.close()` the iteration completes cleanly with the drain-close failure downgraded to a debug log.

`isStreamDrainException` is package-private so it can be unit-tested without reflection.

github-actions · 2026-05-20T08:26:24Z

Repository collaborators can run the JMH benchmark suite against this PR by commenting:

/benchmark

Optional regression threshold override (Δ% on Time or Alloc/op; defaults to 10%):

/benchmark threshold=15

Only one benchmark run per PR is active at a time — issuing a new /benchmark comment cancels the previous run. After the run finishes a separate comment will be posted comparing it against the latest scheduled run on main; the PR check fails if any benchmark regresses by more than the threshold.

fm4v · 2026-05-20T09:13:00Z

Local end-to-end validation against ClickHouse 26.5.1.854

Reproduced #2361 against the stock 0.9.8 jar and confirmed the fix on this branch using the Issue2361Repro example from this PR, plus a variant that also validates row content.

Bug is real

Stock clickhouse-jdbc-0.9.8-all.jar against a server reachable on :18124, 3 iterations of the reproducer:

iter 0: trips=1 elapsed=38882ms
iter 1: trips=2 elapsed=347743ms
iter 2: trips=3 elapsed=37341ms

FINAL: trips=3 / 3 iterations (100%)

-- first failure stack (top frames) --
java.sql.SQLException: Premature end of chunk coded message body: closing chunk expected
    at com.clickhouse.jdbc.internal.ExceptionUtils.toSqlState(ExceptionUtils.java:75)
    at com.clickhouse.jdbc.internal.ExceptionUtils.toSqlState(ExceptionUtils.java:43)
    at com.clickhouse.jdbc.ResultSetImpl.close(ResultSetImpl.java:179)
    at com.clickhouse.examples.jdbc.Issue2361Repro.main(Issue2361Repro.java:92)
Caused by: com.clickhouse.client.internal.org.apache.hc.core5.http.ConnectionClosedException: Premature end of chunk coded message body: closing chunk expected
    at com.clickhouse.client.internal.org.apache.hc.core5.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:264)
    at com.clickhouse.client.internal.org.apache.hc.core5.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:223)
    at com.clickhouse.client.internal.org.apache.hc.core5.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:147)
    at com.clickhouse.client.internal.org.apache.hc.core5.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:315)
    at com.clickhouse.client.internal.org.apache.hc.client5.http.impl.classic.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:133)
    at com.clickhouse.client.internal.org.apache.hc.core5.http.io.EofSensorInputStream.checkClose(EofSensorInputStream.java:231)
    at com.clickhouse.client.internal.org.apache.hc.core5.http.io.EofSensorInputStream.close(EofSensorInputStream.java:175)
    at com.clickhouse.client.internal.org.apache.commons.io.IOUtils.close(IOUtils.java:449)
    at com.clickhouse.client.internal.org.apache.commons.io.input.ProxyInputStream.close(ProxyInputStream.java:232)
    at com.clickhouse.client.internal.org.apache.commons.io.input.BoundedInputStream.close(BoundedInputStream.java:393)
    at com.clickhouse.client.internal.org.apache.commons.compress.compressors.lz4.FramedLZ4CompressorInputStream.close(FramedLZ4CompressorInputStream.java:164)
    at com.clickhouse.client.api.data_formats.internal.AbstractBinaryFormatReader.close(AbstractBinaryFormatReader.java:1068)
    at com.clickhouse.jdbc.ResultSetImpl.close(ResultSetImpl.java:156)

Stack origin matches the patch site exactly: ResultSetImpl.close():179, with the drain path FramedLZ4CompressorInputStream.close → BoundedInputStream.close → ChunkedInputStream.close.

Fix works

Patched jar (this branch) against a freshly-restarted server, 2 iterations of the reproducer:

iter 0: rows=65412 badNumber=3 badPad=3 tripped=false elapsed=36552ms
iter 1: rows=65412 badNumber=3 badPad=3 tripped=false elapsed=35476ms

FINAL: trips=0 rows_total=130824 badNumber=6 badPad=6

Zero Premature end of chunk propagations. try (ResultSet rs = ...) { ... } completes cleanly.

No row-content regression

I extended the reproducer to also validate each row delivered before the response is torn down:

col 0 = number — expected sequential 0,1,2,…
col 1 = repeat('xyz', 1500) — expected the 4500-char string
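The two checks above could be counted with something like the following. This is only a sketch of the idea: the actual validation code is not shown in this thread, so `RowContentCheck` and its field names are hypothetical, chosen to echo the `badNumber` and `badPad` counters in the output.

```java
// Hypothetical validator mirroring the two checks listed above; names are illustrative.
class RowContentCheck {
    static final String EXPECTED_PAD = "xyz".repeat(1500); // 4500 characters

    long rows;
    long badNumber;
    long badPad;

    // Call once per delivered row, in delivery order.
    void accept(long number, String pad) {
        if (number != rows) {
            badNumber++; // col 0 should equal the zero-based row index
        }
        if (!EXPECTED_PAD.equals(pad)) {
            badPad++;    // col 1 should be the exact repeated string
        }
        rows++;
    }
}
```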

Side-by-side, both jars deliver byte-identical rows:

| Metric | Stock (.bak) | Patched (this PR) |
|---|---|---|
| `Premature end of chunk` trips | 2 / 2 iters | 0 / 2 iters |
| Rows delivered / iter | 65412 | 65412 |
| Index of first "bad" row | 65409 | 65409 |
| First bad `number` value | `7305815397312170509` | `7305815397312170509` |
| First bad `pad` head | `tion__\n<random>…Code:` | `tion__\n<random>…Code:` |

The 3 "bad" rows per iteration are not caused by the patch — they are ClickHouse's error trailer (Code: …. DB::Exception: ParallelFormattingException___<hex>…) being inlined into the truncated chunked body when send_timeout=1 fires mid-write, then misread as RowBinary by the format reader. The exact same artifact appears with the stock jar, with identical byte counts and identical leading bytes. The patch only changes whether the cosmetic close-time exception propagates; it does not touch the row-decode path, and the empirical results confirm that.

This artifact also surfaces correctly as a mid-iter SQLException: Failed to read next row in both runs, before close() runs — so a real iteration-time problem is still propagated. The fix scope (downgrade close-time drain noise only) holds in practice.

Environment

ClickHouse server: clickhouse/clickhouse-server:head 26.5.1.854
JDK 25
JDBC connection settings: compress=true, client.use_http_compression=true, socket_timeout=60000, clickhouse_setting_send_timeout=1, clickhouse_setting_http_response_buffer_size=104857600, clickhouse_setting_wait_end_of_query=1, clickhouse_setting_max_execution_time=120
Query: SELECT number, repeat('xyz', 1500) AS pad FROM numbers(5000000)
Slow client (~50 ms sleep per 100 rows)
Driver under test built from jdbc-v2/src/main/java/com/clickhouse/jdbc/ResultSetImpl.java at the current PR head
Unit tests ResultSetImplCloseTest (6 cases — bare message, closing chunk expected, wrapped cause, class-name match without message, unrelated, null): all 6 pass
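For anyone re-running the recipe, the connection settings listed above could be collected into a `java.util.Properties` object and handed to the standard `DriverManager.getConnection(url, properties)` call. The keys and values are copied from the list; whether the original run passed them as properties or as URL parameters is not stated here, so treat that part as an assumption. `ReproSettingsSketch` is a hypothetical name, and nothing in the sketch opens a connection.

```java
import java.util.Properties;

// Settings copied from the environment list above; how they were passed is an assumption.
class ReproSettingsSketch {

    static Properties reproProperties() {
        Properties p = new Properties();
        p.setProperty("compress", "true");
        p.setProperty("client.use_http_compression", "true");
        p.setProperty("socket_timeout", "60000");
        // Server-side settings, forwarded with the clickhouse_setting_ prefix.
        p.setProperty("clickhouse_setting_send_timeout", "1");
        p.setProperty("clickhouse_setting_http_response_buffer_size", "104857600");
        p.setProperty("clickhouse_setting_wait_end_of_query", "1");
        p.setProperty("clickhouse_setting_max_execution_time", "120");
        return p;
    }

    // The query used in the run, verbatim from the environment list.
    static final String QUERY =
            "SELECT number, repeat('xyz', 1500) AS pad FROM numbers(5000000)";
}
```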

fm4v · 2026-05-20T09:15:41Z

@chernser @mzitnik hi, could you please review

When --log-each-select=true (default), the per-thread reproducer file should be self-sufficient: schema setup + bulk load + failing query in order. Three oracles were silently running DDL/data statements via SQLQueryAdapter(..., useLogger=false), so when one of them tripped an AssertionError the saved logs/clickhouse/database*.log was missing prerequisite statements and could not be replayed standalone.
- CERTOracle: ANALYZE TABLE
- PartitionMirrorOracle: DROP/CREATE of the mirror table plus INSERT INTO mirror SELECT * FROM source
- SchemaRoundtripOracle: the round-trip CREATE pair plus cleanup DROPs

Each new write is gated on state.getOptions().logEachSelect() to match the existing SQLQueryAdapter logging convention.

Also document the locally-patched clickhouse-jdbc 0.9.8 jar in .claude/CLAUDE.md (upstream PR ClickHouse/clickhouse-java#2857) so a fresh checkout knows the jar shipped in target/lib/ is not stock and records the rebuild recipe + verification command.

chernser · 2026-05-20T21:11:43Z

+ * failure to a debug log instead of a thrown SQLException — close() should
+ * never punish callers for a server-side teardown race after iteration is done.
+ */
+public class Issue2361Repro {


This should be a test. Examples are for documentation purposes only.

Removed. The Issue2361Repro example has been deleted in the latest commit. The trigger recipe (compress=true, server send_timeout=1, buffered chunked response, slow client read) is now documented inline at the detection site in AbstractBinaryFormatReader.close().

chernser · 2026-05-20T21:14:19Z

+ * the application has finished iterating. Propagating it punishes well-behaved
+ * try-with-resources callers for a server-side socket race they cannot affect.
+ */
+public class ResultSetImplCloseTest {


Please squash these into fewer tests and move them to ResultSetImplTest.
We plan to do another grouping.
Please mark the tests as part of the integration group: for historical and practical reasons, we run most tests against a ClickHouse instance.

Squashed and moved. Since the close-time handling now lives in client-v2 (AbstractBinaryFormatReader.close) rather than jdbc-v2, the tests followed: the 6 unit tests are collapsed into a single data-driven test (AbstractBinaryFormatReaderCloseTest) in client-v2. It is marked with the unit group, matching the convention of other client-v2 unit tests like SerializerUtilsTest. The new test uses the real org.apache.hc.core5.http.ConnectionClosedException (already a client-v2 dep) instead of a stand-in.

chernser · 2026-05-20T21:15:57Z

+ * decides whether a close-time exception is a benign chunked-stream drain
+ * failure (to swallow) or a real error (to propagate).
+ *
+ * Background: see issue #2361. When the server's send_timeout fires mid-write,


This should document the scenario without referencing an issue in an external system (the issue may be gone later).
This documentation makes more sense in the close method of the result set implementation, because we start investigating from production code, not from tests.

Done — the test file is gone, replaced by AbstractBinaryFormatReaderCloseTest in client-v2 whose javadoc describes the scenario (server tearing down the connection before the terminating zero-length chunk is written) with no reference to an external issue tracker. The production-side WHY now lives in AbstractBinaryFormatReader.close(), which is the starting point when investigating.

chernser · 2026-05-20T21:17:48Z

+        // already torn down the connection (e.g. SOCKET_TIMEOUT on the writer side hits
+        // `send_timeout` before the terminating chunk is written). The most common
+        // surface is `ConnectionClosedException: Premature end of chunk coded message
+        // body: closing chunk expected` from Apache HC's `ChunkedInputStream` drain,


Please explain this in a 3-line comment.
The idea, I think, is that there are cases where we may hide a real exception, which makes investigation difficult. One such case is a premature end of chunk in combination with compression.

ResultSetImpl.close() is reverted to its original shape — the long comment is gone entirely. The drain handling moved into client-v2 (AbstractBinaryFormatReader.close), and the 6-line WHY now lives there, at the actual catch site. ResultSetImpl just propagates whatever the reader gives it, like before.

chernser · 2026-05-20T21:23:51Z

+    static boolean isStreamDrainException(Throwable t) {
+        while (t != null) {
+            String msg = t.getMessage();
+            if (msg != null && (msg.contains("Premature end of chunk")


We should detect that it is org.apache.hc.core5.http.ConnectionClosedException and log it.
We avoid building logic on error messages.

Done. The detection is now class-based — walks the cause chain and matches getClass().getName().endsWith(".ConnectionClosedException"). No message-text matching. The .endsWith (with leading dot) tolerates both the unshaded class name (org.apache.hc.core5.http.ConnectionClosedException) and the shaded one (com.clickhouse.shaded.org.apache.hc.core5.http.ConnectionClosedException) without taking a compile-time dependency on HC from the reader layer.
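A minimal sketch of the name-suffix approach described in this reply follows. The method name matches the diff quoted later in the thread; the `nameMatches` helper and the class name `NameSuffixDetectionSketch` are additions for illustration. Note that the relocated class name visible in the stack trace earlier in this PR is `com.clickhouse.client.internal.org.apache.hc.core5.http.ConnectionClosedException`, which the suffix check also accepts.

```java
// Sketch of suffix-based detection; nameMatches is a hypothetical helper added for testing.
class NameSuffixDetectionSketch {

    // True for any fully-qualified name whose last segment is ConnectionClosedException.
    static boolean nameMatches(String className) {
        // The leading dot rejects names like "MyConnectionClosedException". It also
        // rejects a class in the default package, whose name contains no dot at all.
        return className.endsWith(".ConnectionClosedException");
    }

    // Walks the cause chain and applies the name check to each throwable.
    static boolean isConnectionClosedException(Throwable t) {
        while (t != null) {
            if (nameMatches(t.getClass().getName())) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }
}
```

The trade-off is that matching on a name string avoids a compile-time dependency but would also accept an unrelated class that happens to share the simple name.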

chernser · 2026-05-20T21:27:19Z

                } catch (Exception re) {
                    log.debug("Error closing reader", re);
-                    e = re;
+                    if (!isStreamDrainException(re)) {


This should be handled in client-v2.
It could be handled in the reader, but the reader knows nothing about org.apache.hc.core5.http.ConnectionClosedException. There are two options:

just log all exceptions raised while closing the reader and report them as error or warn

add closeResponseStream to com.clickhouse.client.api.internal.HttpAPIClientHelper

Moved to client-v2. Drain handling now lives in AbstractBinaryFormatReader.close() — that is where input.close() actually triggers ChunkedInputStream.close(), so it is the natural catch site. ResultSetImpl.close() is back to its original shape with no special-case logic. The reader detects ConnectionClosedException by class-name suffix on the cause chain, so no compile-time HC dependency is added to the reader. Went with your first option (handle in reader) rather than adding closeResponseStream to HttpAPIClientHelper since the failure originates inside the reader's stream chain, not at the response level.
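The shape of a reader-level catch site might look roughly like this. It is a sketch under assumptions, not the code in AbstractBinaryFormatReader.close(): `SwallowingCloseSketch`, the injected predicate, the `StandInConnectionClosedException` class, and the `failingOnClose` helper are all hypothetical, and a real implementation would log instead of setting a flag.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Predicate;

// Hypothetical reader wrapper; the real AbstractBinaryFormatReader.close() may differ.
class SwallowingCloseSketch implements AutoCloseable {

    // Stand-in for the HC exception so the sketch has no external dependency.
    static class StandInConnectionClosedException extends IOException {
        StandInConnectionClosedException(String message) {
            super(message);
        }
    }

    private final InputStream input;
    private final Predicate<Throwable> isDrainFailure;
    boolean drainFailureSwallowed;

    SwallowingCloseSketch(InputStream input, Predicate<Throwable> isDrainFailure) {
        this.input = input;
        this.isDrainFailure = isDrainFailure;
    }

    @Override
    public void close() throws IOException {
        try {
            input.close();
        } catch (IOException e) {
            if (!isDrainFailure.test(e)) {
                throw e; // unrelated close-time failures still propagate
            }
            // Real code would log here; the sketch only records the fact.
            drainFailureSwallowed = true;
        }
    }

    // Test helper: a stream that is already at EOF and fails when closed.
    static InputStream failingOnClose(IOException failure) {
        return new InputStream() {
            @Override
            public int read() {
                return -1;
            }

            @Override
            public void close() throws IOException {
                throw failure;
            }
        };
    }
}
```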

chernser · 2026-05-20T21:30:35Z

@fm4v

Thank you for the contribution! It looks solid.
Please respond to my comments and sync with main so CI passes.

Thank you!

PS: I will work on contributors guide this week. It should make some part clear for future contributions.

Per chernser's review on ClickHouse#2857:
- Move close-time ConnectionClosedException handling from jdbc-v2 ResultSetImpl.close() into AbstractBinaryFormatReader.close() in client-v2. The drain failure originates at input.close() inside the reader (FramedLZ4CompressorInputStream -> BoundedInputStream -> ChunkedInputStream), so that is where it should be caught.
- Detect the HC ConnectionClosedException by class-name suffix (cause chain) instead of message text. Works against both the unshaded and the shaded copy of the class without taking a compile-time dependency on HC from the reader layer.
- Drop the long WHY comment from ResultSetImpl.close(); the rationale now lives at the actual detection site in the reader.
- Squash the six jdbc-v2 unit tests into a single data-driven test in client-v2 (AbstractBinaryFormatReaderCloseTest) using the real org.apache.hc.core5.http.ConnectionClosedException. Marked unit group, matching the convention of other client-v2 unit tests (SerializerUtilsTest, etc.).
- Remove the Issue2361Repro example. The reproducer recipe is captured in the commit history and the production code comment.

fm4v · 2026-05-20T22:18:03Z

@chernser thanks for the careful review. Pushed 4215ce7 addressing all six threads. Headline changes:

Moved drain handling from jdbc-v2 into client-v2 (AbstractBinaryFormatReader.close()). That is where input.close() actually triggers ChunkedInputStream.close() in the stack, so it is the natural catch site. ResultSetImpl.close() reverts to its original shape — no isStreamDrainException, no long comment.
Class-based detection, no message-text matching: walks the cause chain and matches getClass().getName().endsWith(".ConnectionClosedException"). Tolerant of both the unshaded and the shaded HC class name without adding a compile-time HC dependency to the reader.
Six unit tests → one data-driven unit test (AbstractBinaryFormatReaderCloseTest in client-v2). Uses the real org.apache.hc.core5.http.ConnectionClosedException. Marked unit group matching the convention of SerializerUtilsTest etc.
Issue2361Repro example removed; the trigger recipe is captured in commit messages and the production-code comment.

Branch is already on top of latest origin/main (no rebase needed). Local validation: client-v2 unit tests pass (4/4), client-v2 compile + jdbc-v2 test-compile + examples/jdbc compile all BUILD SUCCESS.

Replies on each individual thread point to the specific change.

chernser · 2026-05-21T17:34:50Z

+ * directly-referenced and the shaded copy of the HC class.
+ */
+@Test(groups = {"unit"})
+public class AbstractBinaryFormatReaderCloseTest {


There was example code that reproduces the issue. Would you please add a test to com.clickhouse.client.HttpTransportTests?

Regarding these tests: as I've mentioned, we do not create separate tests for a specific functionality or problem. They should be inside the reader test suite.

chernser · 2026-05-21T17:37:14Z

+     */
+    static boolean isConnectionClosedException(Throwable t) {
+        while (t != null) {
+            if (t.getClass().getName().endsWith(".ConnectionClosedException")) {


Here we should check for the class, not its name.
And because the class belongs to the HTTP client implementation, this method should be in com.clickhouse.client.api.internal.HttpAPIClientHelper (not a great solution, but we will fix it).
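A type-based check of the kind requested here could be sketched as below. To stay self-contained it uses a local stand-in class; in the real code the `instanceof` target would be `org.apache.hc.core5.http.ConnectionClosedException`, and the method would live wherever the maintainers decide. `ClassBasedDetectionSketch` is a hypothetical name.

```java
import java.io.IOException;

// Sketch of type-based detection using a local stand-in for the HC exception class.
class ClassBasedDetectionSketch {

    // Stand-in for org.apache.hc.core5.http.ConnectionClosedException (an IOException subtype).
    static class ConnectionClosedException extends IOException {
        ConnectionClosedException(String message) {
            super(message);
        }
    }

    // Walks the cause chain and matches on the type itself, not on its name.
    static boolean isConnectionClosedException(Throwable t) {
        while (t != null) {
            if (t instanceof ConnectionClosedException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }
}
```

Compared with a name comparison, `instanceof` also matches subclasses and cannot be fooled by an unrelated class with the same simple name, at the cost of a compile-time dependency on the exception type.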

github-actions · 2026-06-21T01:35:35Z

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update or ping for review. Thank you for your contributions!

fm4v added 2 commits May 20, 2026 01:14

fm4v marked this pull request as ready for review May 20, 2026 09:14

fm4v requested review from chernser and mzitnik as code owners May 20, 2026 09:14

chernser reviewed May 20, 2026

chernser reviewed May 21, 2026

chernser requested changes May 21, 2026

github-actions Bot added the stale label Jun 21, 2026

chernser removed the stale label Jun 29, 2026
