Skip to content

jdbc-v2: swallow chunked-stream drain errors on ResultSet.close (fixes #2361)#2857

Open
fm4v wants to merge 2 commits into
ClickHouse:mainfrom
fm4v:fix/resultset-close-swallow-stream-errors
Open

jdbc-v2: swallow chunked-stream drain errors on ResultSet.close (fixes #2361)#2857
fm4v wants to merge 2 commits into
ClickHouse:mainfrom
fm4v:fix/resultset-close-swallow-stream-errors

Conversation

@fm4v
Copy link
Copy Markdown
Member

@fm4v fm4v commented May 20, 2026

Summary

Fixes #2361. ResultSetImpl.close() no longer rethrows ConnectionClosedException: Premature end of chunk coded message body: closing chunk expected from the chunked-response drain path. Iteration-time failures still surface through next(); this only fixes the close-cleanup noise.

Why

When the server tears down the response connection mid-stream (canonically because send_timeout fires on the writer side and the terminating zero-length chunk is never written), Apache HC's ChunkedInputStream.close() drains remaining bytes and trips Premature end of chunk coded message body. With compress=true the close path goes through FramedLZ4CompressorInputStream.close()BoundedInputStream.close()ChunkedInputStream.close() and the same drain failure surfaces.

ResultSetImpl.close() previously rethrew this as a SQLException, which breaks well-behaved try-with-resources callers: they have already finished iterating, they cannot affect the server-side socket race, and the bug is surfacing in cleanup — they have no recourse other than wrapping every rs.close() (often implicit) in a try/catch. The reporter in #2361 ended up implementing client-side retry for the symptom; sqlancer / SQL fuzzers hit the same pattern.

If iteration genuinely fails (server closed mid-next()), that exception still surfaces — through next(), not through close(). The bare-drain-on-close case is informational only.

What changed

  1. jdbc-v2/src/main/java/com/clickhouse/jdbc/ResultSetImpl.java: gate the existing e = re; assignments in close() behind a new isStreamDrainException classifier. Drain-class exceptions are logged at debug and swallowed; all other close-time failures still propagate as before. Logic shape preserved.

  2. jdbc-v2/src/test/java/com/clickhouse/jdbc/ResultSetImplCloseTest.java: six unit tests covering the classifier (bare message match, closing chunk expected variant, nested cause chain, class-name match without message, unrelated exceptions, null input). All pass.

  3. examples/jdbc/src/main/java/com/clickhouse/examples/jdbc/Issue2361Repro.java: a self-contained main that reproduces the bug against a real ClickHouse server. Trigger recipe is in the class Javadoc: compression on, server send_timeout=1, buffered chunked response, slow client read. On a freshly-started server it trips at ~100% rate.

Reproducer

java -DchUrl=jdbc:ch://localhost:8123 \
     com.clickhouse.examples.jdbc.Issue2361Repro 3

Before this PR (jdbc-v2 0.9.8 main):

iter 0: trips=1 elapsed=126840ms
iter 1: trips=2 elapsed=131245ms
...
FINAL: trips=2 / 3 iterations (67%)

After this PR:

iter 0: trips=0 elapsed=...
iter 1: trips=0 elapsed=...
FINAL: trips=0 / 3 iterations (0%)

Test plan

  • mvn -pl jdbc-v2 test -Dtest=ResultSetImplCloseTest — 6/6 pass
  • Standalone reproducer demonstrates 100% → 0% trip rate before vs after
  • Maintainer review for whether to also extend isStreamDrainException to other transient-cleanup exception classes (e.g. SSL SSLException on close, SocketException: Connection reset on close)

Correctness — is the data the application consumed complete?

Yes, in the failure scenario this PR addresses. The relevant invariants are in upstream code already, not added here:

  1. AbstractBinaryFormatReader.readRecord(...) catches EOFException only when firstColumn=true (between rows) and treats it as end-of-stream. EOF mid-row is rethrown as IOException(recordReadExceptionMsg(...), e), which surfaces through next() — not through close().

  2. ChunkedInputStream.read(...) in Apache HC returns -1 only when it has parsed the terminator zero-length chunk. If the TCP connection drops mid-chunk-header it throws ConnectionClosedException: Premature end of chunk coded message body: closing chunk expected; if the chunk header is corrupt it throws MalformedChunkCodingException. Truncation never quietly looks like clean EOF to the upper layer.

  3. Therefore the failure path covered by this PR corresponds to: the lz4-framed stream's end-of-stream marker was seen, the reader fully consumed the payload, next() legitimately returned false, and only then did the chunked-stream drain in close() hit the missing terminator chunk. The application has the correct row set; the close-time error is an HTTP-framing artefact, not a data-integrity one.

Other truncation cases (mid-row, mid-frame-header, malformed chunk) all surface through next() before close() is reached — those paths are unchanged by this PR. The isStreamDrainException classifier is intentionally narrow (Premature end of chunk / closing chunk expected message, or ConnectionClosedException class match) so unrelated close-time failures still propagate as before.

fm4v added 2 commits May 20, 2026 01:14
Fixes ClickHouse#2361. When the server tears down the response connection mid-stream
(canonically because send_timeout fires on the writer side and the
terminating zero-length chunk is never written), Apache HC's
ChunkedInputStream.close() drains remaining bytes and trips
`ConnectionClosedException: Premature end of chunk coded message body:
closing chunk expected`. With `compress=true` the close path goes through
`FramedLZ4CompressorInputStream.close()` → `BoundedInputStream.close()` →
`ChunkedInputStream.close()` and the same drain failure surfaces.

`ResultSetImpl.close()` previously rethrew this as a `SQLException`, which
breaks well-behaved try-with-resources callers: they have already finished
iterating, they cannot affect the server-side socket race, and the bug is
surfacing in *cleanup* — they have no recourse other than wrapping every
`rs.close()` (often implicit) in a try/catch. The reporter in ClickHouse#2361 ended
up implementing client-side retry for the symptom; sqlancer / SQL fuzzers
hit the same.

Downgrade chunked-drain failures to log-and-swallow. If iteration genuinely
failed (server closed mid-`next()`), that exception still surfaces — through
`next()`, not through `close()`. The bare-drain-fail case is informational
only.

Covers both bare `Premature end of chunk` messages and `ConnectionClosedException`
class-name match for forward-compat with Apache HC rephrasings.
Augments the previous commit with:

- Unit tests for `ResultSetImpl.isStreamDrainException` covering the
  six exception shapes the classifier needs to recognise (bare message,
  `closing chunk expected` variant, nested cause chain, class-name
  match without message, unrelated exceptions, null input). All six
  pass.

- `examples/jdbc/.../Issue2361Repro.java` — a self-contained main
  that reproduces the bug against a real ClickHouse server. Recipe is
  documented in the class Javadoc: compression on, server `send_timeout=1`,
  buffered chunked response, slow client read. On a freshly-started
  server it trips at ~100% rate. After the fix in
  `ResultSetImpl.close()` the iteration completes cleanly with the
  drain-close failure downgraded to a debug log.

`isStreamDrainException` is package-private so it can be unit-tested
without reflection.
@github-actions
Copy link
Copy Markdown

Repository collaborators can run the JMH benchmark suite against this PR by commenting:

/benchmark

Optional regression threshold override (Δ% on Time or Alloc/op; defaults to 10%):

/benchmark threshold=15

Only one benchmark run per PR is active at a time — issuing a new /benchmark comment cancels the previous run. After the run finishes a separate comment will be posted comparing it against the latest scheduled run on main; the PR check fails if any benchmark regresses by more than the threshold.

@fm4v
Copy link
Copy Markdown
Member Author

fm4v commented May 20, 2026

Local end-to-end validation against ClickHouse 26.5.1.854

Reproduced #2361 against the stock 0.9.8 jar and confirmed the fix on this branch using the Issue2361Repro example from this PR, plus a variant that also validates row content.

Bug is real

Stock clickhouse-jdbc-0.9.8-all.jar against a server reachable on :18124, 3 iterations of the reproducer:

iter 0: trips=1 elapsed=38882ms
iter 1: trips=2 elapsed=347743ms
iter 2: trips=3 elapsed=37341ms

FINAL: trips=3 / 3 iterations (100%)

-- first failure stack (top frames) --
java.sql.SQLException: Premature end of chunk coded message body: closing chunk expected
    at com.clickhouse.jdbc.internal.ExceptionUtils.toSqlState(ExceptionUtils.java:75)
    at com.clickhouse.jdbc.internal.ExceptionUtils.toSqlState(ExceptionUtils.java:43)
    at com.clickhouse.jdbc.ResultSetImpl.close(ResultSetImpl.java:179)
    at com.clickhouse.examples.jdbc.Issue2361Repro.main(Issue2361Repro.java:92)
Caused by: com.clickhouse.client.internal.org.apache.hc.core5.http.ConnectionClosedException: Premature end of chunk coded message body: closing chunk expected
    at com.clickhouse.client.internal.org.apache.hc.core5.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:264)
    at com.clickhouse.client.internal.org.apache.hc.core5.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:223)
    at com.clickhouse.client.internal.org.apache.hc.core5.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:147)
    at com.clickhouse.client.internal.org.apache.hc.core5.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:315)
    at com.clickhouse.client.internal.org.apache.hc.client5.http.impl.classic.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:133)
    at com.clickhouse.client.internal.org.apache.hc.core5.http.io.EofSensorInputStream.checkClose(EofSensorInputStream.java:231)
    at com.clickhouse.client.internal.org.apache.hc.core5.http.io.EofSensorInputStream.close(EofSensorInputStream.java:175)
    at com.clickhouse.client.internal.org.apache.commons.io.IOUtils.close(IOUtils.java:449)
    at com.clickhouse.client.internal.org.apache.commons.io.input.ProxyInputStream.close(ProxyInputStream.java:232)
    at com.clickhouse.client.internal.org.apache.commons.io.input.BoundedInputStream.close(BoundedInputStream.java:393)
    at com.clickhouse.client.internal.org.apache.commons.compress.compressors.lz4.FramedLZ4CompressorInputStream.close(FramedLZ4CompressorInputStream.java:164)
    at com.clickhouse.client.api.data_formats.internal.AbstractBinaryFormatReader.close(AbstractBinaryFormatReader.java:1068)
    at com.clickhouse.jdbc.ResultSetImpl.close(ResultSetImpl.java:156)

Stack origin matches the patch site exactly: ResultSetImpl.close():179, drain path
FramedLZ4CompressorInputStream.closeBoundedInputStream.closeChunkedInputStream.close.

Fix works

Patched jar (this branch) against a freshly-restarted server, 2 iterations of the reproducer:

iter 0: rows=65412 badNumber=3 badPad=3 tripped=false elapsed=36552ms
iter 1: rows=65412 badNumber=3 badPad=3 tripped=false elapsed=35476ms

FINAL: trips=0 rows_total=130824 badNumber=6 badPad=6

Zero Premature end of chunk propagations. try (ResultSet rs = ...) { ... } completes cleanly.

No row-content regression

I extended the reproducer to also validate each row delivered before the response is torn down:

  • col 0 = number — expected sequential 0,1,2,…
  • col 1 = repeat('xyz', 1500) — expected the 4500-char string

Side-by-side, both jars deliver byte-identical rows:

Stock (.bak) Patched (this PR)
Premature end of chunk trips 2 / 2 iters 0 / 2 iters
Rows delivered / iter 65412 65412
Index of first "bad" row 65409 65409
First bad number value 7305815397312170509 7305815397312170509
First bad pad head tion__\n<random>…Code: tion__\n<random>…Code:

The 3 "bad" rows per iteration are not caused by the patch — they are ClickHouse's error trailer (Code: …. DB::Exception: ParallelFormattingException___<hex>…) being inlined into the truncated chunked body when send_timeout=1 fires mid-write, then misread as RowBinary by the format reader. The exact same artifact appears with the stock jar, with identical byte counts and identical leading bytes. The patch only changes whether the cosmetic close-time exception propagates; it does not touch the row-decode path, and the empirical results confirm that.

This artifact also surfaces correctly as a mid-iter SQLException: Failed to read next row in both runs, before close() runs — so a real iteration-time problem is still propagated. The fix scope (downgrade close-time drain noise only) holds in practice.

Environment

  • ClickHouse server: clickhouse/clickhouse-server:head 26.5.1.854
  • JDK 25
  • JDBC connection settings: compress=true, client.use_http_compression=true, socket_timeout=60000, clickhouse_setting_send_timeout=1, clickhouse_setting_http_response_buffer_size=104857600, clickhouse_setting_wait_end_of_query=1, clickhouse_setting_max_execution_time=120
  • Query: SELECT number, repeat('xyz', 1500) AS pad FROM numbers(5000000)
  • Slow client (~50 ms sleep per 100 rows)
  • Driver under test built from jdbc-v2/src/main/java/com/clickhouse/jdbc/ResultSetImpl.java at the current PR head
  • Unit tests ResultSetImplCloseTest (6 cases — bare message, closing chunk expected, wrapped cause, class-name match without message, unrelated, null): all 6 pass

@fm4v fm4v marked this pull request as ready for review May 20, 2026 09:14
@fm4v fm4v requested review from chernser and mzitnik as code owners May 20, 2026 09:14
@fm4v
Copy link
Copy Markdown
Member Author

fm4v commented May 20, 2026

@chernser @mzitnik hi, could you please review

fm4v added a commit to fm4v/sqlancer that referenced this pull request May 20, 2026
When --log-each-select=true (default), the per-thread reproducer file
should be self-sufficient: schema setup + bulk load + failing query in
order. Three oracles were silently running DDL/data statements via
SQLQueryAdapter(..., useLogger=false), so when one of them tripped an
AssertionError the saved logs/clickhouse/database*.log was missing
prerequisite statements and could not be replayed standalone.

- CERTOracle: ANALYZE TABLE
- PartitionMirrorOracle: DROP/CREATE of the mirror table plus
  INSERT INTO mirror SELECT * FROM source
- SchemaRoundtripOracle: the round-trip CREATE pair plus cleanup DROPs

Each new write is gated on state.getOptions().logEachSelect() to match
the existing SQLQueryAdapter logging convention.

Also document the locally-patched clickhouse-jdbc 0.9.8 jar in
.claude/CLAUDE.md (upstream PR ClickHouse/clickhouse-java#2857) so a
fresh checkout knows the jar shipped in target/lib/ is not stock and
records the rebuild recipe + verification command.
* failure to a debug log instead of a thrown SQLException — close() should
* never punish callers for a server-side teardown race after iteration is done.
*/
public class Issue2361Repro {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a test. Examples are only for documentation purpose.

* the application has finished iterating. Propagating it punishes well-behaved
* try-with-resources callers for a server-side socket race they cannot affect.
*/
public class ResultSetImplCloseTest {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash into a fewer tests and move to ResultSetImplTest.
We plan to do another grouping.
Please mark tests included in integration group because for historical and practical reasons we do most tests against ClickHouse instance.

* decides whether a close-time exception is a benign chunked-stream drain
* failure (to swallow) or a real error (to propagate).
*
* Background: see issue #2361. When the server's send_timeout fires mid-write,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should document scenario without reference to issue in external system (issue may gone).
This documentation make more sense in close method of result set impl because we start researching from production code not tests.

// already torn down the connection (e.g. SOCKET_TIMEOUT on the writer side hits
// `send_timeout` before the terminating chunk is written). The most common
// surface is `ConnectionClosedException: Premature end of chunk coded message
// body: closing chunk expected` from Apache HC's `ChunkedInputStream` drain,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please explain in 3 lines comment.
The idea I think that there are cases where we may hide real exception what cause problematic investigation. One of such cases is premature end of chunk in combination with compression.

static boolean isStreamDrainException(Throwable t) {
while (t != null) {
String msg = t.getMessage();
if (msg != null && (msg.contains("Premature end of chunk")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should detect that it is org.apache.hc.core5.http.ConnectionClosedException and log it.
we avoid building logic on error messages.

} catch (Exception re) {
log.debug("Error closing reader", re);
e = re;
if (!isStreamDrainException(re)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be handled in client-v2.
It could be handled in reader but reader knows nothing about org.apache.hc.core5.http.ConnectionClosedException. There are two options

  • just log all exception while closing reader and report them as error or warn
  • add closeResponseStream to com.clickhouse.client.api.internal.HttpAPIClientHelper

@chernser
Copy link
Copy Markdown
Contributor

@fm4v

Thank you for the contribution! It looks solid.
Please response to my comment and sync with main so CI pass.

Thanks you!

PS: I will work on contributors guide this week. It should make some part clear for future contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Random exception while executing queries via JDBC driver

2 participants