jdbc-v2: swallow chunked-stream drain errors on ResultSet.close (fixes #2361)#2857
jdbc-v2: swallow chunked-stream drain errors on ResultSet.close (fixes #2361)#2857fm4v wants to merge 2 commits into
Conversation
Fixes ClickHouse#2361. When the server tears down the response connection mid-stream (canonically because send_timeout fires on the writer side and the terminating zero-length chunk is never written), Apache HC's ChunkedInputStream.close() drains remaining bytes and trips `ConnectionClosedException: Premature end of chunk coded message body: closing chunk expected`. With `compress=true` the close path goes through `FramedLZ4CompressorInputStream.close()` → `BoundedInputStream.close()` → `ChunkedInputStream.close()` and the same drain failure surfaces. `ResultSetImpl.close()` previously rethrew this as a `SQLException`, which breaks well-behaved try-with-resources callers: they have already finished iterating, they cannot affect the server-side socket race, and the bug is surfacing in *cleanup* — they have no recourse other than wrapping every `rs.close()` (often implicit) in a try/catch. The reporter in ClickHouse#2361 ended up implementing client-side retry for the symptom; sqlancer / SQL fuzzers hit the same. Downgrade chunked-drain failures to log-and-swallow. If iteration genuinely failed (server closed mid-`next()`), that exception still surfaces — through `next()`, not through `close()`. The bare-drain-fail case is informational only. Covers both bare `Premature end of chunk` messages and `ConnectionClosedException` class-name match for forward-compat with Apache HC rephrasings.
Augments the previous commit with: - Unit tests for `ResultSetImpl.isStreamDrainException` covering the six exception shapes the classifier needs to recognise (bare message, `closing chunk expected` variant, nested cause chain, class-name match without message, unrelated exceptions, null input). All six pass. - `examples/jdbc/.../Issue2361Repro.java` — a self-contained main that reproduces the bug against a real ClickHouse server. Recipe is documented in the class Javadoc: compression on, server `send_timeout=1`, buffered chunked response, slow client read. On a freshly-started server it trips at ~100% rate. After the fix in `ResultSetImpl.close()` the iteration completes cleanly with the drain-close failure downgraded to a debug log. `isStreamDrainException` is package-private so it can be unit-tested without reflection.
|
Repository collaborators can run the JMH benchmark suite against this PR by commenting: Optional regression threshold override (Δ% on Time or Alloc/op; defaults to 10%): Only one benchmark run per PR is active at a time — issuing a new |
Local end-to-end validation against ClickHouse 26.5.1.854Reproduced #2361 against the stock 0.9.8 jar and confirmed the fix on this branch using the Bug is realStock Stack origin matches the patch site exactly: Fix worksPatched jar (this branch) against a freshly-restarted server, 2 iterations of the reproducer: Zero No row-content regressionI extended the reproducer to also validate each row delivered before the response is torn down:
Side-by-side, both jars deliver byte-identical rows:
The 3 "bad" rows per iteration are not caused by the patch — they are ClickHouse's error trailer ( This artifact also surfaces correctly as a Environment
|
When --log-each-select=true (default), the per-thread reproducer file should be self-sufficient: schema setup + bulk load + failing query in order. Three oracles were silently running DDL/data statements via SQLQueryAdapter(..., useLogger=false), so when one of them tripped an AssertionError the saved logs/clickhouse/database*.log was missing prerequisite statements and could not be replayed standalone. - CERTOracle: ANALYZE TABLE - PartitionMirrorOracle: DROP/CREATE of the mirror table plus INSERT INTO mirror SELECT * FROM source - SchemaRoundtripOracle: the round-trip CREATE pair plus cleanup DROPs Each new write is gated on state.getOptions().logEachSelect() to match the existing SQLQueryAdapter logging convention. Also document the locally-patched clickhouse-jdbc 0.9.8 jar in .claude/CLAUDE.md (upstream PR ClickHouse/clickhouse-java#2857) so a fresh checkout knows the jar shipped in target/lib/ is not stock and records the rebuild recipe + verification command.
| * failure to a debug log instead of a thrown SQLException — close() should | ||
| * never punish callers for a server-side teardown race after iteration is done. | ||
| */ | ||
| public class Issue2361Repro { |
There was a problem hiding this comment.
This should be a test. Examples are only for documentation purpose.
| * the application has finished iterating. Propagating it punishes well-behaved | ||
| * try-with-resources callers for a server-side socket race they cannot affect. | ||
| */ | ||
| public class ResultSetImplCloseTest { |
There was a problem hiding this comment.
Please squash into a fewer tests and move to ResultSetImplTest.
We plan to do another grouping.
Please mark tests included in integration group because for historical and practical reasons we do most tests against ClickHouse instance.
| * decides whether a close-time exception is a benign chunked-stream drain | ||
| * failure (to swallow) or a real error (to propagate). | ||
| * | ||
| * Background: see issue #2361. When the server's send_timeout fires mid-write, |
There was a problem hiding this comment.
This should document scenario without reference to issue in external system (issue may gone).
This documentation make more sense in close method of result set impl because we start researching from production code not tests.
| // already torn down the connection (e.g. SOCKET_TIMEOUT on the writer side hits | ||
| // `send_timeout` before the terminating chunk is written). The most common | ||
| // surface is `ConnectionClosedException: Premature end of chunk coded message | ||
| // body: closing chunk expected` from Apache HC's `ChunkedInputStream` drain, |
There was a problem hiding this comment.
please explain in 3 lines comment.
The idea I think that there are cases where we may hide real exception what cause problematic investigation. One of such cases is premature end of chunk in combination with compression.
| static boolean isStreamDrainException(Throwable t) { | ||
| while (t != null) { | ||
| String msg = t.getMessage(); | ||
| if (msg != null && (msg.contains("Premature end of chunk") |
There was a problem hiding this comment.
we should detect that it is org.apache.hc.core5.http.ConnectionClosedException and log it.
we avoid building logic on error messages.
| } catch (Exception re) { | ||
| log.debug("Error closing reader", re); | ||
| e = re; | ||
| if (!isStreamDrainException(re)) { |
There was a problem hiding this comment.
this should be handled in client-v2.
It could be handled in reader but reader knows nothing about org.apache.hc.core5.http.ConnectionClosedException. There are two options
- just log all exception while closing reader and report them as error or warn
- add
closeResponseStreamtocom.clickhouse.client.api.internal.HttpAPIClientHelper
|
Thank you for the contribution! It looks solid. Thanks you! PS: I will work on contributors guide this week. It should make some part clear for future contributions. |
Summary
Fixes #2361.
ResultSetImpl.close()no longer rethrowsConnectionClosedException: Premature end of chunk coded message body: closing chunk expectedfrom the chunked-response drain path. Iteration-time failures still surface throughnext(); this only fixes the close-cleanup noise.Why
When the server tears down the response connection mid-stream (canonically because
send_timeoutfires on the writer side and the terminating zero-length chunk is never written), Apache HC'sChunkedInputStream.close()drains remaining bytes and tripsPremature end of chunk coded message body. Withcompress=truethe close path goes throughFramedLZ4CompressorInputStream.close()→BoundedInputStream.close()→ChunkedInputStream.close()and the same drain failure surfaces.ResultSetImpl.close()previously rethrew this as aSQLException, which breaks well-behaved try-with-resources callers: they have already finished iterating, they cannot affect the server-side socket race, and the bug is surfacing in cleanup — they have no recourse other than wrapping everyrs.close()(often implicit) in a try/catch. The reporter in #2361 ended up implementing client-side retry for the symptom; sqlancer / SQL fuzzers hit the same pattern.If iteration genuinely fails (server closed mid-
next()), that exception still surfaces — throughnext(), not throughclose(). The bare-drain-on-close case is informational only.What changed
jdbc-v2/src/main/java/com/clickhouse/jdbc/ResultSetImpl.java: gate the existinge = re;assignments inclose()behind a newisStreamDrainExceptionclassifier. Drain-class exceptions are logged at debug and swallowed; all other close-time failures still propagate as before. Logic shape preserved.jdbc-v2/src/test/java/com/clickhouse/jdbc/ResultSetImplCloseTest.java: six unit tests covering the classifier (bare message match,closing chunk expectedvariant, nested cause chain, class-name match without message, unrelated exceptions, null input). All pass.examples/jdbc/src/main/java/com/clickhouse/examples/jdbc/Issue2361Repro.java: a self-containedmainthat reproduces the bug against a real ClickHouse server. Trigger recipe is in the class Javadoc: compression on, serversend_timeout=1, buffered chunked response, slow client read. On a freshly-started server it trips at ~100% rate.Reproducer
java -DchUrl=jdbc:ch://localhost:8123 \ com.clickhouse.examples.jdbc.Issue2361Repro 3Before this PR (jdbc-v2 0.9.8 main):
After this PR:
Test plan
mvn -pl jdbc-v2 test -Dtest=ResultSetImplCloseTest— 6/6 passisStreamDrainExceptionto other transient-cleanup exception classes (e.g. SSLSSLExceptionon close,SocketException: Connection reseton close)Correctness — is the data the application consumed complete?
Yes, in the failure scenario this PR addresses. The relevant invariants are in upstream code already, not added here:
AbstractBinaryFormatReader.readRecord(...)catchesEOFExceptiononly whenfirstColumn=true(between rows) and treats it as end-of-stream. EOF mid-row is rethrown asIOException(recordReadExceptionMsg(...), e), which surfaces throughnext()— not throughclose().ChunkedInputStream.read(...)in Apache HC returns-1only when it has parsed the terminator zero-length chunk. If the TCP connection drops mid-chunk-header it throwsConnectionClosedException: Premature end of chunk coded message body: closing chunk expected; if the chunk header is corrupt it throwsMalformedChunkCodingException. Truncation never quietly looks like clean EOF to the upper layer.Therefore the failure path covered by this PR corresponds to: the lz4-framed stream's end-of-stream marker was seen, the reader fully consumed the payload,
next()legitimately returnedfalse, and only then did the chunked-stream drain inclose()hit the missing terminator chunk. The application has the correct row set; the close-time error is an HTTP-framing artefact, not a data-integrity one.Other truncation cases (mid-row, mid-frame-header, malformed chunk) all surface through
next()beforeclose()is reached — those paths are unchanged by this PR. TheisStreamDrainExceptionclassifier is intentionally narrow (Premature end of chunk/closing chunk expectedmessage, orConnectionClosedExceptionclass match) so unrelated close-time failures still propagate as before.