GH-3398: Fix potential ClassLoader leak caused by ThreadLocal lambda in Binary.java #3447

Open: LuciferYang wants to merge 1 commit into apache:master from LuciferYang:GH-3398

Rationale for this change

Fixes #3398

In FromCharSequenceBinary, a ThreadLocal<CharsetEncoder> is initialized via ThreadLocal.withInitial(StandardCharsets.UTF_8::newEncoder). The method reference generates a dynamic Supplier class loaded by the current application ClassLoader. In environments with long-lived thread pools (Spark/Flink executors, web containers), pooled worker threads survive job cancellation or hot redeployment and retain a strong reference chain: Thread → ThreadLocalMap → SuppliedThreadLocal → lambda → ClassLoader. This pins the application ClassLoader for as long as the thread lives, which prevents class unloading. Each redeployment then leaks another ClassLoader, so Metaspace grows linearly and eventually fails with java.lang.OutOfMemoryError: Metaspace.
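The retention half of this argument can be illustrated with a minimal sketch. It does not reproduce the ClassLoader leak itself; it only shows that a value created by ThreadLocal.withInitial stays attached to a pooled worker thread between tasks. The class and method names (PooledThreadLocalSketch, sameEncoderAcrossTasks) are hypothetical and are not part of Binary.java.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration class; not taken from the Parquet code base.
public class PooledThreadLocalSketch {
    // Same initialization pattern as the one described above.
    private static final ThreadLocal<CharsetEncoder> ENCODER =
        ThreadLocal.withInitial(StandardCharsets.UTF_8::newEncoder);

    // Runs two independent tasks on one pooled thread and reports whether
    // both tasks observed the same thread-local encoder instance.
    static boolean sameEncoderAcrossTasks() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<CharsetEncoder> task = ENCODER::get;
            CharsetEncoder first = pool.submit(task).get();
            CharsetEncoder second = pool.submit(task).get();
            // The worker thread outlives the first task, so its thread-local
            // state is still present when the second task runs.
            return first == second;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sameEncoderAcrossTasks()); // prints "true"
    }
}
```

Because the worker thread is reused, whatever the thread-local machinery references remains reachable until the thread terminates or the value is removed explicitly.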

What changes are included in this PR?

Replaced the ThreadLocal<CharsetEncoder> based encoding in FromCharSequenceBinary with a stateless value.toString().getBytes(StandardCharsets.UTF_8), consistent with the existing FromStringBinary.encodeUTF8() at line 251 in the same file.
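As a rough sketch of the two approaches, assuming well-formed input, the following compares an encoder-based conversion with the stateless one. The class and method names are hypothetical, and the encoder variant creates a fresh encoder per call rather than using a ThreadLocal, so it only approximates the code being removed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not taken from Binary.java.
public class Utf8EncodingSketch {
    // Encoder-based conversion (fresh encoder per call, no ThreadLocal).
    static byte[] encodeWithEncoder(CharSequence value) {
        try {
            ByteBuffer buf = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(value));
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Reached only for input the encoder rejects, e.g. malformed UTF-16.
            throw new IllegalArgumentException(e);
        }
    }

    // Stateless conversion, as described in this PR.
    static byte[] encodeStateless(CharSequence value) {
        return value.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A non-String CharSequence containing multi-byte characters.
        CharSequence input = new StringBuilder("h\u00e9llo, \u4e16\u754c");
        System.out.println(Arrays.equals(encodeWithEncoder(input), encodeStateless(input))); // prints "true"
    }
}
```

The stateless variant holds no per-thread state, so nothing stays attached to pooled threads after the call returns.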

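Also note that the original catch (CharacterCodingException) block was effectively dead code as far as charset availability is concerned: StandardCharsets.UTF_8 is a standard charset guaranteed to be available, so CharsetEncoder.encode() would never throw CharacterCodingException for unsupported charset reasons. (A newly created encoder can still raise this exception for malformed input such as an unpaired surrogate.)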

Are these changes tested?

Yes. All existing tests in parquet-column pass without modification.

Are there any user-facing changes?

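No. For well-formed input the encoding behavior is identical: both approaches produce the same UTF-8 byte sequence. The only difference concerns malformed input such as an unpaired surrogate, where String.getBytes(Charset) substitutes the charset's replacement bytes instead of throwing.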

Performance note: fromCharSequence() is only invoked as a fallback in AvroWriteSupport.fromAvroString() for non-String, non-Utf8 CharSequence implementations. The dominant write paths use fromReusedByteArray() (Avro Utf8) and fromString() (Java String), both of which already use stateless encoding without ThreadLocal. Therefore, no measurable performance regression is expected.


Development

Successfully merging this pull request may close these issues.

[Bug] Potential ClassLoader Leak: ThreadLocal.withInitial lambda in Binary.java pins ClassLoader causing Metaspace OOM

1 participant