[improve][build] Restore fastutil and minimize it in the client and server distributions #26032
Open
lhotari wants to merge 10 commits into
… (revert apache#25413)
### Motivation
Reverts apache#25413, which had replaced fastutil with hand-written primitive collections (Int2ObjectOpenHashMap, IntIntPair, Long2ObjectOpenHashMap, ...). Those custom collections are a maintenance overhead for the project and have been a source of bugs in the past, so the broker and client code goes back to using fastutil.
### Modifications
- Restore fastutil usage in NegativeAcksTracker, PendingAcksMap, Consumer, DrainingHashesTracker, InMemoryRedeliveryTracker, InMemoryDelayedDeliveryTracker and PersistentStickyKeyDispatcherMultipleConsumers (and their tests).
- Keep the post-apache#25413 improvements: Roaring bitmap usage, the delayed-delivery race fixes and the PendingAcksMap O(1) size optimization (apache#26019); only the collection types are swapped back, preferring fastutil primitive sorted maps (Long2ObjectRBTreeMap/AVLTreeMap) over java.util.TreeMap.
- Delete the custom org.apache.pulsar.common.util.collections.* classes and tests.
- Re-add the fastutil dependency to pulsar-broker and pulsar-client, the version catalog, and the binary LICENSE files.
Assisted-by: Claude Code
…nt jars
### Motivation
The full fastutil jar is ~25MB / ~12,965 classes, of which the Pulsar client only
uses a handful (via NegativeAcksTracker). Bundling all of it into the shaded client
jars is wasteful. The Maven build (branch-4.2) avoids this with a
`pulsar-client-dependencies-minimized` module driven by maven-shade-plugin's
minimizeJar; this is the Gradle equivalent.
### Modifications
- Reimplement pulsar-client-dependencies-minimized/build.gradle.kts on the GradleUp
Shadow plugin. It declares pulsar-client-original with the `api` scope so Shadow's
minimize() seeds reachability from the whole client closure, bundles only the
libraries listed in `minimizedDependencies` ("group:name" entries, currently just
fastutil), and prunes everything unreachable -> 591 fastutil classes (matching the
Maven minimizeJar output, down from ~12,965).
- Add a verifyMinimizedJar check that fails the build if a required class is pruned
or if minimize() silently becomes a no-op.
- Strip the build-only `api` seed from the module's outgoing variants so it exposes
no transitive dependencies to consumers (self-contained fastutil-only jar).
- Wire it in: settings include, relocate it.unimi.dsi.fastutil in the client shade
conventions, and have pulsar-client-shaded / -all / -admin-shaded exclude fastutil
from pulsar-client(-admin)-original and bundle the minimized module instead.
Assisted-by: Claude Code
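The reachability-based pruning described in this commit can be illustrated with a small, self-contained sketch. It is not how Shadow's minimize() is implemented. It only shows the idea of seeding a traversal from root classes and keeping the library classes that are transitively reachable. The class name `ReachabilityMinimizer`, the reference-map input, and the `libraryPrefix` parameter are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch only; names and input shape are assumptions, not Shadow's API. */
final class ReachabilityMinimizer {

    /**
     * Returns the library classes reachable from the roots.
     *
     * @param references    class name -> class names it references (assumed precomputed)
     * @param roots         seed classes (for example, the client's own classes)
     * @param libraryPrefix prefix identifying the library to minimize
     */
    static Set<String> retained(Map<String, Set<String>> references,
                                Set<String> roots,
                                String libraryPrefix) {
        Set<String> visited = new HashSet<>(roots);
        Deque<String> queue = new ArrayDeque<>(roots);
        // Breadth-first traversal over the reference graph, following
        // references through library classes as well as root classes.
        while (!queue.isEmpty()) {
            String current = queue.removeFirst();
            for (String ref : references.getOrDefault(current, Set.of())) {
                if (visited.add(ref)) {
                    queue.addLast(ref);
                }
            }
        }
        // Only classes belonging to the minimized library are bundled.
        Set<String> kept = new TreeSet<>();
        for (String name : visited) {
            if (name.startsWith(libraryPrefix)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

Anything in the library that no root reaches, directly or transitively, is left out of the result, which is the property the minimized jar relies on.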
Routine upgrade of the com.gradleup.shadow plugin used to build the shaded jars (including the new pulsar-client-dependencies-minimized minimization).
Assisted-by: Claude Code
…r-client-fastutil-minimized
### Motivation
The module exists solely to minimize the bundled fastutil classes, so the more specific name makes its purpose clear (and matches the "minimize this specific library" design driven by the `minimizedDependencies` list).
### Modifications
- Rename the module directory and Gradle project to pulsar-client-fastutil-minimized (directory and project path match, so no module-name-vs-directory gotcha), and update the references in settings.gradle.kts, the client shade conventions and the three shaded client build files.
- Simplify verifyMinimizedJar to a single class-count guard (fail if the jar retains more than 600 classes; actual is 591), dropping the per-class required-list check. The task stays configuration-cache compatible (captures only Providers/values, no Project access in the action).
Assisted-by: Claude Code
…ions plugin
### Motivation
The fastutil-minimization setup (shadow minimize() seeded from `api` roots, bundle-only
filter, stripped outgoing variants, class-count verification) was inline in
pulsar-client-fastutil-minimized. A second minimization module is coming
(pulsar-broker-fastutil-minimized), so the shared machinery moves into a convention plugin.
### Modifications
- Add build-logic `pulsar.fastutil-minimized-conventions` precompiled script plugin and a
`FastutilMinimizedExtension`. A consuming module declares its reachability roots as
`api(project(...))` dependencies and sets `fastutilMinimized { maxRetainedClasses.set(N) }`;
the plugin handles minimize()/include/extendsFrom-stripping and the verifyMinimizedJar check
(configuration-cache compatible — the task action captures only Providers).
- Reduce pulsar-client-fastutil-minimized/build.gradle.kts to apply the plugin, declare the
pulsar-client-original root, and set the 600-class limit.
Assisted-by: Claude Code
…stead of the full jar
### Motivation
One argument for dropping fastutil (apache#25413) was that the full ~25MB jar enlarges the server distribution / docker image. Shipping a minimized fastutil that contains only the classes actually used on the server (and the bundled, unrelocated client) side resolves that, while keeping the convenience that any Pulsar code can pick a new fastutil collection and the build automatically pulls in the classes it needs.
### Modifications
- Add pulsar-broker-fastutil-minimized (uses pulsar.fastutil-minimized-conventions) with pulsar-broker and pulsar-client-original as reachability roots; the result is a superset of the client minimized set (~818 classes vs ~591).
- In the server distribution, exclude the full it.unimi.dsi:fastutil jar from distLib and bundle pulsar-broker-fastutil-minimized instead. The client-only pulsar-client-fastutil-minimized is not pulled into the server distribution.
- Drop the now-stale fastutil entry from the server binary LICENSE (checkBinaryLicense passes; the minimized classes ship inside a Pulsar-owned jar).
Assisted-by: Claude Code
…il in its POM/GMM
### Motivation
Maven/Gradle consumers of the (non-shaded) pulsar-client-original currently pull the full ~25MB fastutil jar, even though the client only uses a few fastutil classes. Replacing that dependency with pulsar-client-fastutil-minimized in the published metadata lets consumers get just the classes the client needs.
### Modifications
- Publish pulsar-client-fastutil-minimized (add pulsar.publish-conventions). This also makes it a published dependency, satisfying the public-java-library "published modules only depend on published modules" check for the shaded client modules that bundle it. The broker variant stays unpublished (only the server distribution consumes it).
- In pulsar-client (pulsar-client-original), rewrite the published POM (pom.withXml) and Gradle Module Metadata (post-process the generated .module JSON) to replace it.unimi.dsi:fastutil with org.apache.pulsar:pulsar-client-fastutil-minimized.
- This is publication-only: intra-build, pulsar-client-original keeps exposing full fastutil (pulsar-broker depends on it and uses more fastutil classes than the client minimized set), and there is no build-graph dependency on the minimized module; it is built only when the shaded jars (or its own publication) are, not on every client code change.
Assisted-by: Claude Code
The convention is not specific to fastutil — it minimizes any configured library. Rename
pulsar.fastutil-minimized-conventions to pulsar.minimized-dependencies-conventions and
FastutilMinimizedExtension to MinimizedDependenciesExtension, expose it as `minimizedJar { }`,
and drop the fastutil-specific default so each module lists its own `minimizedDependencies`.
The fastutil modules now set `minimizedJar { minimizedDependencies.set(listOf("it.unimi.dsi:fastutil")); ... }`.
Assisted-by: Claude Code
The full fastutil jar was replaced in the server distribution by the minimized pulsar-broker-fastutil-minimized jar, so the previous "Fastutil -- ...jar" line (which named a no-longer-bundled jar) was removed. The fastutil classes (Apache-2.0) still ship, bundled inside that Pulsar jar, so restore a fastutil attribution as free text without a jar filename. This is accepted by checkBinaryLicense, which only validates entries that reference a concrete *.jar.
Assisted-by: Claude Code
### Motivation
The server distribution was switched to the minimized fastutil jar, but the pulsar-shell CLI distribution still bundled the full ~24MB fastutil (pulled transitively via pulsar-client-tools), which is inconsistent and inflates the shell tarball / image.
### Modifications
- distribution/shell: exclude the full it.unimi.dsi:fastutil from distLib and bundle pulsar-client-fastutil-minimized instead (the unrelocated minimized client set works for the shell's client-side modules). The shell binary LICENSE keeps a free-text fastutil attribution without a jar filename, matching the server distribution.
- Broaden pulsar-client-fastutil-minimized's reachability roots to pulsar-client-tools and pulsar-client-admin-original in addition to pulsar-client-original, so the minimized set covers every fastutil class any client-side module reaches. There is no such usage today (the retained set stays 591 classes), but this future-proofs the set: if these modules start using fastutil, the needed classes are pulled in automatically. Verified the set is self-contained (0 closure gaps) and covers all 10 fastutil references in pulsar-client-original.
Assisted-by: Claude Code
This was referenced Jun 15, 2026
Closes #24996
Reverts #25413 and addresses the review feedback in #26028.
Motivation
#25413 removed the `fastutil` dependency and replaced it with hand-written primitive
collections under `org.apache.pulsar.common.util.collections` (`Int2ObjectOpenHashMap`,
`IntIntPair`, `Long2ObjectOpenHashMap`, and similar). That change was driven by a real
problem (the full `fastutil` jar is ~24 MB / ~12,965 classes and we don't want to ship
all of it), but the conversion of the Maven build's fastutil minification to Gradle was
the actual blocker, not fastutil itself.
As discussed in the #26028 review, hand-rolled primitive collections are the wrong
trade-off: custom collection implementations have repeatedly been a source of bugs and are
an ongoing maintenance burden, and there is a large combinatorial space of primitive
collection types we'd have to reimplement and maintain. fastutil is a mature library whose
collections have been optimized for performance for many years, and when a new use case
needs another primitive collection, fastutil already has it — it can be used directly
without adding more custom classes to the Pulsar code base.
Memory efficiency is a concrete reason this matters. The Key_Shared draining-hashes design
in PIP-379 uses fastutil
primitive collections specifically to keep the per-subscription tracking footprint small,
and its memory model is computed directly on those types:
- `Int2ObjectOpenHashMap<DrainingHashEntry>`, using primitive `int` keys to avoid boxed `Integer` keys.
- `Long2ObjectSortedMap<IntIntPair>` backed by a `Long2ObjectRBTreeMap`, with `IntIntPair.of(batchSize, stickyKeyHash)` as the value: primitive `long` keys and a primitive int-pair value instead of boxed wrappers.
- `Int2ObjectOpenHashMap` overhead; bounded worst case, ~0 once hashes drain) is derived from these fastutil maps.
Reimplementing these collections by hand undercuts both the correctness and the memory
characteristics that design relies on, so there is a memory-usage regression unless
fastutil is restored.
Finally, this resolves #24996: the full ~24 MB fastutil inflates deployments that use the
non-shaded `-original` clients (e.g. exceeding AWS Lambda's unpacked-size limit). With this
change, consumers of `pulsar-client-original` get a ~1.3 MB minimized fastutil jar instead
of the full one.
Modifications
- Restore fastutil (revert of #25413). Switch the broker and client back to fastutil in `NegativeAcksTracker`, `PendingAcksMap`, `Consumer`, `DrainingHashesTracker`, `InMemoryRedeliveryTracker`, `InMemoryDelayedDeliveryTracker`, and `PersistentStickyKeyDispatcherMultipleConsumers` (and their tests), preferring fastutil primitive sorted maps (`Long2ObjectRBTreeMap`/`Long2ObjectAVLTreeMap`) over `java.util.TreeMap`. Post-#25413 improvements are kept (Roaring bitmap usage, the delayed-delivery race fixes, and the `PendingAcksMap` O(1) `size` optimization from #26019); only the collection types are swapped back. Delete the hand-written `org.apache.pulsar.common.util.collections.*` classes and tests. Re-add `fastutil` (`8.5.18`) to `pulsar-broker`, `pulsar-client`, and the version catalog.
- Reusable `pulsar.minimized-dependencies-conventions` plugin. A library-agnostic build-logic convention (+ `MinimizedDependenciesExtension`, exposed as `minimizedJar { }`). A packaging module applies it, declares reachability roots as `api(project(...))` dependencies, and sets `minimizedDependencies` (`"group:name"` list) + `maxRetainedClasses`. It builds a Shadow jar bundling only the configured libraries, runs `minimize()` (seeded from the `api` roots, since Shadow's `UnusedTracker` uses the project's own + `api`-scoped classes and these modules have no source), strips the build-only `api` seed from the `apiElements`/`runtimeElements` variants, and adds a config-cache-compatible `verifyMinimizedJar` check (wired into `check`) that fails if the retained class count exceeds the limit.
- Two minimized modules. `pulsar-client-fastutil-minimized` (roots: `pulsar-client-original`, `pulsar-client-tools`, `pulsar-client-admin-original`; ~591 classes) and `pulsar-broker-fastutil-minimized` (roots: `pulsar-broker` + `pulsar-client-original`; ~818 classes, a superset of the client set).
- Client shaded jars (`pulsar-client-shaded`, `-all`, `-admin-shaded`) exclude `it.unimi.dsi:fastutil` from `pulsar-client(-admin)-original` and bundle `pulsar-client-fastutil-minimized`; the client shade conventions relocate `it.unimi.dsi.fastutil` under the shade prefix.
- Server and shell distributions exclude the full fastutil jar from their `distLib` and ship a minimized one instead: the server bundles the (unrelocated) `pulsar-broker-fastutil-minimized`, and the pulsar-shell CLI bundles `pulsar-client-fastutil-minimized`. Both binary LICENSE files carry a free-text fastutil (Apache-2.0) attribution without a jar filename, which is accepted by `checkBinaryLicense`, since the classes now ship inside a Pulsar-owned jar.
- `pulsar-client-original` published metadata (publication-only) rewrites the POM (`pom.withXml`) and Gradle Module Metadata (post-processing the generated `.module` JSON) to replace the `it.unimi.dsi:fastutil` dependency with `org.apache.pulsar:pulsar-client-fastutil-minimized`. There is no build-graph dependency on the minimized module: intra-build, `pulsar-client-original` still exposes full fastutil (the broker uses more classes than the client set). `pulsar-client-fastutil-minimized` is published; the broker variant is not.
- Version bumps. fastutil `8.5.18`; GradleUp Shadow `9.4.1` → `9.4.2`.
Resulting sizes (built and measured) for `pulsar-client-fastutil-minimized` and `pulsar-broker-fastutil-minimized` compared with `fastutil-8.5.18.jar`: ≈17× smaller (client) and ≈14× smaller (broker), keeping exactly the fastutil classes the
broker and client actually reach.
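The published-POM dependency swap described in the modifications above can be sketched in plain Java. This is only an illustration of the transformation (one `<dependency>` coordinate replaced by another), not the PR's `pom.withXml` Gradle code. The class name `PomDependencyRewriter`, its method signature, and the version passed in are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative sketch only; not the build logic used in the PR. */
final class PomDependencyRewriter {

    /** Replaces the coordinates of every matching <dependency> and returns the new XML. */
    static String replaceDependency(String pomXml, String fromGroup, String fromArtifact,
                                    String toGroup, String toArtifact, String toVersion) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pomXml)));
            NodeList deps = doc.getElementsByTagName("dependency");
            for (int i = 0; i < deps.getLength(); i++) {
                Element dep = (Element) deps.item(i);
                Element group = child(dep, "groupId");
                Element artifact = child(dep, "artifactId");
                if (group != null && artifact != null
                        && fromGroup.equals(group.getTextContent().trim())
                        && fromArtifact.equals(artifact.getTextContent().trim())) {
                    group.setTextContent(toGroup);
                    artifact.setTextContent(toArtifact);
                    Element version = child(dep, "version");
                    if (version != null) {
                        version.setTextContent(toVersion);
                    }
                }
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rewrite POM", e);
        }
    }

    /** Finds a direct child element by tag name, ignoring nested ones such as exclusions. */
    private static Element child(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }
}
```

Only direct children of each `<dependency>` are inspected, so a `groupId` inside an `<exclusions>` block is not mistaken for the dependency's own coordinates.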
Verifying this change
This change is already covered by existing broker and client unit and integration tests —
the collection types are swapped back to fastutil while behavior (and the kept post-#25413
improvements) is unchanged, so the Key_Shared, negative-ack, redelivery, delayed-delivery,
and pending-ack suites exercise it. The build-logic additions are guarded by the
`verifyMinimizedJar` check and by `checkBinaryLicense` for the distribution LICENSE changes.
CI passes (build, unit, and integration suites, including the shaded-jar and distribution
assembly steps).
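The class-count guard mentioned above can be approximated outside Gradle with a few lines of Java. This is a minimal sketch assuming the check only needs to count `.class` entries in the jar and compare against a limit. It is not the PR's `verifyMinimizedJar` task, and `MinimizedJarGuard` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Illustrative sketch only; not the Gradle task from the PR. */
final class MinimizedJarGuard {

    /** Counts the entries in the jar whose name ends in ".class". */
    static int countClasses(Path jar) {
        int count = 0;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(jar))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                if (!e.isDirectory() && e.getName().endsWith(".class")) {
                    count++;
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return count;
    }

    /** Returns the retained class count, or fails if it exceeds the limit. */
    static int verify(Path jar, int maxRetainedClasses) {
        int retained = countClasses(jar);
        if (retained > maxRetainedClasses) {
            throw new IllegalStateException("Minimized jar retains " + retained
                    + " classes, limit is " + maxRetainedClasses);
        }
        return retained;
    }

    /** Helper for trying the guard: writes a temporary jar with the given entry names. */
    static Path writeJar(String... entryNames) {
        try {
            Path jar = Files.createTempFile("minimized-sketch", ".jar");
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(jar))) {
                for (String name : entryNames) {
                    out.putNextEntry(new ZipEntry(name));
                    out.write(new byte[] {0}); // placeholder content, not real bytecode
                    out.closeEntry();
                }
            }
            return jar;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

A limit set slightly above the measured count (600 against 591 for the client module, per the commit message) lets small additions through while still catching the case where minimization stops pruning.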
Does this pull request potentially affect one of the following parts: