Skip to content

[PERF] Reduce memory allocations through pre-sizing collections and batch allocation#5020

Merged
Spaceman1701 merged 9 commits intoprometheus:mainfrom
Mwea:perf/pre-size-maps
Mar 10, 2026
Merged

[PERF] Reduce memory allocations through pre-sizing collections and batch allocation#5020
Spaceman1701 merged 9 commits intoprometheus:mainfrom
Mwea:perf/pre-size-maps

Conversation

@Mwea
Copy link
Copy Markdown
Contributor

@Mwea Mwea commented Feb 19, 2026

Pull Request Checklist

Please check all the applicable boxes.

  • Please list all open issue(s) discussed with maintainers related to this change
    • N/A - General performance improvements
  • Is this a new Receiver integration?
    • N/A
  • Is this a bugfix?
    • N/A
  • Is this a new feature?
    • N/A
  • Does this change affect performance?
    • I have provided benchmarks comparison that shows performance is improved or is not degraded
      • You can use benchstat to compare benchmarks
    • I have added new benchmarks if required or requested by maintainers
  • Is this a breaking change?
    • My changes do not break the existing cluster messages
    • My changes do not break the existing api
  • I have added/updated the required documentation
  • I have signed-off my commits
  • I will follow best practices for contributing to this project

Which user-facing changes does this PR introduce?

[PERF] Reduce memory allocations in alert processing hot paths through pre-sizing collections and batch allocation.


Summary

This PR reduces memory allocations in several hot paths by pre-sizing maps/slices and batching allocations. All changes are internal optimizations with no API changes.


Benchmarks

Batch allocate alert copies in flush:
10 alerts: Before: 691 ns/op, 12 allocs → After: 435 ns/op, 3 allocs (37% faster, 75% fewer allocs)
50 alerts: Before: 3003 ns/op, 52 allocs → After: 1714 ns/op, 3 allocs (43% faster, 94% fewer allocs)
100 alerts: Before: 5976 ns/op, 102 allocs → After: 3383 ns/op, 3 allocs (43% faster, 97% fewer allocs)

DedupStage alert partitioning:
50 alerts: Before: 15.8 µs/op, 126 allocs → After: 13.1 µs/op, 110 allocs (17% faster, 13% fewer allocs)
100 alerts: Before: 29.0 µs/op, 232 allocs → After: 25.7 µs/op, 210 allocs (11% faster, 9% fewer allocs)

fingerprintEquals pre-sizing:
3 labels: Before: 357 ns/op → After: 314 ns/op (12% faster)
10 labels: Before: 1519 ns/op → After: 1365 ns/op (10% faster)

getGroupLabels pre-sizing:
group_by_all (10 labels): Before: 714 ns/op → After: 585 ns/op (18% faster)

Template.Data caching:
50 alerts: 4416 B/op vs 8832 B/op (-50% memory), 51 vs 102 allocs (-50%)

@Mwea Mwea changed the title [PERF] Reduce memory allocations in alert through pre-sizing collections and batch allocation [PERF] Reduce memory allocations through pre-sizing collections and batch allocation Feb 19, 2026
@Mwea Mwea force-pushed the perf/pre-size-maps branch from 421f42c to 83ccd8d Compare February 20, 2026 07:24
@Mwea
Copy link
Copy Markdown
Contributor Author

Mwea commented Feb 20, 2026

I believe the added benchmarks for every single commit are probably overkill, but I've left them voluntarily
available for the first round of review. Happy to squash them into fewer commits or remove any that seem
useless once agreed on the overall approach.

Let me know if you'd prefer:

  • A single squashed commit with all the pre-sizing optimizations
  • Removing benchmarks that measuring obvious improvements

Copy link
Copy Markdown
Contributor

@siavashs siavashs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!
There a some formatting issues failing in the CI checks.

@Mwea Mwea force-pushed the perf/pre-size-maps branch from 83ccd8d to 33ece32 Compare February 20, 2026 11:34
@Mwea
Copy link
Copy Markdown
Contributor Author

Mwea commented Feb 20, 2026

Thank you! There a some formatting issues failing in the CI checks.

Fixed it, relica of the refactor 👍🏼

@siavashs
Copy link
Copy Markdown
Contributor

Thank you! There a some formatting issues failing in the CI checks.

Fixed it, relica of the refactor 👍🏼

There are more, please rebase your branch (we just merged an updated golangci-lint 2.10) and run golangci-lint locally to make sure all issues are fixed.

@Mwea Mwea force-pushed the perf/pre-size-maps branch from 33ece32 to e196685 Compare February 20, 2026 12:01
@Mwea
Copy link
Copy Markdown
Contributor Author

Mwea commented Feb 20, 2026

Thank you! There a some formatting issues failing in the CI checks.

Fixed it, relica of the refactor 👍🏼

There are more, please rebase your branch (we just merged an updated golangci-lint 2.10) and run golangci-lint locally to make sure all issues are fixed.

My bad ! Should be all good now:

 (git)-[perf/pre-size-maps]-   $   make build && make test
>> building binaries
/home/titouan.chary/go/bin/promu build --prefix /home/titouan.chary/Projects/alertmanager 
 >   alertmanager
 >   amtool
>> running all tests
go test -race  ./...
?       github.com/prometheus/alertmanager/api  [no test files]
?       github.com/prometheus/alertmanager/api/metrics  [no test files]
ok      github.com/prometheus/alertmanager/api/v2       (cached)
?       github.com/prometheus/alertmanager/api/v2/client        [no test files]
?       github.com/prometheus/alertmanager/api/v2/client/alert  [no test files]
?       github.com/prometheus/alertmanager/api/v2/client/alertgroup     [no test files]
?       github.com/prometheus/alertmanager/api/v2/client/general        [no test files]
?       github.com/prometheus/alertmanager/api/v2/client/receiver       [no test files]
?       github.com/prometheus/alertmanager/api/v2/client/silence        [no test files]
?       github.com/prometheus/alertmanager/api/v2/models        [no test files]
?       github.com/prometheus/alertmanager/api/v2/restapi       [no test files]
?       github.com/prometheus/alertmanager/api/v2/restapi/operations    [no test files]
?       github.com/prometheus/alertmanager/api/v2/restapi/operations/alert      [no test files]
?       github.com/prometheus/alertmanager/api/v2/restapi/operations/alertgroup [no test files]
?       github.com/prometheus/alertmanager/api/v2/restapi/operations/general    [no test files]
?       github.com/prometheus/alertmanager/api/v2/restapi/operations/receiver   [no test files]
?       github.com/prometheus/alertmanager/api/v2/restapi/operations/silence    [no test files]
?       github.com/prometheus/alertmanager/asset        [no test files]
ok      github.com/prometheus/alertmanager/cli  (cached)
ok      github.com/prometheus/alertmanager/cli/config   (cached)
?       github.com/prometheus/alertmanager/cli/format   [no test files]
ok      github.com/prometheus/alertmanager/cluster      (cached)
?       github.com/prometheus/alertmanager/cluster/clusterpb    [no test files]
ok      github.com/prometheus/alertmanager/cmd/alertmanager     (cached)
?       github.com/prometheus/alertmanager/cmd/amtool   [no test files]
ok      github.com/prometheus/alertmanager/config       (cached)
ok      github.com/prometheus/alertmanager/config/receiver      (cached)
ok      github.com/prometheus/alertmanager/dispatch     (cached)
?       github.com/prometheus/alertmanager/examples/webhook     [no test files]
ok      github.com/prometheus/alertmanager/featurecontrol       (cached)
ok      github.com/prometheus/alertmanager/inhibit      (cached)
ok      github.com/prometheus/alertmanager/limit        (cached)
ok      github.com/prometheus/alertmanager/matcher/compat       (cached)
ok      github.com/prometheus/alertmanager/matcher/compliance   (cached)
ok      github.com/prometheus/alertmanager/matcher/parse        (cached)
ok      github.com/prometheus/alertmanager/nflog        (cached)
ok      github.com/prometheus/alertmanager/nflog/nflogpb        (cached)
ok      github.com/prometheus/alertmanager/notify       (cached)
ok      github.com/prometheus/alertmanager/notify/discord       (cached)
ok      github.com/prometheus/alertmanager/notify/email (cached)
ok      github.com/prometheus/alertmanager/notify/incidentio    (cached)
ok      github.com/prometheus/alertmanager/notify/jira  (cached)
ok      github.com/prometheus/alertmanager/notify/mattermost    (cached)
ok      github.com/prometheus/alertmanager/notify/msteams       (cached)
ok      github.com/prometheus/alertmanager/notify/msteamsv2     (cached)
ok      github.com/prometheus/alertmanager/notify/opsgenie      (cached)
ok      github.com/prometheus/alertmanager/notify/pagerduty     (cached)
ok      github.com/prometheus/alertmanager/notify/pushover      (cached)
ok      github.com/prometheus/alertmanager/notify/rocketchat    (cached)
ok      github.com/prometheus/alertmanager/notify/slack (cached)
ok      github.com/prometheus/alertmanager/notify/sns   1.174s
ok      github.com/prometheus/alertmanager/notify/telegram      (cached)
?       github.com/prometheus/alertmanager/notify/test  [no test files]
ok      github.com/prometheus/alertmanager/notify/victorops     (cached)
ok      github.com/prometheus/alertmanager/notify/webex (cached)
ok      github.com/prometheus/alertmanager/notify/webhook       (cached)
ok      github.com/prometheus/alertmanager/notify/wechat        (cached)
ok      github.com/prometheus/alertmanager/pkg/labels   (cached)
?       github.com/prometheus/alertmanager/pkg/modtimevfs       [no test files]
?       github.com/prometheus/alertmanager/provider     [no test files]
ok      github.com/prometheus/alertmanager/provider/mem (cached)
ok      github.com/prometheus/alertmanager/silence      (cached)
?       github.com/prometheus/alertmanager/silence/silencepb    [no test files]
ok      github.com/prometheus/alertmanager/store        (cached)
ok      github.com/prometheus/alertmanager/template     (cached)
?       github.com/prometheus/alertmanager/test/cli     [no test files]
ok      github.com/prometheus/alertmanager/test/cli/acceptance  3.270s
?       github.com/prometheus/alertmanager/test/testutils       [no test files]
?       github.com/prometheus/alertmanager/test/with_api_v2     [no test files]
ok      github.com/prometheus/alertmanager/test/with_api_v2/acceptance  27.500s
ok      github.com/prometheus/alertmanager/timeinterval (cached)
ok      github.com/prometheus/alertmanager/tracing      (cached)
ok      github.com/prometheus/alertmanager/types        (cached)
ok      github.com/prometheus/alertmanager/ui   (cached)
  (git)-[perf/pre-size-maps]-   $ make lint                                    
>> running golangci-lint
/home/titouan.chary/go/bin/golangci-lint run  ./...
0 issues.

Copy link
Copy Markdown
Contributor

@siavashs siavashs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@Spaceman1701 Spaceman1701 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

I'm certainly see no problem accepting the changes which per-allocate capacity, but I have some concerns about some of the more sophisticated optimizations.

I think any optimization which trades code clarity for performance will need some justification explaining why performance is a superseding priority for that code. I have no doubt that the changes improve performance, but I'm not sure we should increase complexity unless we have observed performance issues in real world conditions.

Comment thread api/v2/compat.go Outdated
Comment thread dispatch/dispatch.go Outdated
alertCopies := make([]types.Alert, len(alerts))

for i, alert := range alerts {
alertCopies[i] = *alert
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea here to batch the allocations of the copied alerts? I guess otherwise line 786 will result in one allocation per alert...

I'm a touch worried that this will leak memory - if anybody keeps a reference to any alert that's part of this slice we cannot deallocate the entire slice.

Have you observed a performance issue with the existing logic that justifies the additional complexity?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review and for raising this concern!

I've updated the commit to inline the change directly in flush() rather than extracting a separate function, hopefully this makes the diff easier to follow. The initial extraction was mainly to allow dedicated unit tests for the alert preparation logic, but it made the actual optimization harder to see.

Regarding the memory concernm I think you're right that if something held a reference to one alert, it would retain the whole array. Looking at the flush() flow though, alertsSlice is passed to notify(), processed, and then released when flush() returns. However, if there is a code path where individual alerts are retained longer, that would have been a concern with the original code as well, isn't it ? Does the introduction of alertCopies changes that risk ?

Happy to revert if you feel the complexity isn't worth it. About your initial question on the performances, we've been seeing quite a bit of memory pressure on the alertmanager from many small issues, so any low-hanging fruit that reduces allocations feels worth picking up 😄

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well in the original copy the type of AlertSlice was pointers to alert. So holding onto an individual alert would keep that one alert in memory, not the whole slice. Only keeping a reference to the whole slice would keep the whole slice. In the new version the alerts are all part of one slice, so having a pointer to one alert unfortunately holds the whole slice...

If we see it's a problem to allocate and GC all these alerts, would we consider sync.Pool? Though not sure if the sync aspect would slow us down then. Unfortunately it might be hard to guarantee that no code will ever hold onto an alert/store it in a map "for a while" and such ....

Copy link
Copy Markdown
Contributor

@ultrotter ultrotter Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Spaceman1701 what would you think if we zeroed out alertCopies after use? This way any pointers to it would clearly be useless, and any user that mistakenly holds onto them would see the bug and realize they need copy+keep instead... So this would make it safe through documenting the contract (don't hold on to these alert pointers) + making any reference kept useless by clearing the slice.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Spaceman1701 what would you think if we zeroed out alertCopies after use, or if we put it as part of a struct that we keep in a pool, and reset it before putting it in the pool? This way any pointers to it would break, and we wouldn't be able to hold on to them, so a user needing to keep a cache would need to do copy+cache... ?

I think my preferred solution would be to encapsulate all this logic into a struct that owns the memory, but I suppose in practice it's a minor difference. I also do feel a little more comfortable with the in-lined function since it makes things harder to misuse.

If I'm the only one with this concern, I'm also happy to be outvoted - It's not a strong concern.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go for slice clear for now... It's a little unclear to me how "a struct that owns the memory" would help if the notifiers will still every get access to *types.Alert without a copy each time they do, which would defeat the purpose and duplicate the copies? But maybe we can do that as a future improvement and/or we can chat about it

@Mwea Mwea force-pushed the perf/pre-size-maps branch from e196685 to e792779 Compare February 21, 2026 09:23
@Mwea Mwea requested a review from Spaceman1701 February 26, 2026 10:03
Copy link
Copy Markdown
Contributor

@ultrotter ultrotter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with a slice clear when we copy to the slice, so we are sure to invalidate any held references and thus enforce the contract with the user not to hold onto it even in the future.

Comment thread dispatch/dispatch.go Outdated
alertCopies := make([]types.Alert, len(alerts))

for i, alert := range alerts {
alertCopies[i] = *alert
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go for slice clear for now... It's a little unclear to me how "a struct that owns the memory" would help if the notifiers will still every get access to *types.Alert without a copy each time they do, which would defeat the purpose and duplicate the copies? But maybe we can do that as a future improvement and/or we can chat about it

Mwea and others added 7 commits March 2, 2026 08:58
The flush path copies each alert before sending to avoid mutation.
Previously each copy was a separate heap allocation. Now all copies
share a single backing array.

Benchmark results:
  10 alerts:  Before: 691 ns/op, 12 allocs  →  After: 435 ns/op, 3 allocs  (37% faster, 75% fewer allocs)
  50 alerts:  Before: 3003 ns/op, 52 allocs →  After: 1714 ns/op, 3 allocs (43% faster, 94% fewer allocs)
  100 alerts: Before: 5976 ns/op, 102 allocs → After: 3383 ns/op, 3 allocs (43% faster, 97% fewer allocs)

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
Pre-allocate the LabelSet map to the exact size needed (len of Equal
labels) instead of letting it grow dynamically.

Called on every inhibition check to compute the fingerprint of matching
labels. Avoids map growth overhead when Equal contains multiple labels.

Benchmark results:
  3 labels:  Before: 357 ns/op → After: 314 ns/op (12% faster)
  10 labels: Before: 1519 ns/op → After: 1365 ns/op (10% faster)

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
Use len(GroupBy) or len(alert.Labels) for GroupByAll as the initial
map capacity. Avoids internal map growth when grouping by many labels.

Most impactful with GroupByAll on alerts with 10+ labels where map
would otherwise grow multiple times during iteration.

Benchmark results (10 labels):
  group_by_all: Before: 714 ns/op → After: 585 ns/op (18% faster)

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
Extract alert partitioning into a function and pre-allocate slices
and maps to len(alerts) capacity.

Called on every DedupStage.Exec. Avoids slice/map growth when
processing groups with many alerts.

Benchmark results:
  50 alerts:  Before: 15.8 µs/op, 126 allocs → After: 13.1 µs/op, 110 allocs (17% faster, 13% fewer allocs)
  100 alerts: Before: 29.0 µs/op, 232 allocs → After: 25.7 µs/op, 210 allocs (11% faster, 9% fewer allocs)

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
Pre-allocate the alerts slice with the known length to avoid slice
growth allocations during API alert conversion.

Benchmark with 100 alerts (PreAllocated vs AppendGrowth):
  50736 B/op vs 51768 B/op  (-2% memory)
  404 allocs vs 408 allocs  (-1% allocations)

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
The Data() method was calling types.Alerts() twice - once for Status()
and once for iteration. Each call allocates a new slice. Caching the
result avoids the duplicate allocation.

Benchmark with 50 alerts (SingleCall vs DuplicateCall):
  4416 B/op vs 8832 B/op   (-50% memory)
  51 allocs vs 102 allocs  (-50% allocations)

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
This is so a caller that may try to hold on to one of the pointers can realize that they shouldn't do that, by having the data held blanked out and invalidated.

Signed-off-by: Guido Trotter <ultrotter@gmail.com>
@Mwea Mwea force-pushed the perf/pre-size-maps branch from e7cf57a to ab5b200 Compare March 2, 2026 08:02
Mwea added 2 commits March 10, 2026 12:05
Revert the batch allocation optimization (b6f01d7) and the subsequent
clear(alertCopies) addition (ab5b200).

The batch allocation aimed to reduce per-alert heap allocations by
sharing a backing array. However, this introduced a memory retention
concern: holding one alert reference would retain the entire array.

The clear() was added to "enforce the contract" that notifiers should
not retain alert pointers. However, this approach breaks both tests
and any real code that legitimately accesses alerts during or after
the notify callback - it zeroes the alert data, causing fingerprint
calculations to return invalid values.

The complexity and test failures outweigh the allocation savings.
Reverting to per-alert allocations which are simpler and have clear
ownership semantics.

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
@Mwea
Copy link
Copy Markdown
Contributor Author

Mwea commented Mar 10, 2026

Revert the batch allocation optimization (b6f01d7) and the subsequent clear(alertCopies) addition (ab5b200).

The batch allocation aimed to reduce per-alert heap allocations by sharing a backing array. However, this introduced a memory retention concern: holding one alert reference would retain the entire array.

The clear() was added to "enforce the contract" that notifiers should not retain alert pointers. However, this approach breaks both tests and any real code that legitimately accesses alerts during or after the notify callback - it zeroes the alert data, causing fingerprint calculations to return invalid values.

The complexity and test failures outweigh the allocation savings. Reverting to per-alert allocations which are simpler and have clear ownership semantics.

@Spaceman1701
Copy link
Copy Markdown
Contributor

Thanks for the contribution!

@Spaceman1701 Spaceman1701 merged commit eac945c into prometheus:main Mar 10, 2026
6 checks passed
SoloJacobs pushed a commit to SoloJacobs/alertmanager that referenced this pull request Mar 15, 2026
…atch allocation (prometheus#5020)

* perf: batch allocate alert copies in flush

The flush path copies each alert before sending to avoid mutation.
Previously each copy was a separate heap allocation. Now all copies
share a single backing array.

Benchmark results:
  10 alerts:  Before: 691 ns/op, 12 allocs  →  After: 435 ns/op, 3 allocs  (37% faster, 75% fewer allocs)
  50 alerts:  Before: 3003 ns/op, 52 allocs →  After: 1714 ns/op, 3 allocs (43% faster, 94% fewer allocs)
  100 alerts: Before: 5976 ns/op, 102 allocs → After: 3383 ns/op, 3 allocs (43% faster, 97% fewer allocs)

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>

* perf: pre-size map in fingerprintEquals

Pre-allocate the LabelSet map to the exact size needed (len of Equal
labels) instead of letting it grow dynamically.

Called on every inhibition check to compute the fingerprint of matching
labels. Avoids map growth overhead when Equal contains multiple labels.

Benchmark results:
  3 labels:  Before: 357 ns/op → After: 314 ns/op (12% faster)
  10 labels: Before: 1519 ns/op → After: 1365 ns/op (10% faster)

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>

* perf: pre-size map in getGroupLabels

Use len(GroupBy) or len(alert.Labels) for GroupByAll as the initial
map capacity. Avoids internal map growth when grouping by many labels.

Most impactful with GroupByAll on alerts with 10+ labels where map
would otherwise grow multiple times during iteration.

Benchmark results (10 labels):
  group_by_all: Before: 714 ns/op → After: 585 ns/op (18% faster)

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>

* perf: pre-size collections in DedupStage alert partitioning

Extract alert partitioning into a function and pre-allocate slices
and maps to len(alerts) capacity.

Called on every DedupStage.Exec. Avoids slice/map growth when
processing groups with many alerts.

Benchmark results:
  50 alerts:  Before: 15.8 µs/op, 126 allocs → After: 13.1 µs/op, 110 allocs (17% faster, 13% fewer allocs)
  100 alerts: Before: 29.0 µs/op, 232 allocs → After: 25.7 µs/op, 210 allocs (11% faster, 9% fewer allocs)

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>

* perf: pre-allocate slice in OpenAPIAlertsToAlerts

Pre-allocate the alerts slice with the known length to avoid slice
growth allocations during API alert conversion.

Benchmark with 100 alerts (PreAllocated vs AppendGrowth):
  50736 B/op vs 51768 B/op  (-2% memory)
  404 allocs vs 408 allocs  (-1% allocations)

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>

* perf: cache types.Alerts() result in Template.Data

The Data() method was calling types.Alerts() twice - once for Status()
and once for iteration. Each call allocates a new slice. Caching the
result avoids the duplicate allocation.

Benchmark with 50 alerts (SingleCall vs DuplicateCall):
  4416 B/op vs 8832 B/op   (-50% memory)
  51 allocs vs 102 allocs  (-50% allocations)

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>

* Clear the alertCopies slice after notification

This is so a caller that may try to hold on to one of the pointers can realize that they shouldn't do that, by having the data held blanked out and invalidated.

Signed-off-by: Guido Trotter <ultrotter@gmail.com>

* revert: remove batch allocation and clear in flush

Revert the batch allocation optimization (b6f01d7) and the subsequent
clear(alertCopies) addition (ab5b200).

The batch allocation aimed to reduce per-alert heap allocations by
sharing a backing array. However, this introduced a memory retention
concern: holding one alert reference would retain the entire array.

The clear() was added to "enforce the contract" that notifiers should
not retain alert pointers. However, this approach breaks both tests
and any real code that legitimately accesses alerts during or after
the notify callback - it zeroes the alert data, causing fingerprint
calculations to return invalid values.

The complexity and test failures outweigh the allocation savings.
Reverting to per-alert allocations which are simpler and have clear
ownership semantics.

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>

---------

Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
Signed-off-by: Guido Trotter <ultrotter@gmail.com>
Co-authored-by: Guido Trotter <ultrotter@gmail.com>
SoloJacobs added a commit to SoloJacobs/alertmanager that referenced this pull request Apr 6, 2026
* [CHANGE] `go get github.com/prometheus/alertmanager/ui` will now fail as compiled UI assets are no longer checked into the repository. Downstream builds that rely on these assets being present in the source tree must now build the UI from source. prometheus#5113
* [CHANGE] The '--enable-feature=auto-gomaxprocs' option is deprecated and will be removed in v0.33. This flag currently has no effect and can be safely removed from any startup scripts. prometheus#5090
* [CHANGE] Update internal function signatures across multiple packages. This affects any project that integrates `Alertmanager` code.
* [ENHANCEMENT] Add static asset caching. prometheus#5113
* [ENHANCEMENT] Reduce memory allocations through pre-sizing collections and batch allocation. prometheus#5020
* [ENHANCEMENT] Replace help with documentation in navigation bar. prometheus#4943
* [ENHANCEMENT] docs(ha): Update high availability documentation. prometheus#5136
* [ENHANCEMENT] docs: Add `auth_secret_file` for smtp in document. prometheus#5036
* [ENHANCEMENT] docs: Add description for global `telegram_bot_token`. prometheus#5114
* [ENHANCEMENT] docs: Add note about notifier timeouts. prometheus#5077
* [ENHANCEMENT] docs: Fix `force_implicit_tls` config field name. prometheus#5030
* [ENHANCEMENT] docs: Link community supported integrations. prometheus#4978
* [ENHANCEMENT] docs: Remove duplicate header. prometheus#5034
* [ENHANCEMENT] docs: Update mutual tls reference in high availability documentation. prometheus#5120
* [ENHANCEMENT] tracing: Use noop spans when tracing disabled. prometheus#5118
* [FEATURE] Add silence annotations. prometheus#4965
* [FEATURE] Add silence logging option. prometheus#4163
* [FEATURE] Add support for multiple matcher set silences. prometheus#4957
* [FEATURE] Add the reason for notifying in dedup stage. prometheus#4971
* [FEATURE] mattermost: Flatten attachments into top-level config. prometheus#5009
* [FEATURE] mattermost: Support global webhook url. prometheus#4998
* [FEATURE] slack: Add default color from template. prometheus#5014
* [FEATURE] slack: Allow receiver to edit existing messages. prometheus#5007
* [FEATURE] template: Add dict, map and append functions. prometheus#5093
* [FEATURE] webhook: Add full payload templating support for notifier. prometheus#5011
* [BUGFIX] config: Check for empty cluster tls client config. prometheus#5126
* [BUGFIX] config: Don't crash upon reading empty config for notifier. prometheus#4979
* [BUGFIX] config: Fix ipv6 address handling in hostport.string(). prometheus#5040
* [BUGFIX] mattermost: Omit empty text field in notifications. prometheus#4985
* [BUGFIX] telegram: Send fallback message when notification exceeds character limit. prometheus#5074
* [BUGFIX] ui: Fix escaping for matcher values with quotes. prometheus#4862
* [BUGFIX] ui: Handle special chars in silence regex-matchers. prometheus#4942
* [BUGFIX] ui: Support utf-8 label names in matchers. prometheus#5089

Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
@SoloJacobs SoloJacobs mentioned this pull request Apr 6, 2026
ultrotter pushed a commit that referenced this pull request Apr 7, 2026
* [CHANGE] `go get github.com/prometheus/alertmanager/ui` will now fail as compiled UI assets are no longer checked into the repository. Downstream builds that rely on these assets being present in the source tree must now build the UI from source. #5113
* [CHANGE] The '--enable-feature=auto-gomaxprocs' option is deprecated and will be removed in v0.33. This flag currently has no effect and can be safely removed from any startup scripts. #5090
* [CHANGE] Update internal function signatures across multiple packages. This affects any project that integrates `Alertmanager` code.
* [ENHANCEMENT] Add static asset caching. #5113
* [ENHANCEMENT] Reduce memory allocations through pre-sizing collections and batch allocation. #5020
* [ENHANCEMENT] Replace help with documentation in navigation bar. #4943
* [ENHANCEMENT] docs(ha): Update high availability documentation. #5136
* [ENHANCEMENT] docs: Add `auth_secret_file` for smtp in document. #5036
* [ENHANCEMENT] docs: Add description for global `telegram_bot_token`. #5114
* [ENHANCEMENT] docs: Add note about notifier timeouts. #5077
* [ENHANCEMENT] docs: Fix `force_implicit_tls` config field name. #5030
* [ENHANCEMENT] docs: Link community supported integrations. #4978
* [ENHANCEMENT] docs: Remove duplicate header. #5034
* [ENHANCEMENT] docs: Update mutual tls reference in high availability documentation. #5120
* [ENHANCEMENT] tracing: Use noop spans when tracing disabled. #5118
* [FEATURE] Add silence annotations. #4965
* [FEATURE] Add silence logging option. #4163
* [FEATURE] Add support for multiple matcher set silences. #4957
* [FEATURE] Add the reason for notifying in dedup stage. #4971
* [FEATURE] mattermost: Flatten attachments into top-level config. #5009
* [FEATURE] mattermost: Support global webhook url. #4998
* [FEATURE] slack: Add default color from template. #5014
* [FEATURE] slack: Allow receiver to edit existing messages. #5007
* [FEATURE] template: Add dict, map and append functions. #5093
* [FEATURE] webhook: Add full payload templating support for notifier. #5011
* [BUGFIX] config: Check for empty cluster tls client config. #5126
* [BUGFIX] config: Don't crash upon reading empty config for notifier. #4979
* [BUGFIX] config: Fix ipv6 address handling in hostport.string(). #5040
* [BUGFIX] mattermost: Omit empty text field in notifications. #4985
* [BUGFIX] telegram: Send fallback message when notification exceeds character limit. #5074
* [BUGFIX] ui: Fix escaping for matcher values with quotes. #4862
* [BUGFIX] ui: Handle special chars in silence regex-matchers. #4942
* [BUGFIX] ui: Support utf-8 label names in matchers. #5089

Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants