
fix: dynamic allocator rework#35

Merged
rustatian merged 20 commits into master from fix/correctly-deallocate
Dec 28, 2025

Conversation

@rustatian
Member

@rustatian rustatian commented Oct 20, 2025

Reason for This PR

ref: roadrunner-server/roadrunner#2086
ref: roadrunner-server/roadrunner#2092

Description of Changes

  • Remove unnecessary parameters and simplify the get operation.
  • Remove the 2s double wait for a free worker.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the MIT license.

PR Checklist

[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]

  • All commits in this PR are signed (git commit -s).
  • The reason for this PR is clearly provided (issue no. or explanation).
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • Any user-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.

Summary by CodeRabbit

  • New Features

    • Added a token-based rate limiter for dynamic worker allocation.
  • Improvements

    • Raised scaling caps and channel capacities; improved idle-time cleanup and spawn/deallocate behavior.
    • Strengthened concurrency and worker-management robustness.
  • Tests

    • Added/expanded tests for dynamic allocation, rate limiter, channel/vector behavior, and a slow-request scenario.
  • Chores

    • Bumped Go toolchain/dependencies, added test timeouts, and simplified CI matrix/test targets.


Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
@rustatian rustatian requested a review from Copilot October 20, 2025 09:19
@rustatian rustatian self-assigned this Oct 20, 2025
@rustatian rustatian added the enhancement New feature or request label Oct 20, 2025
@coderabbitai

coderabbitai bot commented Oct 20, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Reworks dynamic allocator to value-based atomics, adds a RateLimiter and explicit idle-TTL lifecycle, raises dynamic max workers and channel capacities, converts WorkerWatcher internals to atomic.Uint64 + sync.Map, adds tests, bumps Go toolchain/indirect deps, and removes several nolint:stylecheck directives.

Changes

Cohort / File(s) Summary
Go module & toolchain
go.mod
Bump Go toolchain to 1.25 and update indirect dependencies (github.com/tklauser/*, golang.org/x/sys).
Dynamic allocator & pool integration
pool/static_pool/dyn_allocator.go, pool/static_pool/dyn_allocator_test.go, pool/static_pool/pool.go
Replace pointer-based atomics with value-based (atomic.Uint64, atomic.Bool); add rateLimit *ratelimiter.RateLimiter and lastAllocTry; rename allocateDynamically → addMoreWorkers and dynamicTTLListener → startIdleTTLListener; remove execLock/ttl channel; constructor call updated (mutex param removed); add extensive dynamic allocator tests.
Rate limiter
pool/ratelimiter/ratelimiter.go, pool/ratelimiter/ratelimiter_test.go
Add new token-style RateLimiter (NewRateLimiter, TryAcquire, Release) with cooldown behaviour and concurrent tests.
WorkerWatcher concurrency refactor
worker_watcher/worker_watcher.go
Replace the anonymous RWMutex with a named mu; switch numWorkers to atomic.Uint64; replace map[int64]*worker.Process with sync.Map; update internal operations (Watch/Add/Remove/Allocate/Reset/Destroy/List); raise maxWorkers to 2048.
Channel & container tuning
worker_watcher/container/channel/vec.go, worker_watcher/container/channel/vec_test.go
Increase internal vector/channel capacity (600→2048); reduce Pop reset-wait (100ms→10ms); add tests for push/pop, len, destroy, and timeout/cancel semantics.
Pool config changes
pool/config.go
Add maxWorkers = 2048; change DynamicAllocationOpts.InitDefaults signature to accept current NumWorkers and cap MaxWorkers relative to existing workers.
Tests, PHP helper and infra
tests/slow_req.php, tests/composer.json, Makefile, .github/workflows/tests.yml
Add PHP slow-request helper used by dynamic tests; update composer deps; add -timeout 30m to test coverage runs and new test targets; reduce CI matrix to ubuntu-latest.
Style directive removals & minor files
pool/static_pool/debug.go, pool/static_pool/fuzz_test.go, pool/static_pool/options.go, pool/static_pool/stream.go, pool/static_pool/supervisor.go, pool/static_pool/supervisor_test.go, pool/static_pool/pool_test.go
Remove //nolint:stylecheck directives and remove several dynamic tests from pool_test.go. No behavioural changes in these files.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Pool
    participant DynAllocator
    participant RateLimiter
    participant WorkerWatcher
    participant TTLListener

    rect rgb(230, 240, 255)
    Note over Client,Pool: Request handling and potential dynamic spawn
    Client->>Pool: Execute request
    Pool->>DynAllocator: takeWorker / request worker
    DynAllocator->>RateLimiter: TryAcquire()
    alt acquired
        RateLimiter-->>DynAllocator: true
        DynAllocator->>WorkerWatcher: Allocate / AddWorker
        WorkerWatcher-->>DynAllocator: worker created
        DynAllocator->>DynAllocator: atomically increment currAllocated
        DynAllocator->>RateLimiter: Release() (schedules cooldown)
        DynAllocator-->>Pool: worker returned
        Pool-->>Client: request handled
    else denied
        RateLimiter-->>DynAllocator: false
        DynAllocator-->>Pool: fallback (reuse or fail)
        Pool-->>Client: request handled or error
    end
    end

    rect rgb(255, 240, 230)
    Note over TTLListener,WorkerWatcher: Idle TTL deallocation loop
    TTLListener->>TTLListener: wait idle timeout
    TTLListener->>DynAllocator: check lastAllocTry
    alt recent allocation
        DynAllocator-->>TTLListener: skip deallocation
    else safe to deallocate
        TTLListener->>WorkerWatcher: remove up to spawnRate workers
        loop per worker
            WorkerWatcher->>WorkerWatcher: stop/kill worker
            WorkerWatcher->>DynAllocator: report removal
            DynAllocator->>DynAllocator: atomically decrement currAllocated
        end
    end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

bug

Suggested reviewers

  • wolfy-j

Poem

🐰 I hopped through atoms, maps, and queues,
I timed the spawns and calmed the flues,
Rate limits whisper, TTLs unwind,
Workers sleep and wake just fine,
A little patch, from rabbit — spry and kind 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 28.57%, below the required 80.00% threshold. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): the title 'fix: dynamic allocator rework' directly relates to the main changes, which extensively rework the dynamic allocator component with value-based atomics, rate limiting, and lifecycle adjustments.
  • Description check (✅ Passed): the PR description references relevant issues, provides reasoning and high-level change descriptions, includes license acceptance, and confirms all checklist items are met.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bddc55a and 9d4af10.

📒 Files selected for processing (1)
  • .github/workflows/tests.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/tests.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

Copilot AI left a comment


Pull Request Overview

This PR refactors the dynamic allocator system by simplifying the get operation and removing unnecessary wait periods. The changes improve efficiency by eliminating the 2-second wait for free workers and removing redundant TTL trigger mechanisms.

  • Simplified dynamic allocator logic by removing TTL trigger channels and unnecessary waiting
  • Updated documentation and comments for better clarity and grammar corrections
  • Removed //nolint:stylecheck comments from package declarations across multiple files

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Summary per file:
  • pool/static_pool/dyn_allocator.go: major refactoring of dynamic allocation logic, removing TTL triggers and simplifying the worker allocation flow
  • pool/static_pool/pool.go: updated comments and documentation for better clarity and grammar
  • worker_watcher/worker_watcher.go: minor punctuation fix in a comment
  • tests/slow_req.php: new test file for slow-request testing
  • pool/static_pool/dyn_allocator_test.go: new test file for dynamic allocator functionality
  • go.mod: updated Go version from 1.24.0 to 1.25
  • multiple static_pool files: removed //nolint:stylecheck comments from package declarations

Tip: Customize your code reviews with copilot-instructions.md.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pool/static_pool/dyn_allocator.go (1)

17-33: Add execLock.RLock protection to Exec operations to prevent worker deallocation during execution.

The execLock RWMutex is used only for write-locking in the TTL listener (lines 124, 131, 158 in dyn_allocator.go) during worker deallocation, but there are no corresponding RLock calls during Exec operations in pool.go. This creates a race condition where workers can be deallocated while actively executing requests.

The Exec method (pool.go:165) and its call path through takeWorker (pool.go:421) must acquire execLock.RLock() before executing a worker and release it after completion to prevent the TTL listener from deallocating workers in use.

🧹 Nitpick comments (3)
tests/slow_req.php (1)

1-27: Add documentation explaining the test script's purpose.

Consider adding a header comment explaining:

  • This script simulates slow/hung worker requests for dynamic allocator testing
  • The even counter requests return errors (to test error handling paths)
  • The odd counter requests sleep for 100 seconds (to force dynamic worker allocation)
  • How it's used in the test suite
pool/static_pool/dyn_allocator.go (1)

146-147: Clarify or remove confusing comment.

The comment mentions "if we had an error in the da.ww.Take code block," but there's no Take call in this code; it's RemoveWorker. Additionally, the comment seems to suggest a problem, but the code correctly handles errors by skipping the decrement when RemoveWorker fails (line 142 continue).

Consider updating the comment to clarify the actual concern, or remove it if it's outdated:

-					// potential problem: if we had an error in the da.ww.Take code block, we'd still have the currAllocated > 0
+					// Note: if RemoveWorker fails, we skip the decrement to keep currAllocated accurate
pool/static_pool/dyn_allocator_test.go (1)

88-92: Consider failing the test on request errors above a threshold.

The current error handling logs failures but doesn't fail the test. While some errors might be expected during concurrent resets, silently ignoring all errors could mask real issues with the dynamic allocator.

Consider tracking error counts and asserting an acceptable failure rate:

+	var errCount atomic.Uint64
 	wg := &sync.WaitGroup{}
 	wg.Add(10000)
 	go func() {
 		for range 10000 {
 			go func() {
 				defer wg.Done()
 				r, erre := np.Exec(ctx, &payload.Payload{Body: []byte("hello"), Context: nil}, make(chan struct{}))
 				if erre != nil {
 					t.Log("failed request: ", erre.Error())
+					errCount.Add(1)
 					return
 				}
 				resp := <-r
 				require.Equal(t, []byte("hello"), resp.Body())
 			}()
 		}
 	}()
 	
 	// ... after wg.Wait() ...
+	// Allow some failures due to resets, but ensure most requests succeed
+	assert.Less(t, errCount.Load(), uint64(100), "Too many failed requests")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c21554 and 27a0ae2.

📒 Files selected for processing (13)
  • go.mod (1 hunks)
  • pool/static_pool/debug.go (1 hunks)
  • pool/static_pool/dyn_allocator.go (7 hunks)
  • pool/static_pool/dyn_allocator_test.go (1 hunks)
  • pool/static_pool/fuzz_test.go (1 hunks)
  • pool/static_pool/options.go (1 hunks)
  • pool/static_pool/pool.go (13 hunks)
  • pool/static_pool/pool_test.go (1 hunks)
  • pool/static_pool/stream.go (1 hunks)
  • pool/static_pool/supervisor.go (1 hunks)
  • pool/static_pool/supervisor_test.go (1 hunks)
  • tests/slow_req.php (1 hunks)
  • worker_watcher/worker_watcher.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
pool/static_pool/pool.go (5)
pool/config.go (1)
  • Config (9-37)
pool/command.go (1)
  • Command (8-8)
pool/allocator.go (1)
  • NewPoolAllocator (25-43)
worker_watcher/worker_watcher.go (1)
  • NewSyncWorkerWatcher (41-53)
pool/static_pool/stream.go (1)
  • PExec (5-8)
tests/slow_req.php (1)
payload/payload.go (1)
  • Payload (9-18)
pool/static_pool/dyn_allocator_test.go (5)
pool/config.go (2)
  • Config (9-37)
  • DynamicAllocationOpts (91-95)
pool/static_pool/pool.go (1)
  • NewPool (52-129)
pool/command.go (1)
  • Command (8-8)
ipc/pipe/pipe.go (1)
  • NewPipeFactory (23-27)
payload/payload.go (1)
  • Payload (9-18)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
  • GitHub Check: Analyze (go)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
🔇 Additional comments (16)
pool/static_pool/supervisor.go (1)

1-1: Linting directive removal approved.

Removing nolint:stylecheck is safe; the package name static_pool follows Go naming conventions.

go.mod (1)

3-3: Verify Go 1.25 compatibility with pinned dependencies.

The toolchain upgrade from Go 1.24 to 1.25 is a routine maintenance change. All dependencies remain pinned and unchanged. Ensure the CI pipeline has validated compatibility, especially with goridge/v3 and other critical dependencies.

pool/static_pool/pool_test.go (1)

1-1: Linting directive removal approved.

No functional test changes; the removal is safe and idiomatic.

pool/static_pool/options.go (1)

1-1: Linting directive removal approved.

No functional changes to the options API; removal is safe.

pool/static_pool/fuzz_test.go (1)

1-1: Linting directive removal approved.

No functional changes to the fuzz test; removal is safe.

pool/static_pool/debug.go (1)

1-1: Linting directive removal approved.

No functional changes to debug execution logic; removal is safe.

pool/static_pool/supervisor_test.go (1)

1-1: Linting directive removal approved.

No functional test changes; removal is safe and aligns with broader linting cleanup.

pool/static_pool/stream.go (1)

1-1: Linting directive removal approved.

No functional changes to stream types or helpers; removal is safe.

pool/static_pool/pool.go (1)

1-449: LGTM! Documentation improvements enhance code clarity.

All changes in this file are limited to comment and documentation refinements (grammar, punctuation, and wording improvements). No functional changes or API modifications detected.

worker_watcher/worker_watcher.go (1)

1-451: LGTM! Minor formatting improvements.

The changes remove an unnecessary linting directive and improve comment punctuation. No functional impact.

tests/slow_req.php (1)

22-22: Verify the 100-second sleep duration is appropriate.

The 100-second sleep appears intentionally long to simulate slow/hung workers for dynamic allocation testing. However, the associated test (Test_DynAllocatorManyReq) completes in ~15 seconds, meaning workers will still be sleeping when the test finishes. Please confirm this duration aligns with the test's objectives and won't cause resource leaks or hanging processes in the test environment.

pool/static_pool/dyn_allocator.go (3)

80-83: Verify the boundary condition change from >= to ==.

The max workers check was changed from >= to ==. While line 89 prevents currAllocated from exceeding maxWorkers during the allocation loop, this is a subtle logic change. The == check assumes currAllocated can never exceed maxWorkers, which is enforced by the loop guard but wasn't previously assumed at this entry point.

This change appears safe given the loop guard, but please confirm this was intentional and that there are no other code paths that could increment currAllocated beyond maxWorkers.


103-107: Clarify the 2-second timeout purpose.

The PR description mentions "Remove a 2s double wait for a free worker," but this code introduces a 2-second timeout for Take. Please clarify:

  • What "double wait" was removed?
  • Is this 2-second timeout the replacement, or is this a different wait?
  • Why 2 seconds specifically?

128-133: Verify early exit logic.

When currAllocated == 0, the function performs goto exit, which exits the entire goroutine. This means the ticker continues running until the next scheduled tick (at idleTimeout).

Consider whether this early exit should instead continue in the select loop to keep the listener running, or if exiting the entire goroutine is the intended behavior. If exiting is correct, please confirm that having the listener terminate when there are no dynamic workers is the desired design.

pool/static_pool/dyn_allocator_test.go (2)

34-54: LGTM! Basic dynamic allocator test is well-structured.

The test validates basic dynamic allocation functionality with a simple request-response cycle.


56-114: Verify test doesn't exhibit flakiness with long-running workers.

The test uses slow_req.php which sleeps for 100 seconds per request (line 73), but the test only runs for ~15 seconds total (line 109). This means:

  1. Workers will still be processing (sleeping) when the test ends
  2. The assertion at line 111 expects exactly 5 workers, assuming dynamic workers were deallocated after the 10s idle timeout
  3. However, if workers are actively processing, deallocation might not occur as expected

Please confirm this test passes consistently and that the 15-second sleep is sufficient for:

  • The idle timeout (10s) to trigger
  • Dynamic workers to be successfully removed
  • The worker count to stabilize at exactly 5

Additionally, consider adding an assertion to verify that dynamic allocation actually occurred during the test:

// After some requests have started but before the 15s sleep
time.Sleep(time.Second * 2)
assert.Greater(t, int(np.NumDynamic()), 0, "Dynamic allocation should have occurred")

Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
pool/static_pool/dyn_allocator.go (3)

136-142: Syntax error: for range alloc is invalid; also avoid unbounded removal while holding locks.

  • Replace the invalid loop with a proper counter loop.
  • Use a bounded context for RemoveWorker to prevent hangs while execLock/mu are held.
-                alloc := da.currAllocated.Load()
-                for range alloc {
-                    // take the worker from the stack, inifinite timeout
-                    // we should not block here forever
-                    err := da.ww.RemoveWorker(context.Background())
+                alloc := da.currAllocated.Load()
+                for i := uint64(0); i < alloc; i++ {
+                    // take the worker from the stack with a bounded timeout
+                    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
+                    err := da.ww.RemoveWorker(ctx)
+                    cancel()
                     if err != nil {
                         da.log.Error("failed to take worker from the stack", zap.Error(err))
                         continue
                     }

114-122: Ensure ticker is always stopped; guard against non‑positive idleTimeout.

  • Add defer triggerTTL.Stop() to stop the ticker on any exit path.
  • If idleTimeout <= 0, time.NewTicker will panic.
-        triggerTTL := time.NewTicker(da.idleTimeout)
+        if da.idleTimeout <= 0 {
+            da.log.Warn("dynamic allocator listener disabled: idle_timeout <= 0")
+            return
+        }
+        triggerTTL := time.NewTicker(da.idleTimeout)
+        defer triggerTTL.Stop()
         for {
             select {
             case <-da.stopCh:
                 da.log.Debug("dynamic allocator listener stopped")
                 goto exit
             // when this channel is triggered, we should deallocate all dynamically allocated workers
             case <-triggerTTL.C:
-                triggerTTL.Stop()
                 da.log.Debug("dynamic workers TTL", zap.String("reason", "idle timeout reached"))

139-141: Comment contradicts behavior and misspells “infinite”.

If using a bounded context (recommended above), adjust the comment. Otherwise, fix spelling and reconcile the statement.

-                    // take the worker from the stack, inifinite timeout
-                    // we should not block here forever
+                    // take the worker from the stack with a bounded timeout to avoid blocking while locks are held
♻️ Duplicate comments (1)
pool/static_pool/dyn_allocator.go (1)

85-86: Fix misleading comment about loop start.

The loop starts at 0 and no worker is pre‑allocated here; update the comment accordingly.

-    // we're starting from the 1 because we already allocated one worker which would be released in the Exec function
+    // allocate up to spawnRate workers (they will be available after allocation completes)
     // i < da.spawnRate - we can't allocate more workers than the spawn rate
🧹 Nitpick comments (4)
pool/static_pool/dyn_allocator.go (4)

98-101: Prefer Store/Add over Swap for counter updates under mutex.

You’re already under mu; Swap(Load()+1) is unnecessary and obscures intent.

-        // increase the number of additionally allocated options
-        _ = da.currAllocated.Swap(da.currAllocated.Load() + 1)
+        // increase the number of additionally allocated workers
+        da.currAllocated.Store(da.currAllocated.Load() + 1)

149-151: Same here: decrement with Store instead of Swap.

Keeps intent clear and avoids odd semantics with unsigned atomics.

-                    // decrease the number of additionally allocated options
-                    _ = da.currAllocated.Swap(da.currAllocated.Load() - 1)
+                    // decrease the number of additionally allocated workers
+                    da.currAllocated.Store(da.currAllocated.Load() - 1)

165-165: Exit log message can be misleading.

On stopCh path or partial failures, not all dynamic workers may be deallocated. Log the remaining count (or adjust the wording).

-        da.log.Debug("dynamic allocator listener exited, all dynamically allocated workers deallocated")
+        da.log.Debug("dynamic allocator listener exited", zap.Uint64("remaining_dyn_allocated", da.currAllocated.Load()))

63-67: Consider narrowing the critical section.

Holding mu across ww.Take(ctx) (up to 2s) serializes all allocations unnecessarily. Unlock after spawning/incrementing, then perform Take.

If you want, I can propose a concrete diff to restructure the lock scope. Please confirm if ww.AddWorker/ww.Take have additional invariants that require mu to be held.

Also applies to: 103-107

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27a0ae2 and aff3273.

📒 Files selected for processing (2)
  • pool/static_pool/dyn_allocator.go (7 hunks)
  • pool/static_pool/pool.go (13 hunks)
✅ Files skipped from review due to trivial changes (1)
  • pool/static_pool/pool.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS macos-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)

Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
@codecov

codecov bot commented Oct 20, 2025

Codecov Report

❌ Patch coverage is 83.97790% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.04%. Comparing base (e4d6b55) to head (9d4af10).
⚠️ Report is 21 commits behind head on master.

Files with missing lines (patch % and lines):
  • pool/static_pool/dyn_allocator.go: 69.23%, 17 missing and 3 partials ⚠️
  • worker_watcher/worker_watcher.go: 93.25%, 4 missing and 2 partials ⚠️
  • pool/config.go: 50.00%, 1 missing and 1 partial ⚠️
  • pool/static_pool/pool.go: 75.00%, 1 missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #35      +/-   ##
==========================================
- Coverage   70.26%   70.04%   -0.22%     
==========================================
  Files          18       19       +1     
  Lines        1473     1489      +16     
==========================================
+ Hits         1035     1043       +8     
- Misses        384      391       +7     
- Partials       54       55       +1     

☔ View full report in Codecov by Sentry.


Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
pool/static_pool/dyn_allocator.go (2)

85-85: Fix the misleading comment (still present despite past review).

The comment states "we're starting from the 1" but the loop actually starts at i := uint64(0). This was marked as addressed in a previous review but the incorrect comment is still present.

Apply this diff:

-	// we're starting from the 1 because we already allocated one worker which would be released in the Exec function
+	// allocate up to spawnRate workers (they will be available after allocation completes)
 	// i < da.spawnRate - we can't allocate more workers than the spawn rate

147-148: Update the outdated comment (still not fixed from past review).

The comment refers to Take but the code uses RemoveWorker. This was flagged in a previous review but remains unfixed.

Apply this diff:

-					// potential problem: if we had an error in the da.ww.Take code block, we'd still have the currAllocated > 0
+					// potential problem: if RemoveWorker fails above, currAllocated may remain > 0
🧹 Nitpick comments (1)
pool/static_pool/dyn_allocator.go (1)

137-152: Consider adding a safety check for worker removal.

The deallocation loop removes alloc workers (line 137), but if RemoveWorker fails (line 140-144), the counter is still decremented (line 150). This can lead to a mismatch between currAllocated and the actual number of dynamically allocated workers.

Consider only decrementing currAllocated when RemoveWorker succeeds:

 				alloc := da.currAllocated.Load()
 				for range alloc {
 					// take the worker from the stack, inifinite timeout
 					// we should not block here forever
 					err := da.ww.RemoveWorker(context.Background())
 					if err != nil {
 						da.log.Error("failed to take worker from the stack", zap.Error(err))
 						continue
 					}
 
-					// potential problem: if we had an error in the da.ww.Take code block, we'd still have the currAllocated > 0
+					// decrement only on successful removal
+					da.currAllocated.Add(^uint64(0)) // atomic decrement
+					da.log.Debug("deallocated additional worker", zap.Uint64("currently additionally allocated", da.currAllocated.Load()))
+				}
-
-					// decrease the number of additionally allocated options
-					_ = da.currAllocated.Swap(da.currAllocated.Load() - 1)
-					da.log.Debug("deallocated additional worker", zap.Uint64("currently additionally allocated", da.currAllocated.Load()))
-				}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45941fa and f0feaff.

📒 Files selected for processing (1)
  • pool/static_pool/dyn_allocator.go (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pool/static_pool/dyn_allocator.go (1)
worker/worker.go (1)
  • Process (27-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build (Go stable, PHP 8.4, OS macos-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
  • GitHub Check: Analyze (go)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS macos-latest)
🔇 Additional comments (3)
pool/static_pool/dyn_allocator.go (3)

24-24: Good refactor to value-based atomics.

The change from pointer-based atomics (atomic.Pointer[uint64], atomic.Pointer[bool]) to value-based atomics (atomic.Uint64, atomic.Bool) is more idiomatic and cleaner.

Also applies to: 27-27


54-55: LGTM!

The initialization of value-based atomics is correct.


80-82: LGTM on the boundary check improvement.

The use of >= instead of == correctly prevents accidental overruns and is the safer, conventional approach for enforcing limits.

Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pool/static_pool/dyn_allocator.go (1)

104-109: Confirmed: Hard-coded 2s timeout creates double-wait scenario. Pass context or use existing idleTimeout config.

The concern is valid. The call chain shows:

  1. Exec() receives ctx with a deadline
  2. takeWorker() is called with ctxGetFree containing AllocateTimeout (line 196)
  3. When ww.Take(ctxGetFree) times out and returns NoFreeWorkers error, takeWorker() calls allocateDynamically() (line 443)
  4. allocateDynamically() creates a fresh context with hardcoded time.Second*2 timeout (line 104), ignoring the expired caller context

This means a request that has already exhausted its AllocateTimeout gets a completely independent 2-second window, defeating the caller's timeout intent.

The dynAllocator struct already has an idleTimeout field (line 24) configured from settings, but it's only used in dynamicTTLListener, not in the Take() operation.

Recommendation: Implement Option A (preferred). Pass the caller's context through:

- func (da *dynAllocator) allocateDynamically() (*worker.Process, error) {
+ func (da *dynAllocator) allocateDynamically(ctx context.Context) (*worker.Process, error) {
     ...
-    ctx, cancel := context.WithTimeout(context.Background(), time.Second*2)
-    w, err := da.ww.Take(ctx)
-    cancel()
+    w, err := da.ww.Take(ctx)
     return w, err
  }

Then update the call site at pool.go:443 to pass ctxGetFree:

- return sp.dynamicAllocator.allocateDynamically()
+ return sp.dynamicAllocator.allocateDynamically(ctxGetFree)

This respects caller timeout policies without adding arbitrary secondary waits.

♻️ Duplicate comments (2)
pool/static_pool/dyn_allocator.go (2)

86-90: Fix misleading comment about loop start.

The loop starts at 0, not 1. Reword to describe allocating up to spawnRate (bounded by maxWorkers).

-	// we're starting from the 1 because we already allocated one worker which would be released in the Exec function
+	// allocate up to spawnRate workers (bounded by maxWorkers) to handle current pressure

145-146: Update outdated comment to match RemoveWorker.

-					// potential problem: if we had an error in the da.ww.Take code block, we'd still have the currAllocated > 0
+					// if RemoveWorker fails, currAllocated may remain > 0; we log below
🧹 Nitpick comments (5)
pool/static_pool/dyn_allocator.go (3)

37-43: Remove unused execLock parameter; align locking strategy.

The constructor accepts execLock but doesn’t use it and creates its own mutex. This is confusing and risks divergence from the pool’s lock intentions. Remove the parameter and update the call site, or plumb and use the passed lock consistently.

-func newDynAllocator(
-	log *zap.Logger,
-	ww *worker_watcher.WorkerWatcher,
-	alloc func() (*worker.Process, error),
-	stopCh chan struct{},
-	execLock *sync.RWMutex,
-	cfg *pool.Config) *dynAllocator {
+func newDynAllocator(
+	log *zap.Logger,
+	ww *worker_watcher.WorkerWatcher,
+	alloc func() (*worker.Process, error),
+	stopCh chan struct{},
+	cfg *pool.Config) *dynAllocator {
 	da := &dynAllocator{
-		mu:          &sync.Mutex{},
+		mu:          &sync.Mutex{},

Also applies to: 48-49


115-124: Stop the ticker with a defer to avoid leaks if early-exiting on stopCh.

Minor hygiene: create the ticker once and defer Stop; you don’t need to Stop inside the select.

-		triggerTTL := time.NewTicker(da.idleTimeout)
+		triggerTTL := time.NewTicker(da.idleTimeout)
+		defer triggerTTL.Stop()
...
-			case <-triggerTTL.C:
-				triggerTTL.Stop()
+			case <-triggerTTL.C:

Consider atomic.Int64 to avoid the two’s-complement decrement trick.

Using Add(^uint64(0)) works but is opaque. Switching to atomic.Int64 and Add(-1) improves readability.

pool/static_pool/pool.go (2)

124-126: Adjust newDynAllocator call after removing unused execLock.

If you remove the execLock parameter as suggested, update the call accordingly.

-		p.dynamicAllocator = newDynAllocator(p.log, p.ww, p.allocator, p.stopCh, &p.mu, p.cfg)
+		p.dynamicAllocator = newDynAllocator(p.log, p.ww, p.allocator, p.stopCh, p.cfg)

426-434: Minor grammar nit in log message.

-				"no free workers in the pool, wait timeout exceed",
+				"no free workers in the pool, wait timeout exceeded",
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 413102c and d294c9c.

📒 Files selected for processing (2)
  • pool/static_pool/dyn_allocator.go (7 hunks)
  • pool/static_pool/pool.go (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
pool/static_pool/dyn_allocator.go (1)
worker/worker.go (1)
  • Process (27-59)
pool/static_pool/pool.go (5)
pool/config.go (1)
  • Config (9-37)
pool/command.go (1)
  • Command (8-8)
pool/allocator.go (1)
  • NewPoolAllocator (25-43)
worker_watcher/worker_watcher.go (1)
  • NewSyncWorkerWatcher (41-53)
pool/static_pool/stream.go (1)
  • PExec (5-8)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)
🔇 Additional comments (3)
pool/static_pool/pool.go (2)

384-385: LGTM: direct atomic load for NumDynamic.


136-139: LGTM: comment/doc fix (“Workers returns”).

pool/static_pool/dyn_allocator.go (1)

134-136: Compile-time bug: for range alloc on a number is invalid.

Looping over a uint64 with range does not compile. Use an indexed loop.

-				alloc := da.currAllocated.Load()
-				for range alloc {
+				alloc := da.currAllocated.Load()
+				for i := uint64(0); i < alloc; i++ {

Likely an incorrect or invalid review comment.

Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
pool/static_pool/pool.go (4)

196-201: Defer in retry loop leaks timer/context; cancel per-iteration.

goto begin can run multiple times while defers run only at return. Move cancel() out of defer to avoid leaks.

Apply this diff:

-ctxGetFree, cancel := context.WithTimeout(ctx, sp.cfg.AllocateTimeout)
-defer cancel()
-w, err := sp.takeWorker(ctxGetFree, op)
+ctxGetFree, cancel := context.WithTimeout(ctx, sp.cfg.AllocateTimeout)
+w, err := sp.takeWorker(ctxGetFree, op)
+cancel()

207-214: Same leak for Exec TTL context; don’t defer when you may goto.

Use explicit cancel after Exec to prevent accumulating timers across retries.

-ctxT, cancelT := context.WithTimeout(ctx, sp.cfg.Supervisor.ExecTTL)
-defer cancelT()
-rsp, err = w.Exec(ctxT, p)
+ctxT, cancelT := context.WithTimeout(ctx, sp.cfg.Supervisor.ExecTTL)
+rsp, err = w.Exec(ctxT, p)
+cancelT()

316-321: Log uses wrong variable (err vs errI).

This makes stream errors hard to diagnose.

-    if errI != nil {
-        sp.log.Warn("stream error", zap.Error(err))
+    if errI != nil {
+        sp.log.Warn("stream error", zap.Error(errI))

336-341: Same logging var issue in non‑supervised stream path.

-    if errI != nil {
-        sp.log.Warn("stream iter error", zap.Error(err))
+    if errI != nil {
+        sp.log.Warn("stream iter error", zap.Error(errI))
pool/static_pool/dyn_allocator.go (2)

110-122: Ticker leak on stopCh path; stop via defer.

triggerTTL is stopped only on TTL; if stopCh fires, ticker keeps running.

 go func() {
-    // DynamicAllocatorOpts are read-only, so we can use them without a lock
-    triggerTTL := time.NewTicker(da.idleTimeout)
+    // DynamicAllocatorOpts are read-only, so we can use them without a lock
+    triggerTTL := time.NewTicker(da.idleTimeout)
+    defer triggerTTL.Stop()
@@
-        case <-triggerTTL.C:
-            triggerTTL.Stop()
+        case <-triggerTTL.C:

Also applies to: 160-161


127-151: Compile error: for range alloc over uint64 is invalid; use counter loop.

Also, don’t block indefinitely on RemoveWorker; set a timeout.

-    alloc := da.currAllocated.Load()
-    for range alloc {
-        // take the worker from the stack, inifinite timeout
-        // we should not block here forever
-        err := da.ww.RemoveWorker(context.Background())
+    alloc := da.currAllocated.Load()
+    for i := uint64(0); i < alloc; i++ {
+        // remove one worker with a bounded timeout
+        ctx, cancel := context.WithTimeout(context.Background(), da.idleTimeout)
+        err := da.ww.RemoveWorker(ctx)
+        cancel()
         if err != nil {
-            da.log.Error("failed to take worker from the stack", zap.Error(err))
+            da.log.Error("failed to remove worker from the pool", zap.Error(err))
             continue
         }
@@
-        // potential problem: if we had an error in the da.ww.Take code block, we'd still have the currAllocated > 0
+        // if RemoveWorker fails, currAllocated may remain > 0; we log below
♻️ Duplicate comments (2)
pool/static_pool/pool.go (1)

136-139: Comment grammar fix retained: LGTM.

pool/static_pool/dyn_allocator.go (1)

85-101: Comment is misleading; spawn loop starts at 0 and may allocate up to spawnRate.

- // we're starting from the 1 because we already allocated one worker which would be released in the Exec function
- // i < da.spawnRate - we can't allocate more workers than the spawn rate
+ // allocate up to spawnRate workers (bounded by maxWorkers)
+ // i < da.spawnRate - don't allocate more than the configured spawn rate
🧹 Nitpick comments (5)
pool/static_pool/pool.go (2)

425-444: Nit: log message grammar.

“wait timeout exceed” → “wait timeout exceeded”.

- "no free workers in the pool, wait timeout exceed",
+ "no free workers in the pool, wait timeout exceeded",

48-49: Unused Pool.mu field.

Looks unused after allocator rework; consider removing to reduce surface area.

pool/static_pool/dyn_allocator.go (3)

103-108: Avoid hard‑coded 2s Take timeout; use configured timeout.

Prefer cfg.AllocateTimeout (plumb it into dynAllocator) for consistency under load.

-ctx, cancel := context.WithTimeout(context.Background(), time.Second*2)
+ctx, cancel := context.WithTimeout(context.Background(), da.idleTimeout) // or a dedicated allocateTimeout passed into dynAllocator

37-58: Guard idleTimeout at construction.

time.NewTicker panics on non‑positive durations. Validate and default.

 func newDynAllocator(
@@
-    da := &dynAllocator{
+    da := &dynAllocator{
         maxWorkers:  cfg.DynamicAllocatorOpts.MaxWorkers,
         spawnRate:   cfg.DynamicAllocatorOpts.SpawnRate,
         idleTimeout: cfg.DynamicAllocatorOpts.IdleTimeout,
@@
     }
 
-    da.currAllocated.Store(0)
-    da.started.Store(false)
+    if da.idleTimeout <= 0 {
+        log.Warn("dynamic allocator idle_timeout <= 0; defaulting to 30s")
+        da.idleTimeout = 30 * time.Second
+    }
+    da.currAllocated.Store(0)
+    da.started.Store(false)

26-31: Minor: prefer embedding sync.Mutex over a pointer.

Avoid heap alloc/indirection unless you share the mutex by pointer.

-    mu            *sync.Mutex
+    mu            sync.Mutex

Callers of da.mu.Lock() stay the same; in the constructor, drop the &sync.Mutex{} initialization, since a zero-value sync.Mutex is ready to use.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d294c9c and 07b0571.

📒 Files selected for processing (3)
  • pool/static_pool/dyn_allocator.go (7 hunks)
  • pool/static_pool/dyn_allocator_test.go (1 hunks)
  • pool/static_pool/pool.go (14 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pool/static_pool/dyn_allocator_test.go
🧰 Additional context used
🧬 Code graph analysis (2)
pool/static_pool/pool.go (4)
pool/config.go (1)
  • Config (9-37)
pool/allocator.go (1)
  • NewPoolAllocator (25-43)
worker_watcher/worker_watcher.go (1)
  • NewSyncWorkerWatcher (41-53)
pool/static_pool/stream.go (1)
  • PExec (5-8)
pool/static_pool/dyn_allocator.go (1)
worker/worker.go (1)
  • Process (27-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
🔇 Additional comments (2)
pool/static_pool/pool.go (2)

124-126: Allocator ctor change: confirm no leftover mutex coupling.

newDynAllocator no longer receives a mutex; ensure no call sites still expect shared locking via Pool.mu and that Pool.mu isn’t used elsewhere.


384-385: Atomic read for NumDynamic: LGTM.

Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (2)
pool/static_pool/dyn_allocator.go (2)

85-86: Comment still references starting from 1 but loop starts at 0.

The past review flagged this, but the comment still says "we're starting from the 1" while the loop initializes i := uint64(0).


139-139: Outdated comment still references Take instead of RemoveWorker.

The comment mentions da.ww.Take but the code uses da.ww.RemoveWorker at line 132.

🧹 Nitpick comments (2)
worker_watcher_v2/header.go (1)

16-19: Consider returning an error instead of panic.

While the validation is correct (bitmask is 64 bits), panicking on invalid input makes the API harder to use safely. Consider returning an error instead.

-func NewHeader(allocWorkers uint) *Header {
+func NewHeader(allocWorkers uint) (*Header, error) {
 	if allocWorkers > 64 {
-		panic("too many workers")
+		return nil, errors.New("allocWorkers cannot exceed 64")
 	}
 
 	return &Header{
 		workers: make([]*worker.Process, allocWorkers),
-	}
+	}, nil
 }
worker_watcher_v2/channel/vec.go (1)

103-108: Update comment after fixing Push synchronization.

The comment on lines 106-108 mentions that "some worker might be reallocated and pushed into the v.workers" during removal. Once the Push method is updated to acquire rwm.RLock() (as suggested in the earlier comment), this race condition will be eliminated and the comment should be updated to reflect the corrected synchronization.

Consider updating the comment to:

-	/*
-		We need to drain vector until we found a worker in the Invalid/Killing/Killed/etc states.
-		BUT while we are draining the vector, some worker might be reallocated and pushed into the v.workers
-		so, down by the code, we might have a problem when pushing the new worker to the v.workers
-	*/
+	// Drain the vector and reclaim workers in good states (Working/Ready).
+	// Workers in bad states are killed. The write lock prevents concurrent Push operations.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 07b0571 and 456b9eb.

📒 Files selected for processing (7)
  • pool/static_pool/dyn_allocator.go (6 hunks)
  • pool/static_pool/pool.go (14 hunks)
  • worker_watcher/container/channel/vec.go (1 hunks)
  • worker_watcher/worker_watcher.go (13 hunks)
  • worker_watcher_v2/channel/vec.go (1 hunks)
  • worker_watcher_v2/header.go (1 hunks)
  • worker_watcher_v2/header_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pool/static_pool/pool.go
🧰 Additional context used
🧬 Code graph analysis (5)
worker_watcher/container/channel/vec.go (2)
worker_watcher_v2/channel/vec.go (2)
  • NewVector (24-32)
  • Vec (14-22)
worker/worker.go (1)
  • Process (27-59)
worker_watcher/worker_watcher.go (2)
worker_watcher/container/channel/vec.go (1)
  • NewVector (24-32)
worker_watcher_v2/channel/vec.go (1)
  • NewVector (24-32)
worker_watcher_v2/header.go (1)
worker/worker.go (1)
  • Process (27-59)
worker_watcher_v2/header_test.go (3)
worker_watcher_v2/header.go (1)
  • NewHeader (16-24)
pool/command.go (1)
  • Command (8-8)
worker/worker.go (1)
  • InitBaseWorker (68-128)
worker_watcher_v2/channel/vec.go (1)
fsm/state.go (3)
  • StateWorking (10-10)
  • StateReady (8-8)
  • StateInvalid (12-12)
🪛 GitHub Actions: Linters
pool/static_pool/dyn_allocator.go

[error] 60-60: golangci-lint: const op is unused (unused)

🪛 GitHub Check: Golang-CI (lint)
pool/static_pool/dyn_allocator.go

[failure] 60-60:
const op is unused (unused)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
🔇 Additional comments (6)
worker_watcher/container/channel/vec.go (1)

24-28: LGTM! Dynamic channel capacity improves resource efficiency.

The change from a hardcoded capacity of 600 to a configurable numWorkers parameter aligns with the v2 implementation and allows proper sizing of the worker channel based on actual pool configuration.

worker_watcher/worker_watcher.go (3)

28-28: Good migration to value-based atomic type.

Using atomic.Uint64 instead of pointer-based atomics is cleaner and provides better ergonomics with method calls like Load(), Store(), and Add().


43-54: Clean initialization with atomic state.

The initialization properly creates the container with dynamic capacity and stores the initial worker count atomically.


100-108: Correct atomic decrement pattern.

The Add(^uint64(0)) idiom is the standard Go approach for atomically decrementing an unsigned integer.

pool/static_pool/dyn_allocator.go (1)

26-29: Good refactor to value-based atomics.

The migration from pointer-based atomics to atomic.Uint64 and atomic.Bool improves code clarity and eliminates the previous race conditions identified in past reviews.

worker_watcher_v2/channel/vec.go (1)

40-46: Verify TTL handling strategy when channel is full.

The comment on lines 40-42 mentions that "workers in the v.workers channel can be TTL-ed and killed but presenting in the channel." However, when the channel is full (line 43-45), the code kills the incoming worker w rather than attempting to find and remove TTL-ed workers from the channel.

Consider whether the current strategy is optimal, or if it would be better to drain TTL-ed/dead workers from the channel before killing the new worker. The current approach may result in accumulating dead workers in the channel while rejecting fresh workers.

If this behavior is intentional (e.g., for performance reasons), consider adding a clarifying comment.

@rustatian rustatian marked this pull request as draft December 27, 2025 18:05
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
@rustatian rustatian requested a review from Copilot December 28, 2025 19:12
@rustatian rustatian marked this pull request as ready for review December 28, 2025 19:12
@rustatian
Copy link
Member Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Dec 28, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 21 changed files in this pull request and generated 12 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
pool/static_pool/dyn_allocator.go (1)

99-100: Comment still mentions "starting from 1" but loop starts at 0.

The past review noted this was addressed, but the comment still says "we're starting from the 1". The loop actually initializes i := uint64(0).

🧹 Nitpick comments (9)
Makefile (1)

8-14: Consider adding timeouts to the test target for consistency.

The -timeout 30m additions are appropriate for preventing hanging tests in CI. However, the test target (lines 19-26) doesn't specify timeouts, which creates an inconsistency between local and CI test execution. Consider adding timeouts to the test target as well, perhaps with a shorter duration (e.g., -timeout 15m) if you want faster feedback during local development.

🔎 Proposed addition of timeouts to test target
 test: ## Run application tests
-	go test -v -race ./ipc/pipe
-	go test -v -race ./ipc/socket
-	go test -v -race ./pool/static_pool
-	go test -v -race ./worker
-	go test -v -race ./worker_watcher
-	go test -v -race ./worker_watcher/container/channel
-	go test -v -race ./pool/ratelimiter
+	go test -v -race -timeout 15m ./ipc/pipe
+	go test -v -race -timeout 15m ./ipc/socket
+	go test -v -race -timeout 15m ./pool/static_pool
+	go test -v -race -timeout 15m ./worker
+	go test -v -race -timeout 15m ./worker_watcher
+	go test -v -race -timeout 15m ./worker_watcher/container/channel
+	go test -v -race -timeout 15m ./pool/ratelimiter
 	go test -v -race -fuzz=FuzzStaticPoolEcho -fuzztime=30s -tags=debug ./pool/static_pool
worker_watcher/container/channel/vec_test.go (1)

16-22: Consider cleaning up test workers to avoid resource leaks.

The createTestWorker helper creates worker processes but tests don't clean them up. While php -v exits quickly, leaving workers without explicit cleanup could lead to resource leaks in more complex scenarios.

Suggested improvement
 func createTestWorker(t *testing.T) *worker.Process {
 	t.Helper()
 	cmd := exec.Command("php", "-v")
 	w, err := worker.InitBaseWorker(cmd)
 	require.NoError(t, err)
+	t.Cleanup(func() {
+		_ = w.Stop()
+	})
 	return w
 }
worker_watcher/worker_watcher.go (1)

249-319: Consider extracting shared cleanup logic.

The Reset() and Destroy() methods share nearly identical worker cleanup patterns (Range + wg.Go + State transition + Stop + Callback + Delete). Consider extracting this to a helper method to reduce duplication.

pool/ratelimiter/ratelimiter_test.go (2)

121-125: Avoid accessing internal mutex in tests.

Directly locking rl.mu breaks encapsulation and couples tests to internal implementation. Consider adding a public method like IsAvailable() bool if this state needs to be verified, or rely on behavioral testing via TryAcquire().

Suggested behavioral alternative
-	rl.mu.Lock()
-	if !rl.available {
-		t.Error("rate limiter should be in available state after all operations complete")
-	}
-	rl.mu.Unlock()
+	// After cooldown, the token should be available again
+	if !rl.TryAcquire() {
+		t.Error("rate limiter should be in available state after all operations complete")
+	}

10-26: Consider reducing sleep duration for faster tests.

The 2-second sleep on line 20 could be reduced. Since the cooldown is 1 second, a 1.1-1.2 second sleep would be sufficient and cut test time nearly in half.

pool/ratelimiter/ratelimiter.go (1)

33-39: Multiple Release() calls spawn redundant timers.

If Release() is called multiple times without an intervening TryAcquire(), each call spawns a new time.AfterFunc timer. While the end result is correct (available becomes true), this creates unnecessary timers.

Consider tracking and canceling pending timers if this is a concern in the usage pattern:

Possible enhancement
type RateLimiter struct {
	mu        sync.Mutex
	available bool
	cooldown  time.Duration
	timer     *time.Timer // track pending timer
}

func (rl *RateLimiter) Release() {
	rl.mu.Lock()
	if rl.timer != nil {
		rl.timer.Stop()
	}
	rl.timer = time.AfterFunc(rl.cooldown, func() {
		rl.mu.Lock()
		rl.available = true
		rl.mu.Unlock()
	})
	rl.mu.Unlock()
}
pool/static_pool/dyn_allocator.go (2)

140-145: Potential TOCTOU on lastAllocTry check.

Between da.lastAllocTry.Load() != nil and time.Since(*da.lastAllocTry.Load()), another goroutine could Store a new value. While this is a minor race (worst case: skips one deallocation cycle or uses a slightly stale time), consider loading once:

Suggested fix
-			if da.lastAllocTry.Load() != nil && time.Since(*da.lastAllocTry.Load()) < da.idleTimeout {
+			lastTry := da.lastAllocTry.Load()
+			if lastTry != nil && time.Since(*lastTry) < da.idleTimeout {
 				da.log.Debug("skipping deallocation of dynamic workers, recent allocation detected")
 				continue
 			}

208-210: Consider using a more descriptive name for the pointer helper.

The single-letter function name p is quite terse. A name like ptr or toPtr would be clearer:

-func p[T any](v T) *T {
+func ptr[T any](v T) *T {
 	return &v
 }
pool/static_pool/dyn_allocator_test.go (1)

83-98: Potential goroutine leak if test fails early.

The test spawns 1000 goroutines and adds them to a WaitGroup, but if np.Exec fails or the test panics, goroutines may be orphaned. Consider using t.Cleanup to ensure Destroy is called:

t.Cleanup(func() { np.Destroy(ctx) })
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 456b9eb and 0326513.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (13)
  • Makefile
  • go.mod
  • pool/config.go
  • pool/ratelimiter/ratelimiter.go
  • pool/ratelimiter/ratelimiter_test.go
  • pool/static_pool/dyn_allocator.go
  • pool/static_pool/dyn_allocator_test.go
  • pool/static_pool/pool.go
  • pool/static_pool/pool_test.go
  • tests/composer.json
  • worker_watcher/container/channel/vec.go
  • worker_watcher/container/channel/vec_test.go
  • worker_watcher/worker_watcher.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • worker_watcher/container/channel/vec.go
  • go.mod
  • pool/static_pool/pool.go
🧰 Additional context used
🧬 Code graph analysis (5)
pool/static_pool/dyn_allocator_test.go (5)
pool/config.go (2)
  • Config (9-37)
  • DynamicAllocationOpts (91-95)
pool/static_pool/pool.go (1)
  • NewPool (52-129)
pool/command.go (1)
  • Command (8-8)
ipc/pipe/pipe.go (1)
  • NewPipeFactory (23-27)
payload/payload.go (1)
  • Payload (9-18)
pool/ratelimiter/ratelimiter_test.go (1)
pool/ratelimiter/ratelimiter.go (1)
  • NewRateLimiter (14-19)
worker_watcher/container/channel/vec_test.go (3)
worker/worker.go (2)
  • Process (27-59)
  • InitBaseWorker (68-128)
pool/command.go (1)
  • Command (8-8)
worker_watcher/container/channel/vec.go (1)
  • NewVector (24-32)
worker_watcher/worker_watcher.go (4)
worker_watcher/container/channel/vec.go (2)
  • Vec (14-22)
  • NewVector (24-32)
worker/worker.go (1)
  • Process (27-59)
internal/protocol.go (1)
  • Pid (59-94)
fsm/state.go (1)
  • StateDestroyed (18-18)
pool/static_pool/dyn_allocator.go (4)
worker_watcher/worker_watcher.go (1)
  • WorkerWatcher (23-39)
worker/worker.go (1)
  • Process (27-59)
pool/ratelimiter/ratelimiter.go (2)
  • RateLimiter (8-12)
  • NewRateLimiter (14-19)
pool/config.go (1)
  • Config (9-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Agent
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)
🔇 Additional comments (13)
Makefile (1)

24-25: LGTM!

The new test paths for ./worker_watcher/container/channel and ./pool/ratelimiter properly extend test coverage for the new components introduced in this PR.

tests/composer.json (1)

10-10: Verify compatibility with major version upgrade.

Upgrading spiral/roadrunner-metrics from ^2.0 to ^3.0 is a major version bump. Tests use the metrics API in tests/metrics-issue-571.php (declare, Collector::counter, add). Verify that these RPC calls are compatible with the v3.0 API and ensure RoadRunner server version matches the metrics v3.0 compatibility matrix.

worker_watcher/container/channel/vec_test.go (3)

26-43: LGTM!

Good test coverage for basic push/pop operations with proper context timeout handling.


71-88: LGTM!

Solid edge case coverage for the destroy lifecycle. The assertion order (nil check before error check) is appropriate.


105-122: LGTM!

Good timing verification for context timeout behavior. The assertion elapsed >= 50*time.Millisecond correctly validates that the Pop actually waited.

worker_watcher/worker_watcher.go (3)

23-39: LGTM on the concurrency refactor.

Good migration from raw map with manual locking to sync.Map for worker tracking, and from uint64 to atomic.Uint64 for type-safe atomic operations. The named mutex field mu improves readability.


210-218: Good duplicate worker detection and cleanup.

The LoadAndDelete pattern with logging and killing duplicates is a defensive measure that handles edge cases where the same PID might be reused. The warning log helps with debugging such scenarios.


270-284: The project is correctly configured for Go 1.25, which is when sync.WaitGroup.Go was introduced. No action needed.

pool/static_pool/pool_test.go (1)

1-1: LGTM!

Good cleanup removing the nolint:stylecheck directive.

pool/ratelimiter/ratelimiter.go (1)

21-31: LGTM!

Clean mutex-protected check-and-set pattern for token acquisition.

pool/static_pool/dyn_allocator.go (2)

64-76: LGTM on rate limiting integration.

Good use of the rate limiter to prevent rapid-fire allocation attempts. The early return with warning log when rate-limited is appropriate behavior.


195-202: Good fix for the race condition.

Setting started=false before releasing the lock ensures addMoreWorkers() sees the correct state when it acquires the lock. The comment documents this well.

pool/static_pool/dyn_allocator_test.go (1)

116-214: LGTM on Test_DynamicPool_OverMax.

Good integration test covering the max worker limit, dynamic allocation triggering, and eventual deallocation. The timing assertions and worker count checks validate the expected behavior.

rustatian and others added 2 commits December 28, 2025 20:25
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
pool/static_pool/dyn_allocator.go (2)

99-101: Comment still misleading about loop start.

The comment says "we're starting from the 1" but the loop starts at i := uint64(0). A past review flagged this and it was marked as addressed, but the comment text still contains the misleading statement.

Proposed fix
-	// we're starting from the 1 because we already allocated one worker which would be released in the Exec function
+	// allocate up to spawnRate additional workers when there are no free workers
 	// i < da.spawnRate - we can't allocate more workers than the spawn rate
 	for i := uint64(0); i < da.spawnRate; i++ {

195-198: Outdated comment: function name mismatch.

Line 196 references allocateDynamically() but the function is now named addMoreWorkers(). A past review flagged similar naming inconsistencies.

Proposed fix
-				// CRITICAL FIX: Set started=false BEFORE releasing the lock
-				// This ensures that any allocateDynamically() call that acquires the lock
+				// CRITICAL FIX: Set started=false BEFORE releasing the lock
+				// This ensures that any addMoreWorkers() call that acquires the lock
 				// after this point will see started=false and start a new listener
🧹 Nitpick comments (3)
pool/static_pool/dyn_allocator.go (3)

1-4: Outdated comment: function name mismatch.

Line 3 references dynamicTTLListener but the function is now named startIdleTTLListener. This was partially addressed from a past review (which covered the allocateDynamically → addMoreWorkers rename), but the TTL listener name also needs updating.

Proposed fix
 // Dynamic allocator for the static pool implementation
 // It allocates new workers with batch spawn rate when there are no free workers
-// It uses 2 functions: addMoreWorkers to allocate new workers and dynamicTTLListener
+// It uses 2 functions: addMoreWorkers to allocate new workers and startIdleTTLListener
 package static_pool

58-59: Consider removing redundant zero-value initialization.

atomic.Uint64 and atomic.Bool already initialize to 0 and false respectively. These explicit stores are harmless but unnecessary.

Proposed fix
 	}

-	da.currAllocated.Store(0)
-	da.started.Store(false)
-
 	return da
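The zero-value point can be verified in isolation — a tiny self-contained check, independent of the pool code:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// zeroValues demonstrates that atomic.Uint64 and atomic.Bool are ready
// to use at their zero values (0 and false), so an explicit Store(0) or
// Store(false) right after construction is redundant.
func zeroValues() (uint64, bool) {
	var n atomic.Uint64
	var b atomic.Bool
	return n.Load(), b.Load()
}

func main() {
	n, b := zeroValues()
	fmt.Println(n, b) // 0 false
}
```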

208-210: Consider a more descriptive name for the pointer helper.

The function p works correctly but its single-letter name is cryptic. Consider ptr or toPtr for clarity, especially if the file grows.

Proposed fix
-func p[T any](v T) *T {
+func ptr[T any](v T) *T {
 	return &v
 }

Also update the call site on line 67:

-	da.lastAllocTry.Store(p(time.Now().UTC()))
+	da.lastAllocTry.Store(ptr(time.Now().UTC()))
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0326513 and 7160d7d.

📒 Files selected for processing (1)
  • pool/static_pool/dyn_allocator.go
🧰 Additional context used
🧬 Code graph analysis (1)
pool/static_pool/dyn_allocator.go (1)
pool/ratelimiter/ratelimiter.go (2)
  • RateLimiter (8-12)
  • NewRateLimiter (14-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS windows-latest)
  • GitHub Check: Build (Go stable, PHP 8.4, OS ubuntu-latest)
🔇 Additional comments (3)
pool/static_pool/dyn_allocator.go (3)

19-38: LGTM!

The struct design is well-organized. The switch from atomic.Pointer[uint64] to atomic.Uint64 and atomic.Pointer[bool] to atomic.Bool is a good simplification. The rate limiter and lastAllocTry fields are properly documented.


107-116: LGTM!

The allocation loop correctly handles errors by logging and continuing with remaining workers. The atomic Add(1) is the correct pattern (addressing the past race condition concern).


169-186: LGTM!

The deallocation loop correctly:

  • Uses a bounded timeout (500ms) to prevent indefinite blocking
  • Stops deallocation attempts on first error (reasonable given workers are busy)
  • Uses atomic decrement only on successful removal
  • Properly logs operations

Consider extracting the 500ms timeout to a constant if it needs tuning later, but this is optional.

Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
@rustatian rustatian merged commit 98835b0 into master Dec 28, 2025
11 checks passed
@rustatian rustatian deleted the fix/correctly-deallocate branch December 28, 2025 20:41
@coderabbitai coderabbitai bot mentioned this pull request Mar 7, 2026
6 tasks
@coderabbitai coderabbitai bot mentioned this pull request Mar 20, 2026
6 tasks

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants