Nexus Standalone Operations [WiP] #9869

Draft
stephanos wants to merge 25 commits into main from feature/nexus-standalone

@stephanos

What changed?

Describe what has changed in this PR.

Why?

Tell your future self why you have made these changes.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

Any change is risky. Identify all risks you are aware of. If none, remove this section.

@stephanos stephanos changed the title Nexus Standalone Operations Nexus Standalone Operations [WiP] Apr 8, 2026
@stephanos stephanos force-pushed the feature/nexus-standalone branch from 6a0bd6e to 88281af Compare April 8, 2026 19:17
gow and others added 21 commits April 8, 2026 22:28
- Added workflow command handler registry to CHASM's workflow library.
- Integrated CHASM's workflow library into workflow completion handler.

Migrating Nexus from HSM to CHASM.

Tests will be ported over once actual command handler implementations
are added.
 - Added `OperationState` proto fields
 - Migrated nexus operation state transitions.

Migrating nexus from HSM to CHASM

- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [x] added new unit test(s)
- [ ] added new functional test(s)

N/A. This code path is currently unreachable.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches persisted proto schemas and core state-transition logic for
retries/timeouts; while executors are still unimplemented, any
activation of this path could impact task scheduling and retry
semantics.
>
> **Overview**
> Migrates Nexus operation lifecycle handling to CHASM by implementing
the operation state machine transitions to emit invocation,
backoff-retry, and timeout tasks and to record attempt metadata (last
failure/completion time, next retry time, operation token).
>
> Expands the `OperationState` and task protos to persist
endpoint/operation identifiers, scheduling timestamps, retry/attempt
fields, and separate timeout task types (`schedule-to-start`,
`start-to-close`, `schedule-to-close`), and wires new timeout task
executors through Fx and the library task registry. Adds unit tests
covering the new transition behavior and task scheduling.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
4ab5fd0. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
Ported command handler for Nexus "schedule" command from HSM to CHASM.

CHASM migration.

- [ ] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [x] added new unit test(s)
- [ ] added new functional test(s)

---------

Co-authored-by: Chetan Gowda <chetan.gowda@temporal.io>
Co-authored-by: Chetan Gowda <gow@users.noreply.github.com>
Co-authored-by: Shivam <57200924+Shivs11@users.noreply.github.com>
Ported command handler for Nexus "cancel" command from HSM to CHASM.

CHASM migration.

- [ ] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [x] added new unit test(s)
- [ ] added new functional test(s)

---------

Co-authored-by: Roey Berman <roey.berman@gmail.com>
## What changed?

Replaces workflow-specific fields (i.e., `scheduled_event_token` and
`requested_event_id`) in the CHASM Nexus operation state proto with a
generic field.

## Why?

Ensures the CHASM Nexus operation state has no workflow-specific fields.

## How did you test it?
- [ ] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…flowregistry (#9474)

## What changed?
Just a package rename. No other code change.

## Why?
Migrating Nexus from HSM to CHASM

## How did you test it?
- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)



<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Primarily a package/API rename and dependency wiring update; low
behavioral risk, but broad mechanical changes could cause compile-time
breakage if any call sites were missed.
> 
> **Overview**
> Renames the CHASM workflow command registry package from
`chasm/lib/workflow/command` to `chasm/lib/workflow/workflowregistry`
and updates its public API (`RegisterCommandHandler`, `CommandHandler`,
`CommandHandlerOptions`, `ErrCommandNotSupported`).
> 
> Propagates the rename through Nexus operation command handlers/tests,
History `RespondWorkflowTaskCompleted` CHASM fallback path, engine/fx
wiring, and related Nexus components so CHASM command handling continues
to resolve and invoke handlers via the new registry.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
e4740af. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
## What changed?
Migrates the Nexus history event Registry and Definition. All the event
implementations are moved as well, with their bodies commented out. The
implementations of `Apply()` and `CherryPick()` will be replaced in
follow-up PRs.
Depends on #9474

## Why?
Migrating Nexus from HSM to CHASM.

## How did you test it?
- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)



<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Introduces new history event registration and lookup paths for Nexus
operations; since the `Apply` implementations are currently stubs,
there’s some risk of silently skipping state transitions during
replication/reset until follow-up PRs complete the logic.
> 
> **Overview**
> Adds first-class **history event definitions** to
`workflowregistry.Registry` via a new `EventDefinition` interface (with
`Apply` and `CherryPick`) and
`RegisterEventDefinition`/`EventDefinition` APIs.
> 
> Wires Nexus operation event registration into the `fx` module and
introduces `events.go` with definitions for Nexus lifecycle events
(scheduled/cancel-requested/started/completed/failed/canceled/timed-out), including basic
workflow-task-trigger flags and `CherryPick` exclusion handling (notably
`RESET_REAPPLY_EXCLUDE_TYPE_NEXUS`), while leaving `Apply` bodies as
TODO stubs.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
e4bd20b. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
## What changed?
This PR adds methods to the workflow component to handle Nexus events.

## Why?
Migrating Nexus from HSM to CHASM

## How did you test it?
- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)



<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches the workflow-task completion path and CHASM workflow wiring
(registry injection and history-event application), so misregistration
or missing context could cause runtime command failures despite largely
being additive/refactor changes.
> 
> **Overview**
> Adds CHASM `Workflow` helpers to *emit and apply* Nexus operation
lifecycle events (started/completed/failed/canceled/timed-out),
including consistent failure wrapping via
`NexusOperationExecutionFailure`.
> 
> Refactors CHASM workflow command/event registration by moving
`workflowregistry` into `chasm/lib/workflow` as `Registry`, injecting it
into CHASM context, and updating Nexus workflow command handlers to use
`AddAndApplyHistoryEvent` so command-emitted history events immediately
run their registered event definitions. Updates wiring/tests/callers
across services to construct `NewLibrary(NewRegistry())` and use the new
types/errors.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
70936c6. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
## What changed?
Migrates the `Apply()` methods of all event definitions from HSM to CHASM.
Also migrates the unit tests.

## Why?
HSM to CHASM migration

## How did you test it?
- [x] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes Nexus operation lifecycle handling by moving
scheduling/cancellation and terminal event application into CHASM event
definitions and adjusting state transition semantics; regressions could
affect operation task emission, cancellation timing, and cleanup during
replay/reset.
> 
> **Overview**
> Implements CHASM-based Nexus operation event `Apply()` handlers:
scheduled/cancel-requested/started now create or update the in-memory
operation component (including spawning/scheduling a cancellation child
once an operation token exists), and terminal events
(completed/failed/canceled/timed-out) transition the operation then
remove it from the workflow.
> 
> Refactors workflow Nexus operation storage to be keyed by
`ScheduledEventId` (`int64`), simplifies command handlers to only emit
history events (letting event definitions create/update components), and
expands cancellation/operation state machines with concrete task
emission and retry/backoff metadata. Updates `chasm.Transition.Apply` to
run transition logic before mutating state (enabling source-state
inspection) and adds new CHASM-focused unit tests for the migrated event
definitions and updated transition behavior.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
c0f4e55. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
This PR migrates the Nexus operation invocation task handler from the HSM
version to CHASM.

Migrating from HSM to CHASM.

- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [x] added new unit test(s)
- [ ] added new functional test(s)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Introduces a new CHASM-based Nexus `StartOperation` execution path
with endpoint lookup, callback URL/token generation, and error
classification; mistakes could cause failed invocations, incorrect
retries/timeouts, or misrouted callbacks. Risk is mitigated somewhat by
strict task validation and added unit coverage, but the change touches
critical workflow/history integration and outbound request handling.
>
> **Overview**
> Migrates Nexus operation invocation execution to CHASM by implementing
`OperationInvocationTaskHandler.Validate/Execute` end-to-end, including
endpoint resolution (ID with name fallback), callback URL selection
(system vs templated), callback token generation, timeout budgeting,
outbound StartOperation calls (HTTP or internal history service),
metrics/logging, and classification of results into operation state
transitions.
>
> Adds supporting plumbing:
`OperationStore.NexusOperationInvocationData` and workflow
implementation that loads invocation input/headers from the scheduled
history event, plus a new
`MSPointer.LoadHistoryEvent`/`NodeBackend.LoadHistoryEvent` API.
Configuration is extended to parse `CallbackURLTemplate` into a
`*template.Template`, add `UseSystemCallbackURL`, and pass
`NumHistoryShards` for internal routing; new helper utilities centralize
callback building, error/failure conversion, and internal/HTTP start
logic.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
4b13977. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
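
As a rough illustration of the callback URL selection described above (system vs. templated), the following Go sketch parses a template string into a `*template.Template` once at config construction and renders it per request. The field names `CallbackURLTemplate` and `UseSystemCallbackURL` come from the summary; the `Config` struct layout, the `NamespaceName` template variable, the example URL, and the `systemCallbackURL` value are placeholders assumed for this sketch.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// systemCallbackURL is a placeholder value, not the one used by the server.
const systemCallbackURL = "system-callback://internal"

// Config is a hypothetical, trimmed-down view of the component config.
type Config struct {
	CallbackURLTemplate  *template.Template
	UseSystemCallbackURL bool
}

// NewConfig parses the raw template once so that a malformed template is
// reported at construction time instead of on every invocation.
func NewConfig(rawTemplate string, useSystem bool) (*Config, error) {
	tmpl, err := template.New("callback").Parse(rawTemplate)
	if err != nil {
		return nil, fmt.Errorf("invalid callback URL template: %w", err)
	}
	return &Config{CallbackURLTemplate: tmpl, UseSystemCallbackURL: useSystem}, nil
}

// callbackVars lists the variables exposed to the template (assumed).
type callbackVars struct {
	NamespaceName string
}

// CallbackURL returns the system URL when enabled, else renders the template.
func (c *Config) CallbackURL(namespace string) (string, error) {
	if c.UseSystemCallbackURL {
		return systemCallbackURL, nil
	}
	var sb strings.Builder
	if err := c.CallbackURLTemplate.Execute(&sb, callbackVars{NamespaceName: namespace}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	cfg, err := NewConfig("https://example.test/namespaces/{{.NamespaceName}}/callback", false)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.CallbackURL("default"))
}
```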
#9790)

## What changed?
Minor fixes.

## Why?
These were missing or misplaced previously.

## How did you test it?
- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)



<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes task routing and cancellation scheduling in the Nexus
operation state machine, which can affect execution flow and delivery to
outbound queues. While localized, incorrect destinations or transition
sequencing could cause stuck or misrouted operations.
> 
> **Overview**
> Ensures Nexus invocation tasks are routed to the correct outbound
queue by setting `TaskAttributes.Destination` to the operation endpoint
when scheduling and when rescheduling after backoff.
> 
> Moves the "cancellation already requested" handling from the workflow
`Started` event handler into `TransitionStarted`, so pending
cancellations are automatically scheduled as soon as an operation token
becomes available.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
4eddec6. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
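
The second change above, scheduling a pending cancellation inside the started transition once an operation token exists, can be sketched like this. The `Operation` struct, its fields, and the string-based task list are hypothetical simplifications; only the `TransitionStarted` name is taken from the summary.

```go
package main

import "fmt"

// Operation is a simplified stand-in for the operation component.
type Operation struct {
	Status          string
	OperationToken  string
	CancelRequested bool
	Tasks           []string // emitted tasks, modeled as strings for brevity
}

// RequestCancel records the request. Without a token the handler cannot be
// addressed yet, so scheduling is deferred to TransitionStarted.
func (op *Operation) RequestCancel() {
	if op.CancelRequested {
		return // already requested; avoid scheduling twice
	}
	op.CancelRequested = true
	if op.OperationToken != "" {
		op.scheduleCancellation()
	}
}

// TransitionStarted stores the token and schedules any cancellation that
// was requested before the operation started.
func (op *Operation) TransitionStarted(token string) {
	op.Status = "STARTED"
	op.OperationToken = token
	if op.CancelRequested {
		op.scheduleCancellation()
	}
}

func (op *Operation) scheduleCancellation() {
	op.Tasks = append(op.Tasks, "cancel:"+op.OperationToken)
}

func main() {
	op := &Operation{Status: "SCHEDULED"}
	op.RequestCancel()           // no token yet: nothing is scheduled
	op.TransitionStarted("tok1") // token arrives: cancellation is scheduled
	fmt.Println(op.Status, op.Tasks)
}
```

Keeping this check in the transition rather than in the event handler means every caller of the transition gets the same behavior.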
## What changed?

- [[Nexus-Chasm] Improve nexus operation dynamic
config](59a7068)
- [[Nexus-Chasm] Improve operation state machine and task
handlers](c6f1fa8)
- [[Nexus-Chasm] Refactor workflow registry and nexus workflow
integration](45d0cfc)

## Why?

- Cleanup
- Improved test coverage
- Bug fixes
Add API boilerplate for standalone Nexus Operations.

- [x] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)
## What changed?

Add Nexus Standalone feature flag.

## How did you test it?

Tests will be added with the respective API implementations.
Add Nexus Standalone Describe and Start handlers.

- [ ] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [x] added new unit test(s)
- [x] added new functional test(s)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Nexus Standalone List and Count handlers.

- [ ] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [x] added new functional test(s)
@stephanos stephanos force-pushed the feature/nexus-standalone branch 3 times, most recently from 537767c to d294049 Compare April 9, 2026 15:59
stephanos and others added 4 commits April 9, 2026 09:35
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Picks up regenerated api-go from merged api proto (master into
standalone-nexus-op), which adds ComputeConfigSummary and other
new types needed for the build to pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@stephanos stephanos force-pushed the feature/nexus-standalone branch from d294049 to 9d806ef Compare April 9, 2026 16:35
3 participants