Skip to content

Comments

💥 Use Temporal Failures for Nexus Error Serialization#2773

Open
Quinn-With-Two-Ns wants to merge 7 commits intotemporalio:masterfrom
Quinn-With-Two-Ns:NEXUS-38
Open

💥 Use Temporal Failures for Nexus Error Serialization#2773
Quinn-With-Two-Ns wants to merge 7 commits intotemporalio:masterfrom
Quinn-With-Two-Ns:NEXUS-38

Conversation

@Quinn-With-Two-Ns
Copy link
Contributor

@Quinn-With-Two-Ns Quinn-With-Two-Ns commented Feb 5, 2026

💥 Use Temporal Failures for Nexus Error Serialization. The Java SDK now responds to Nexus operation failures and task failures with plain Temporal Failures. This change also allows OperationException and HandlerException to have their own message and stacktrace independent of their cause. If the Server does not support the new format the SDK converts back to the old format before sending the response.

Requires:


Note

Medium Risk
Touches core Nexus failure serialization/deserialization and wire compatibility logic; regressions could alter error propagation or metrics for Nexus operations across mixed server versions.

Overview
Nexus error reporting is switched to use plain Temporal Failure protos for both operation failures and task failures, enabling independent messages/stack traces for OperationException/HandlerException and improving round-trip fidelity.

The worker now negotiates format via request capabilities (with a test-only temporal.nexus.forceOldFailureFormat override) and converts back to the legacy HandlerError/UnsuccessfulOperationError shape when needed; the in-memory test server is updated to accept both formats and advertise support. This also updates failure conversion utilities (NexusUtil, DefaultFailureConverter) and expands tests to cover new/old formats, cancellation vs failure, message-only handler errors, and stack-trace preservation; CI/dev deps are bumped (TEMPORAL_CLI_VERSION, Nexus snapshot, mavenLocal()).

Written by Cursor Bugbot for commit 3d44cf5. This will update automatically on new commits. Configure here.

@Quinn-With-Two-Ns Quinn-With-Two-Ns requested a review from a team as a code owner February 5, 2026 22:21
@Quinn-With-Two-Ns Quinn-With-Two-Ns marked this pull request as draft February 5, 2026 22:22
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b2a0c78ca

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +412 to +415
UnsuccessfulOperationError.newBuilder()
.setOperationState(operationState)
.setFailure(
NexusUtil.temporalFailureToNexusFailure(failure.getCause()))))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve top-level failure when downgrading responses

When the server doesn’t support temporal failure responses, the downgrade path builds an UnsuccessfulOperationError but converts failure.getCause() instead of the actual failure. For typical ApplicationFailure/CanceledFailure without a cause, this sends an empty Nexus failure (losing the message/stack), so old servers receive a blank error. Consider converting the top-level failure (or falling back to it when getCause() is unset) to preserve the operation failure details.

Useful? React with 👍 / 👎.


allprojects {
repositories {
mavenLocal()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidentally committed local Maven repository configuration

High Severity

mavenLocal() has been added to allprojects.repositories and takes priority over mavenCentral(). This is typically used during local development to resolve SNAPSHOT dependencies (in this case nexusVersion = '0.5.0-SNAPSHOT'). Having mavenLocal() in the main build file causes non-reproducible builds since different machines may have different artifacts in their local Maven cache. CI builds may fail or produce inconsistent results.

Fix in Cursor Fix in Web

}
return Failure.newBuilder()
.setMessage(failure.getMessage())
.setMessage(temporalFailure.getMessage())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated failure serialization logic in NexusUtil

Low Severity

temporalFailureToNexusFailure and temporalFailureToNexusFailureInfo share nearly identical logic for serializing a Temporal failure to JSON. Both call PROTO_JSON_PRINTER.print() with the same arguments, handle InvalidProtocolBufferException identically, and set the same metadata. Only the return type differs (Failure vs FailureInfo). The JSON serialization logic could be extracted to a shared helper.

Fix in Cursor Fix in Web

@Quinn-With-Two-Ns Quinn-With-Two-Ns marked this pull request as ready for review February 19, 2026 00:27
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 349588549b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

grpcVersion = '1.75.0' // [1.38.0,) Needed for io.grpc.protobuf.services.HealthStatusManager
jacksonVersion = '2.15.4' // [2.9.0,)
nexusVersion = '0.4.0-alpha'
nexusVersion = '0.5.0-SNAPSHOT'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use a published nexus-sdk version

nexusVersion is set to 0.5.0-SNAPSHOT, but the only configured repositories are mavenLocal() and mavenCentral(). In a clean environment (including CI runners with an empty local Maven cache), this snapshot cannot be resolved from Maven Central, so Gradle fails before tests/build can start; this effectively makes the project non-buildable unless contributors manually publish the dependency to local Maven first.

Useful? React with 👍 / 👎.

Comment on lines 75 to 77
TEMPORAL_CLI_VERSION: v1.6.1-server-1.31.0-151.0
run: |
wget -O temporal_cli.tar.gz https://github.com/temporalio/cli/releases/download/v${TEMPORAL_CLI_VERSION}/temporal_cli_${TEMPORAL_CLI_VERSION}_linux_amd64.tar.gz

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove duplicated v in Temporal CLI download version

TEMPORAL_CLI_VERSION now includes a leading v, but the download URL template already prepends v via download/v${TEMPORAL_CLI_VERSION}. This expands to download/vv..., so the server bootstrap step will fetch a non-existent release artifact and fail the CI job before tests run.

Useful? React with 👍 / 👎.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

UnsuccessfulOperationError.newBuilder()
.setOperationState(operationState)
.setFailure(
NexusUtil.temporalFailureToNexusFailure(failure.getCause()))))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old format fallback loses error info for causeless OperationException

High Severity

When falling back to old format for servers that don't support temporalFailureResponses, failure.getCause() is used to build the Nexus Failure inside UnsuccessfulOperationError. Here failure is a protobuf io.temporal.api.failure.v1.Failure, and getCause() returns its nested protobuf cause field. When OperationException has no Java cause (e.g. OperationException.failure("message") or OperationException.canceled("message")), the protobuf cause field is unset, so getCause() returns a default empty Failure. This causes temporalFailureToNexusFailure to produce a Nexus Failure with an empty message, silently losing all error information in the old-format response.

Additional Locations (1)

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant