💥 Use Temporal Failures for Nexus Error Serialization#2773
💥 Use Temporal Failures for Nexus Error Serialization#2773Quinn-With-Two-Ns wants to merge 7 commits intotemporalio:masterfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b2a0c78ca
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| UnsuccessfulOperationError.newBuilder() | ||
| .setOperationState(operationState) | ||
| .setFailure( | ||
| NexusUtil.temporalFailureToNexusFailure(failure.getCause())))) |
There was a problem hiding this comment.
Preserve top-level failure when downgrading responses
When the server doesn’t support temporal failure responses, the downgrade path builds an UnsuccessfulOperationError but converts failure.getCause() instead of the actual failure. For typical ApplicationFailure/CanceledFailure without a cause, this sends an empty Nexus failure (losing the message/stack), so old servers receive a blank error. Consider converting the top-level failure (or falling back to it when getCause() is unset) to preserve the operation failure details.
Useful? React with 👍 / 👎.
|
|
||
| allprojects { | ||
| repositories { | ||
| mavenLocal() |
There was a problem hiding this comment.
Accidentally committed local Maven repository configuration
High Severity
mavenLocal() has been added to allprojects.repositories and takes priority over mavenCentral(). This is typically used during local development to resolve SNAPSHOT dependencies (in this case nexusVersion = '0.5.0-SNAPSHOT'). Having mavenLocal() in the main build file causes non-reproducible builds since different machines may have different artifacts in their local Maven cache. CI builds may fail or produce inconsistent results.
| } | ||
| return Failure.newBuilder() | ||
| .setMessage(failure.getMessage()) | ||
| .setMessage(temporalFailure.getMessage()) |
There was a problem hiding this comment.
Duplicated failure serialization logic in NexusUtil
Low Severity
temporalFailureToNexusFailure and temporalFailureToNexusFailureInfo share nearly identical logic for serializing a Temporal failure to JSON. Both call PROTO_JSON_PRINTER.print() with the same arguments, handle InvalidProtocolBufferException identically, and set the same metadata. Only the return type differs (Failure vs FailureInfo). The JSON serialization logic could be extracted to a shared helper.
d69b39f to
3495885
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 349588549b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| grpcVersion = '1.75.0' // [1.38.0,) Needed for io.grpc.protobuf.services.HealthStatusManager | ||
| jacksonVersion = '2.15.4' // [2.9.0,) | ||
| nexusVersion = '0.4.0-alpha' | ||
| nexusVersion = '0.5.0-SNAPSHOT' |
There was a problem hiding this comment.
Use a published nexus-sdk version
nexusVersion is set to 0.5.0-SNAPSHOT, but the only configured repositories are mavenLocal() and mavenCentral(). In a clean environment (including CI runners with an empty local Maven cache), this snapshot cannot be resolved from Maven Central, so Gradle fails before tests/build can start; this effectively makes the project non-buildable unless contributors manually publish the dependency to local Maven first.
Useful? React with 👍 / 👎.
.github/workflows/ci.yml
Outdated
| TEMPORAL_CLI_VERSION: v1.6.1-server-1.31.0-151.0 | ||
| run: | | ||
| wget -O temporal_cli.tar.gz https://github.com/temporalio/cli/releases/download/v${TEMPORAL_CLI_VERSION}/temporal_cli_${TEMPORAL_CLI_VERSION}_linux_amd64.tar.gz |
There was a problem hiding this comment.
Remove duplicated
v in Temporal CLI download version
TEMPORAL_CLI_VERSION now includes a leading v, but the download URL template already prepends v via download/v${TEMPORAL_CLI_VERSION}. This expands to download/vv..., so the server bootstrap step will fetch a non-existent release artifact and fail the CI job before tests run.
Useful? React with 👍 / 👎.
| UnsuccessfulOperationError.newBuilder() | ||
| .setOperationState(operationState) | ||
| .setFailure( | ||
| NexusUtil.temporalFailureToNexusFailure(failure.getCause())))) |
There was a problem hiding this comment.
Old format fallback loses error info for causeless OperationException
High Severity
When falling back to old format for servers that don't support temporalFailureResponses, failure.getCause() is used to build the Nexus Failure inside UnsuccessfulOperationError. Here failure is a protobuf io.temporal.api.failure.v1.Failure, and getCause() returns its nested protobuf cause field. When OperationException has no Java cause (e.g. OperationException.failure("message") or OperationException.canceled("message")), the protobuf cause field is unset, so getCause() returns a default empty Failure. This causes temporalFailureToNexusFailure to produce a Nexus Failure with an empty message, silently losing all error information in the old-format response.


💥 Use Temporal Failures for Nexus Error Serialization. The Java SDK now responds to Nexus operation failures and task failures with plain Temporal Failures. This change also allows
OperationExceptionandHandlerExceptionto have their own message and stacktrace independent of their cause. If the Server does not support the new format the SDK converts back to the old format before sending the response.Requires:
Note
Medium Risk
Touches core Nexus failure serialization/deserialization and wire compatibility logic; regressions could alter error propagation or metrics for Nexus operations across mixed server versions.
Overview
Nexus error reporting is switched to use plain Temporal
Failureprotos for both operation failures and task failures, enabling independent messages/stack traces forOperationException/HandlerExceptionand improving round-trip fidelity.The worker now negotiates format via request capabilities (with a test-only
temporal.nexus.forceOldFailureFormatoverride) and converts back to the legacyHandlerError/UnsuccessfulOperationErrorshape when needed; the in-memory test server is updated to accept both formats and advertise support. This also updates failure conversion utilities (NexusUtil,DefaultFailureConverter) and expands tests to cover new/old formats, cancellation vs failure, message-only handler errors, and stack-trace preservation; CI/dev deps are bumped (TEMPORAL_CLI_VERSION, Nexus snapshot,mavenLocal()).Written by Cursor Bugbot for commit 3d44cf5. This will update automatically on new commits. Configure here.