Skip to content

Add CRR Cascaded capabilities#6179

Open
SylvainSenechal wants to merge 1 commit into
development/9.4from
improvement/CLDSRV-897
Open

Add CRR Cascaded capabilities#6179
SylvainSenechal wants to merge 1 commit into
development/9.4from
improvement/CLDSRV-897

Conversation

@SylvainSenechal

@SylvainSenechal SylvainSenechal commented May 29, 2026

Copy link
Copy Markdown
Contributor

ISSUE : CLDSRV-897

Crr cascaded design : https://github.com/scality/citadel/pull/349

Related PRs :
Arsenal : scality/Arsenal#2628
CloudserverClient : scality/cloudserverclient#24
Backbeat : scality/backbeat#2747
S3utils : scality/s3utils#395

@bert-e

bert-e commented May 29, 2026

Copy link
Copy Markdown
Contributor

Hello sylvainsenechal,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

Comment thread lib/api/apiUtils/object/versioning.js Outdated
Comment thread lib/routes/routeBackbeat.js Outdated
Comment thread lib/routes/routeBackbeat.js
Comment thread package.json Outdated
"@hapi/joi": "^17.1.1",
"@smithy/node-http-handler": "^3.0.0",
"arsenal": "git+https://github.com/scality/Arsenal#8.4.2",
"arsenal": "file:../Arsenal",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arsenal and @scality/cloudserverclient are pointing to local filesystem paths (file:../Arsenal, file:../cloudserverclient) instead of pinned git tags. This will break installs for anyone who does not have these directories locally, and CI will fail to resolve them.

Suggested change
"arsenal": "file:../Arsenal",
"arsenal": "git+https://github.com/scality/Arsenal#8.4.2",

— Claude Code

Comment thread package.json Outdated
"devDependencies": {
"@eslint/compat": "^1.2.2",
"@scality/cloudserverclient": "1.0.7",
"@scality/cloudserverclient": "file:../cloudserverclient",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue: file:../cloudserverclient should be pinned to a published version or git tag.

— Claude Code

Comment thread tests/functional/backbeat/crrCascade.js Outdated
});

describe('CRR cascade — putMetadata', () => {
it('second write with the same microVersionId returns loop-detected', async () => {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test names using it() should start with should per project conventions.

— Claude Code

}],
},
}));
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing after() hook to clean up the test buckets (TEST_BUCKET, TEST_BUCKET_CRR, DEST_BUCKET). Without cleanup, leftover buckets accumulate across test runs and can cause name collisions or resource leaks.

— Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bucket/keys names are randomized
I intentionnally didnt add any after to keep the test and file lighter.

Do we want to add after cleanup ?

@claude

claude Bot commented May 29, 2026

Copy link
Copy Markdown
  • Dependency pinning: arsenal and @scality/cloudserverclient in package.json use file: paths pointing to local directories. These must be pinned to git tags or published versions before merge.
    • arsenal on line 36: file:../Arsenal → should be a git tag (e.g. 8.4.2)
    • @scality/cloudserverclient on line 68: file:../cloudserverclient → should be a versioned reference
  • Test naming: it() descriptions in tests/functional/backbeat/crrCascade.js should start with should per project conventions.
  • Test cleanup: No after() hook to delete the three buckets created in before(). Leftover buckets will accumulate across test runs.

Review by Claude Code

Comment thread lib/routes/routeBackbeat.js
Comment thread lib/routes/routeBackbeat.js Outdated

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all these worked locally, need to wait for the bumps now

Comment thread package.json Outdated
"@hapi/joi": "^17.1.1",
"@smithy/node-http-handler": "^3.0.0",
"arsenal": "git+https://github.com/scality/Arsenal#8.4.2",
"arsenal": "file:../Arsenal",

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO 🧐

@SylvainSenechal SylvainSenechal changed the title Improvement/cldsrv 897 Add CRR Cascaded capabilities May 29, 2026
@SylvainSenechal SylvainSenechal requested review from a team, delthas and maeldonn May 29, 2026 18:11
@SylvainSenechal SylvainSenechal force-pushed the improvement/CLDSRV-897 branch from c15fb2b to 7912480 Compare May 29, 2026 18:38
@codecov

codecov Bot commented May 29, 2026

Copy link
Copy Markdown

❌ 2 Tests Failed:

Tests completed Failed Passed Skipped
9450 2 9448 0
View the full list of 2 ❄️ flaky test(s)
"after each" hook for "should fail if trying to overwrite a delete marker"::MPU with x-scal-s3-version-id header With default signature "after each" hook for "should fail if trying to overwrite a delete marker"

Flake rate in main: 100.00% (Passed 0 times, Failed 98 times)

Stack Traces | 0.012s run time
We encountered an internal error. Please try again.
"after each" hook for "should fail if trying to overwrite a delete marker"::MPU with x-scal-s3-version-id header With v4 signature "after each" hook for "should fail if trying to overwrite a delete marker"

Flake rate in main: 100.00% (Passed 0 times, Failed 102 times)

Stack Traces | 0.014s run time
We encountered an internal error. Please try again.

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

Comment thread tests/functional/backbeat/crrCascade.js Outdated
});

describe('CRR cascade — putMetadata', () => {
it('second write with the same microVersionId returns loop-detected', async () => {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test names should start with "should" per project conventions (e.g. should return loop-detected on second write with the same microVersionId).

— Claude Code

Comment thread package.json Outdated
"@hapi/joi": "^17.1.1",
"@smithy/node-http-handler": "^3.0.0",
"arsenal": "git+https://github.com/scality/Arsenal#8.4.2",
"arsenal": "file:../Arsenal",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arsenal and @scality/cloudserverclient use file: local path references. These must be pinned to a tag or version before merging — file: references will break any install outside of your local workstation.

— Claude Code

@claude

claude Bot commented May 29, 2026

Copy link
Copy Markdown
  • package.json: arsenal and @scality/cloudserverclient use file: local path references instead of pinned tags/versions — these will break installs for anyone else and must be updated before merge
    - Pin arsenal to the Arsenal PR tag once merged (was 8.4.2)
    - Pin @scality/cloudserverclient to a published version (was 1.0.7)
    - tests/functional/backbeat/crrCascade.js: test names in it() blocks should start with should

    Review by Claude Code

Comment thread lib/routes/routeBackbeat.js Outdated
}

const encodedMicroVersionId = request.headers['x-scal-micro-version-id'];
const incomingRaw = encodedMicroVersionId ? decodeMicroVersionId(encodedMicroVersionId) : null;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decode error handling here differs from putMetadata (line 574-575), which normalises the result immediately so incomingRaw is guaranteed non-Error. Here incomingRaw can hold an Error object, then is checked later with !(incomingRaw instanceof Error). Consider aligning with the putMetadata pattern for consistency.

— Claude Code

Comment thread tests/functional/backbeat/crrCascade.js
Comment thread package.json Outdated
"@hapi/joi": "^17.1.1",
"@smithy/node-http-handler": "^3.0.0",
"arsenal": "git+https://github.com/scality/Arsenal#8.4.2",
"arsenal": "file:../Arsenal",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arsenal and @scality/cloudserverclient point to local file:../ paths. These must be pinned to git tags before merging — anyone cloning this repo won't have these sibling directories.

— Claude Code

Comment thread lib/utilities/collectResponseHeaders.js Outdated
@claude

claude Bot commented Jun 1, 2026

Copy link
Copy Markdown
  • package.json: arsenal and @scality/cloudserverclient point to local file:../ paths — must be pinned to git tags before merging.
    - lib/utilities/collectResponseHeaders.js:103: Cascade replicas without a next hop have status: '' and isReplica: true, but the outer if only checks status (falsy for empty string), so x-amz-replication-status: REPLICA header will be missing from S3 responses for these objects.
    - lib/routes/routeBackbeat.js:438: putData and putMetadata handle decodeMicroVersionId error cases differently — consider aligning for consistency.
    - tests/functional/backbeat/crrCascade.js: Missing after() hook to clean up the three test buckets created in before().

    Review by Claude Code

Comment thread package.json Outdated
Comment thread lib/routes/routeBackbeat.js Outdated
Comment thread tests/functional/backbeat/crrCascade.js Outdated
@claude

claude Bot commented Jun 18, 2026

Copy link
Copy Markdown
  • package.json:40 — Arsenal is pinned to a commit hash instead of a tag. Git-based deps should pin to a tag per project conventions.
    - cloudserverclient.tgz — This file is added but never referenced anywhere (package.json and Dockerfile only reference scality-cloudserverclient-v1.0.9.tgz). Remove the orphan file.
    - lib/routes/routeBackbeat.js:638 — Inconsistent isReplica check uses truthy (isReplica) while lines 654, 671, and 750 use strict equality (isReplica === true). Use strict equality for consistency.
    - tests/functional/backbeat/crrCascade.js:228output variable assigned but never used.

    Review by Claude Code

@SylvainSenechal SylvainSenechal force-pushed the improvement/CLDSRV-897 branch from 365213b to 37068a8 Compare June 22, 2026 09:57
Comment thread scripts/crr-cascade-test.ts Fixed
Comment thread scripts/crr-cascade-test.ts Fixed
@SylvainSenechal SylvainSenechal force-pushed the improvement/CLDSRV-897 branch 2 times, most recently from 3e4bd1d to 9840ffa Compare June 22, 2026 11:49
@SylvainSenechal SylvainSenechal requested a review from a team June 22, 2026 13:54
@SylvainSenechal SylvainSenechal force-pushed the improvement/CLDSRV-897 branch 2 times, most recently from 391c544 to 3c698dc Compare June 23, 2026 17:54
Comment thread lib/routes/routeBackbeat.js Outdated
// as cascade targets because they use the MultiBackend S3 path which
// bypasses the putData/putMetadata routes, so loop detection cannot fire
// on those destinations.
const BLOCKED_LOCATION_TYPES = ['location-scality-ring-s3-v1', 'location-scality-artesca-s3-v1'];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be tested on a real lab ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be honestly, i think i need to double check this together with the design and the code

replicationInfo: getReplicationInfo(config,
objectKey, bucketMD, false, size, null, null, authInfo),
overheadField,
updateMicroVersionId: true,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, actually this is important, I added this after making my tests in codespaces :
Without this :
You put an object : object isn't assigned an micro version Id
You setup the whole replication, backbeat picks up that source object, it doesn't have any microversion id so cloudserver just doesn't try to process it as a potential cascade replication

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • should "new" objects not keep having an empty microVersionId (to avoid wasting metadata space)?
    and just consider "no micro version id" is the oldest micro versionId (the only thing it cannot compare to is another no micro version id - but that would not happen on new object, each get their own versionId)
  • in case of un-versioned buckets (i.e. only master objects), we may need to update the micro version id (because we have no version id) ; however there is no replication in that case, so no problem not to do it

@maeldonn maeldonn Jun 30, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this a bit with Maël.
I think we could try to not do it all the time, and do it only for putObject (but not even sure).
But we were wondering what happens if we don't write microVersionId on putObject : how do we compare two objects that don't have any micro version ID, it may be possible on the first replication to consider no micro version id => source is newer, but then we must write the microVersionID anyways to block loop and make later write work.
Also in the design we said microVersionId would necessarily be used for this feature only but more generally to be able to track and compare update time, I'm very worried we're gonna have issues later if we start writing microVersionID only on certain conditiion. Actually the code in arsenal, backbeat and cloudserver is already filled with quite a lot of defensive and smelly if (microVersionID) because we never bothered adding this from the start

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we were wondering what happens if we don't write microVersionId on putObject : how do we compare two objects that don't have any micro version ID

no issue there:

  • there are 2 "independent" putObject on 2 different sites : they will have different versionId, so no need to compare microVersionId
  • "within" a specific version (versionID), the empty/missing versionID is inequivoquality the oldest revision of metadata (the one from the original putObject). Any followup updates to that versionID need to bump the micro version id

This case may be a "lack" of microVersionID in the metadata (field not set, for metadata size efficiency), but it is really just a sentinel value for the oldest microVersionID; and it compares exactly the same:

  • "empty" versionID on both side → already up to date (equality)
  • replicate "empty" versoinID while the target already have a versionID → stale versionID (older)
  • replicate versionID while the target has an empty versionID → proceed with replication (newer)

we start writing microVersionID only on certain conditiion.

this case will be present anyway: all objects before the upgrade will not have versionID, so we must be able to manage this state: we cannot assume it will be present.
(and crrExistingObject may add microVersionID, but I'd rather not be protected by this alone: with DR/... there may be other cases coming, best be prepared)

Comment thread lib/routes/routeBackbeat.js Outdated
Comment on lines +589 to +598
request.resume();
return _respondWithHeaders(
response,
{ code: MicroVersionIdAlreadyStoredException.name,
message: 'incoming microVersionId already at destination' },
{},
log,
callback,
409,
);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you remove duplication to simplify here ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I did something, you take a look but honestly it's really not that great 🤔

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code is completely duplicated, the only difference is really name of exception and "text" of message.

as discussed in cloudserverclient, do we really need cloudserver to make that distinction? just returning the same error (micro version id conflict) AND the micro version id of the object stored allows the caller to make the distinction, if they need - without duplication.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh, I missed the extra type MicroVersionIdAlreadyStoredException, c.f. https://github.com/scality/cloudserverclient/pull/24/changes#r3496549728

we had the discussion and ended adding microVersionId in StaleMicroVersionIdException, but it is useless if we generate 2 kinds of errors...
→ IMHO we should drop MicroVersionIdAlreadyStoredException, which allows deduplicating here and simplifying backbeat?


otherwise we shoud still dedup, something like:

Suggested change
request.resume();
return _respondWithHeaders(
response,
{ code: MicroVersionIdAlreadyStoredException.name,
message: 'incoming microVersionId already at destination' },
{},
log,
callback,
409,
);
if (cmp !=== Ordering.NEWER) {
request.resume();
return _respondWithHeaderCrrConflict(
response, log, callback,
cmp === Ordering.EQUAL ? MicroVersionIdAlreadyStoredException.name : StaleMicroVersionIdException.name,
'incoming microVersionId already at destination',
);
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sending back the microVersionID is indeed not super useful, I added it because it was suggested it could become useful or nice to have the info for the caller. I'm not really gonna focus on this to much, i can remove it or keep it, imo I think it's fine that an api returns it.

But for the double error, well tbh i don't really mind either as I just wanna put an end to this feature, but for me it's better to have backbeat check the errors like this :

Image

Than having it just receive a microVersion id, then does another loop if microversion == null, if microversion < > xxx, then we don't do exactly the same thing (cbOnce(cascadeLoopDetected) vs cbOnce(cascadeDataStale)).

In the screenshot, we directly have errors that we can nicely catch and that's pretty much it.

I just see calling the client like calling any functions, and if the function I call return an error, I want the error to be as detailed as possible and not redo the checks that was already done, because anyways cloudserver has to do those comparisons here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will recheck the suggested simpification after we settle the discussion, not fully sure it should be done though because compare can return 4 values
older
younger
equal
not comparable

so doing this cmp !=== Ordering.NEWER
still leaves room for NOT_COMPARABLE, I think it's safer to assert on what we want that exclude only value

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sending back the microVersionID is indeed not super useful, I added it because it was suggested it could become useful or nice to have the info for the caller

not sure where it was lost in the discussion, but the suggestion was really to add microVersionID in the result so we could have a single error : i.e. reduce the logic to minimal in cloudserver, and keep the most in backbeat.

(btw, the code you mention also has similar comment about duplication: my feeling -to be confirmed- is that this duplication at every layer is due at least in part to the qualification happening too early, and would be avoided by propagating a single error [wiht microVersionId] accross all layers, and differentiating the cases only at the end of the chain)

@SylvainSenechal SylvainSenechal force-pushed the improvement/CLDSRV-897 branch from 3c698dc to c8acd08 Compare June 25, 2026 19:23
@SylvainSenechal SylvainSenechal requested a review from maeldonn June 25, 2026 19:23
Comment thread lib/routes/routeBackbeat.js Outdated
@SylvainSenechal SylvainSenechal force-pushed the improvement/CLDSRV-897 branch from c8acd08 to 51638dd Compare June 26, 2026 09:01

@maeldonn maeldonn left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please bump package.json and create a new release. Could you just confirm that the location guard is working ?

@bert-e

bert-e commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Comment thread lib/routes/routeBackbeat.js Outdated
const incomingVersionIdEncoded = request.headers['x-scal-version-id'];
const decoded = incomingVersionIdEncoded ? decode(incomingVersionIdEncoded) : null;
const incomingVersionIdDecoded = decoded instanceof Error ? null : decoded;
if (incomingVersionIdDecoded && objMd && objMd.versionId === incomingVersionIdDecoded) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

objMd.versionId === incomingVersionIdDecoded

  • Can this condition actually fail?
  • Is it really part of the condition?

AFAIU, versionId is used (or should be?) to find the existing objMD:

  • so if not found, we would get objMD=null ;
  • and if found, it should have the versionID...
  • Unless there is a some logic here which gets the matching version if it exist, else the latest one : in that case objMd.versionId !== incomingVersionIdDecoded means we actually did not an object already

→ should clarify how we get there, and nice if we can drop this condition
(remember that with CRR with cannot work in non-versioned buckets -so no "overwrites"-, and that versions may be replicated out of order)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote some part of the if statement into 2 stages, but your comment still stands.

I am not sure i understand it though, I think you are saying that if we were able to get objMd, then necessarily objMd will be equal to incomingVersionIdDecoded ?

  • objMd is obtained through a middleware, from the provided bucket + the key, and it's the latest available version of that bucket
  • the x-scal-micro-version-id is anything people using the client decide to put inside. I don't see a situation where we would skip checking objMd.version === incomingVersionId

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

objMd is obtained through a middleware, from the provided bucket + the key, and it's the latest available version of that bucket

  • if by middleware you mean the "router" in the same routeBackbeat file, indeed; but still this is something under our control and closely related to this code
  • I thought this would return the specific versionId associated with the result ; if it returns the latest, then the issue is much larger : we need to try and check if we have this specific versionId. We may be replicating the "previous" version, but the logic must apply in that case as well...

the x-scal-micro-version-id is anything people using the client decide to put inside. I don't see a situation where we would skip checking objMd.version === incomingVersionId

  • this API cannot be used by "people", it is used only by scality software
  • if the x-scal-version-id is set to tell which version we are interested in (and should be retrieved). If there is no such revision, there is no conflict (and responsibility/bug of the caller if they did not provide the right versionId) ; if this revision is present, this is a conflict. We don't care if there is another revision of the object, we should only attempt to retrieve the ObjMD of the revision we are interested in.

replicationInfo: getReplicationInfo(config,
objectKey, bucketMD, false, size, null, null, authInfo),
overheadField,
updateMicroVersionId: true,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • should "new" objects not keep having an empty microVersionId (to avoid wasting metadata space)?
    and just consider "no micro version id" is the oldest micro versionId (the only thing it cannot compare to is another no micro version id - but that would not happen on new object, each get their own versionId)
  • in case of un-versioned buckets (i.e. only master objects), we may need to update the micro version id (because we have no version id) ; however there is no replication in that case, so no problem not to do it

Comment thread lib/routes/routeBackbeat.js Outdated

const incomingVersionIdEncoded = request.headers['x-scal-version-id'];
const decoded = incomingVersionIdEncoded ? decode(incomingVersionIdEncoded) : null;
const incomingVersionIdDecoded = decoded instanceof Error ? null : decoded;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we not actually fail if the version id cannot be decoded? (this would be a bad request)

@SylvainSenechal SylvainSenechal Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes actually i changed the single if into 2 ifs :
If the versionId is provided, but we can't decode it, it means something is wrong and we shouldn't continue anything

Comment thread lib/routes/routeBackbeat.js Outdated
function putMetadata(request, response, bucketInfo, objMd, log, callback) {
const { bucketName, objectKey } = request;

const encodedMicroVersionId = request.headers['x-scal-micro-version-id'];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably need to differentiate between

  • the absence of microVersionId (ie. x-scal-micro-version-id is not present in the request) → no extra check
  • microVersionId is empty (something like x-scal-micro-version-id: "", when referring to the first metadata revision when object was created) → need the extra check, and check if the object already has newer micro version id (i.e. either it does not have one → "equal", or it does → "newer")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I change the code a bit to do what you mention in the first point, and also return an error when the header is present but fail to be decoded like in putData.

For the other point,
The current implementation is : new source will always have a microVersionId, it's talked about in another comment at the top of this review, but I don't see how we can properly deal with empty microversion ids on new objects written after this feature is added, without having issues later. It means we would have situation like

A->B->C
I put on A, A has no microVersionID, but replicates to B and C which now have a microVersionID.
But A still doesn't have a microVersionID. What happens if there is a loop configuration and C goes back to A
What happens on the second write on A when a first write has already replicated on B and C ? B and C will have a microVersionID but A won't ?

It seems to me that deciding to not assign a microVersionID on all new putObject writes is gonna get real tricky in the long run

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new source will always have a microVersionId

I really don't understand why we need this.

This was discussed in the design, and is a critical point to mitigate the impact on MongoDB sizing.

  • It really just mean "no micro versionID" is also a valid micro version id, and older than any other valid microVersionID value
  • There is no additional risk due to concurrent putObject : as replication is possible only in versioned buckets, so object will be versioned and not have the same VersionID → so no conflict is possible (conflict is only ever checks on objects with THE SAME versionId)

I put on A, A has no microVersionID, but replicates to B and C which now have a microVersionID.

when replicating, we do putMetadata : so B and C should have the same "empty" microVersionID.
the replication itself MUST NOT change the microVersionID (or metadata, for that matter): it just performs an (almost) verbatim copy of the metadata.

Comment thread lib/routes/routeBackbeat.js Outdated
Comment on lines +589 to +598
request.resume();
return _respondWithHeaders(
response,
{ code: MicroVersionIdAlreadyStoredException.name,
message: 'incoming microVersionId already at destination' },
{},
log,
callback,
409,
);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code is completely duplicated, the only difference is really name of exception and "text" of message.

as discussed in cloudserverclient, do we really need cloudserver to make that distinction? just returning the same error (micro version id conflict) AND the micro version id of the object stored allows the caller to make the distinction, if they need - without duplication.

Comment thread lib/routes/routeBackbeat.js Outdated
Comment on lines +589 to +598
request.resume();
return _respondWithHeaders(
response,
{ code: MicroVersionIdAlreadyStoredException.name,
message: 'incoming microVersionId already at destination' },
{},
log,
callback,
409,
);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh, I missed the extra type MicroVersionIdAlreadyStoredException, c.f. https://github.com/scality/cloudserverclient/pull/24/changes#r3496549728

we had the discussion and ended adding microVersionId in StaleMicroVersionIdException, but it is useless if we generate 2 kinds of errors...
→ IMHO we should drop MicroVersionIdAlreadyStoredException, which allows deduplicating here and simplifying backbeat?


otherwise we shoud still dedup, something like:

Suggested change
request.resume();
return _respondWithHeaders(
response,
{ code: MicroVersionIdAlreadyStoredException.name,
message: 'incoming microVersionId already at destination' },
{},
log,
callback,
409,
);
if (cmp !=== Ordering.NEWER) {
request.resume();
return _respondWithHeaderCrrConflict(
response, log, callback,
cmp === Ordering.EQUAL ? MicroVersionIdAlreadyStoredException.name : StaleMicroVersionIdException.name,
'incoming microVersionId already at destination',
);
}

Comment thread lib/routes/routeBackbeat.js
Comment on lines +824 to +827
nextReplInfo.backends = nextReplInfo.backends.filter(b => {
const loc = config.locationConstraints[b.site];
return !loc || !BLOCKED_LOCATION_TYPES.includes(loc.type);
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic should be in getReplicationInfo, this is the function responsible to build the list of backend/destinations. Doing it there will also avoid create a backend entry only to remove it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you wanna do this ? It means I have to add some kind of flag to getReplicationInfo to do the current logic in getReplicationInfo 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see, maybe it can be done cleanly if, in getReplicationInfo.js I add a helper function, leave the existing function almost as it is and just call that extra helper at the end of getReplicationInfo if the flag is on ?

Comment on lines +815 to +818
// These S3-compatible Scality locations are excluded
// as cascade targets because they use the MultiBackend S3 path which
// bypasses the putData/putMetadata routes, so loop detection cannot fire
// on those destinations.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this part of the design? I may be wrong, but I thought we wanted (needed maybe, to be check with product team) to trigger multi-backend replication after CRR ?
→ please double check

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's here https://github.com/scality/citadel/pull/349/changes#diff-f4fbff9b43d54385a8778f028f6950744393bb5235c5860afa7eec1b89072f83R266

but yeah i need to triple check it because yes i don't really like this part with these hardcoded locations

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We rediscussed with Maël, I think this is ok to keep, we don't want cascade replication to continue on these locations

Comment thread lib/routes/routeBackbeat.js Outdated

@francoisferrand francoisferrand left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SylvainSenechal seems there are quite the a few open points, which don't look like they will converge quickly through review comments.

Can you prepare a small doc to list each of these topics (both on this PR and the backbeat one, likely the same but best be sure), with the different approaches and arguments in favor of each?
→ then we can go through them in a short meeting (30min should be enough?), settle the question and move forward

(note: some of these questions may be refined/addressed incrementally in followup PRs, so we could merge this PR already; and for others it may make sense to make the change first. But either way we need to settle the points ASAP -and implement the followup, if any- before we are ready to ship)

@SylvainSenechal SylvainSenechal force-pushed the improvement/CLDSRV-897 branch 3 times, most recently from b4c6fa9 to c5ba7ae Compare July 1, 2026 19:36
@SylvainSenechal SylvainSenechal force-pushed the improvement/CLDSRV-897 branch from c5ba7ae to 91f27f0 Compare July 1, 2026 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants