cassandra/client_routes.py appears to mishandle partial CLIENT_ROUTES_CHANGE updates.
Problem
The route store keeps only one selected route per host_id, and _select_preferred_routes() tries to preserve stickiness by preferring the currently selected connection_id when that same connection_id is present in the newly fetched candidates.
However, for partial CLIENT_ROUTES_CHANGE handling, the fetched candidates may contain only a subset of connection IDs for an affected host.
Failure mode
Consider this sequence:
- Host X is currently using sticky route (connection_id=A, host_id=X).
- A CLIENT_ROUTES_CHANGE event arrives for host X, but only for connection IDs B / C.
- _query_routes_for_change_event() fetches only routes matching the event-derived filters.
- _select_preferred_routes() cannot keep A, because A is absent from the fetched candidates.
- merge() drops the old route for affected host X and replaces it with one of the newly fetched routes.
This means the driver can switch away from A even though it has not learned that (A, X) was removed. It only learned that some other routes for host X changed.
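The sequence can be reproduced with a toy model. This is a simplified sketch, not the driver's actual code: the function names mirror the ones in client_routes.py, but the signatures, the Route tuple, and the min() tie-break are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical route representation: (connection_id, host_id).
Route = Tuple[str, str]


def select_preferred_routes(current: Dict[str, Route],
                            candidates: List[Route]) -> Dict[str, Route]:
    """Pick one route per host, keeping the current one only if it was fetched."""
    by_host: Dict[str, List[Route]] = {}
    for route in candidates:
        by_host.setdefault(route[1], []).append(route)

    selected: Dict[str, Route] = {}
    for host_id, routes in by_host.items():
        sticky = current.get(host_id)
        # Stickiness only works when the sticky route is among the candidates.
        selected[host_id] = sticky if sticky in routes else min(routes)
    return selected


def merge(current: Dict[str, Route],
          selected: Dict[str, Route],
          affected_hosts: Set[str]) -> Dict[str, Route]:
    """Drop old routes for affected hosts, then add the newly selected ones."""
    merged = {h: r for h, r in current.items() if h not in affected_hosts}
    merged.update(selected)
    return merged


store = {"X": ("A", "X")}           # host X is sticky on connection A
fetched = [("B", "X"), ("C", "X")]  # partial event: only B and C were queried
store_after = merge(store, select_preferred_routes(store, fetched), {"X"})
print(store_after)  # {'X': ('B', 'X')}
```

Host X ends up on B even though nothing in the event said that (A, X) was removed.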
Relevant code
- Explicit stickiness selection: _select_preferred_routes()
- Partial event merge: merge()
- Event handling path: handle_client_routes_change()
- Event query: _query_routes_for_change_event()
Expected behavior
The driver should not switch away from the currently used connection_id for a host unless it has enough information to conclude that the sticky route is no longer valid.
In other words, absence of (A, X) from a partial event query result should not be treated as proof that (A, X) was deleted.
Possible direction
One likely fix is to store all known routes for each host, not only the currently selected one, and track stickiness separately. That would allow the driver to:
- preserve stickiness when unrelated connection IDs for the same host change,
- delete routes only when the event/query gives enough information to do so safely,
- choose a replacement deterministically if the sticky route is actually removed.
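A minimal sketch of that direction, under stated assumptions: RouteStore, apply_partial_update(), and selected() are hypothetical names, not existing driver APIs. The sketch also assumes that a connection ID which was queried but not returned can safely be treated as deleted, which would need to be confirmed against the actual event and query semantics.

```python
from typing import Dict, Iterable, Optional, Set


class RouteStore:
    """Hypothetical store: all known connection IDs per host, plus a separate sticky choice."""

    def __init__(self) -> None:
        self._known: Dict[str, Set[str]] = {}  # host_id -> known connection IDs
        self._sticky: Dict[str, str] = {}      # host_id -> selected connection ID

    def apply_partial_update(self, host_id: str,
                             queried_ids: Iterable[str],
                             found_ids: Iterable[str]) -> None:
        """Apply a partial update covering only queried_ids for host_id.

        found_ids are the queried connection IDs that the query returned.
        """
        queried, found = set(queried_ids), set(found_ids)
        known = self._known.setdefault(host_id, set())
        # Delete only routes that were queried and not returned (assumed removed).
        known -= queried - found
        known |= found

        # Re-select only if the sticky route is missing from the known set.
        if self._sticky.get(host_id) not in known:
            if known:
                self._sticky[host_id] = min(known)  # deterministic replacement
            else:
                self._sticky.pop(host_id, None)

    def selected(self, host_id: str) -> Optional[str]:
        return self._sticky.get(host_id)


store = RouteStore()
store.apply_partial_update("X", {"A"}, {"A"})
store.apply_partial_update("X", {"B", "C"}, {"B", "C"})  # unrelated IDs change
after_unrelated = store.selected("X")                    # still "A"
store.apply_partial_update("X", {"A"}, set())            # A queried, not found
after_removal = store.selected("X")                      # falls back to "B"
```

Picking min() as the replacement is only one deterministic choice. Any stable ordering would serve the same purpose.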