Add ConfigNode ReadOnly heartbeat self-check (DiskFull/DiskCrash)#17724
Open
CRZbulabula wants to merge 5 commits into
…check

ConfigNode now reports its own NodeStatus.ReadOnly when its critical directories (systemDir, consensusDir) are unwritable or near-full, mirroring the existing DataNode behavior. NodeStatus reasons are extended with a new DISK_CRASH constant alongside DISK_FULL, and the ConfigNode heartbeat carries status/statusReason back to the leader.

- node-commons: new DiskChecker utility (probe + state-machine apply), with priority DiskCrash > DiskFull and recovery to Running when the reason was disk-related. i18n messages added in en + zh.
- thrift-confignode: TConfigNodeHeartbeatResp gains optional status and statusReason fields (forward-compatible).
- confignode: leader self-checks before fanning out heartbeats; follower self-checks on receive and reports back; cache reads from CommonConfig for the leader's self entry, otherwise from the sample.
- datanode: FolderManager exposes a static hasAnyAbnormalFolder() aggregator; sampleDiskLoad treats any ABNORMAL folder as DiskCrash (which wins over DiskFull) and reuses DiskChecker.apply.
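The transition rule described in this commit (DiskCrash outranks DiskFull, and recovery to Running happens only when the current reason is disk-related) can be sketched roughly as below. This is not the PR's DiskChecker: the class name NodeStateSketch, the boolean probe inputs, and the string values "DiskFull" / "DiskCrash" are assumptions made for illustration. The sketch also assumes that a disk finding overwrites any earlier reason, which the commit message does not state.

```java
import java.util.Objects;

/** Hypothetical stand-in for the status fields kept in CommonConfig. */
final class NodeStateSketch {
  enum Status { RUNNING, READ_ONLY }

  // Assumed string values; the PR only says the reasons are string identifiers.
  static final String DISK_FULL = "DiskFull";
  static final String DISK_CRASH = "DiskCrash";

  Status status = Status.RUNNING;
  String reason = null;

  private static boolean isDiskReason(String reason) {
    return DISK_FULL.equals(reason) || DISK_CRASH.equals(reason);
  }

  /** Applies one probe result and returns true if the state changed. */
  boolean apply(boolean crashed, boolean full) {
    if (crashed) {
      // DiskCrash wins even when the disk is also near-full.
      return set(Status.READ_ONLY, DISK_CRASH);
    }
    if (full) {
      return set(Status.READ_ONLY, DISK_FULL);
    }
    // Healthy probe: recover only from disk-related ReadOnly states,
    // so a manually set ReadOnly stays untouched.
    if (status == Status.READ_ONLY && isDiskReason(reason)) {
      return set(Status.RUNNING, null);
    }
    return false;
  }

  private boolean set(Status newStatus, String newReason) {
    boolean changed = status != newStatus || !Objects.equals(reason, newReason);
    status = newStatus;
    reason = newReason;
    return changed;
  }
}
```

Returning a changed flag lets a caller log only on real transitions rather than on every sampling pass.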
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #17724 +/- ##
============================================
+ Coverage 40.41% 40.55% +0.13%
- Complexity 2574 2577 +3
============================================
Files 5179 5181 +2
Lines 349767 350066 +299
Branches 44714 44768 +54
============================================
+ Hits 141373 141978 +605
+ Misses 208394 208088 -306
…n after fanout

- HeartbeatService: the leader's DiskChecker.checkAndApply now runs after pingRegisteredConfigNodes/DataNodes/AINodes so its (blocking) probe IO does not delay the async heartbeat dispatch. Gated on a snapshot of heartbeatCounter taken at loop entry so the leader self-samples on the same iterations DataNode/AINode load sampling fires. The 10-iteration cadence is lifted into a shared LOAD_SAMPLING_INTERVAL constant and reused by the existing setNeedSamplingLoad callsites.
- ConfigNodeRPCServiceProcessor: follower side gates its disk check on a local heartbeatReceivedCounter (every Nth receive, matching the leader cadence) instead of running on every received heartbeat.
- Tidy comments per code-review feedback.
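A minimal sketch of the ordering and gating this commit describes: snapshot the counter at loop entry, dispatch heartbeats first, then run the blocking self-check only on sampling iterations. Only the names heartbeatCounter, heartbeatLoopBody and LOAD_SAMPLING_INTERVAL come from the commit message. The class, the stub methods, the modulo test, and the point where the counter is incremented are assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;

final class HeartbeatCadenceSketch {
  // The commit says the cadence is 10 iterations.
  static final int LOAD_SAMPLING_INTERVAL = 10;

  private final AtomicLong heartbeatCounter = new AtomicLong(0);

  // Counters exposed only so the behaviour can be observed.
  int dispatched = 0;
  int selfChecks = 0;

  void heartbeatLoopBody() {
    // Snapshot once so dispatch and self-check agree on the iteration.
    long snapshot = heartbeatCounter.get();
    boolean sampling = snapshot % LOAD_SAMPLING_INTERVAL == 0; // assumed gating rule

    dispatchHeartbeats(sampling); // stands in for the async fanout
    if (sampling) {
      runDiskSelfCheck(); // blocking IO, deliberately after the fanout
    }
    heartbeatCounter.incrementAndGet(); // assumed placement of the bump
  }

  private void dispatchHeartbeats(boolean needSamplingLoad) {
    dispatched++;
  }

  private void runDiskSelfCheck() {
    selfChecks++;
  }
}
```

With this gating, 25 loop iterations dispatch 25 heartbeat rounds but run the self-check only on iterations 0, 10 and 20.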
…sive crash detect

Three changes that let ReadOnly state actually shape Raft behavior on ConfigNode:

- Utils.rejectWrite / stallApply now match ConfigRegion in addition to DataRegion, so a ReadOnly ConfigNode leader hits the same forceStepDownLeader path that DataRegion leaders already use. Comment at RatisConsensus.write updated.
- New NodeStatus.priorityForStatus maps Running=0, ReadOnly(DiskFull)=-1, ReadOnly(DiskCrash)=-2. HeartbeatService runs a reconciliation step on the leader (same cadence as the load-sampling pass, after async fanout) that pushes each ConfigNode peer's desired priority into Ratis. Unknown/Removing/manual ReadOnly are left empty so transient blips do not churn the group config. IConsensus gains a default-no-op reconfigurePeerPriorities; RatisConsensus overrides it to rebuild the peer list and call sendReconfiguration.
- Replace DiskChecker.check (active testWrite probe) with a passive observer threaded through Ratis. ApplicationStateMachineProxy gains a diskFailureListener parameter and fires it from the applyTransaction catch when Utils.isDiskFailure matches the cause (IOError / FileSystemException). RatisConsensus also tags IOException out of writeLocallyWithRetry / writeRemotelyWithRetry so log-write failures register as DiskCrash. DiskChecker keeps only checkFreeRatio (for the DiskFull path) and apply (for the state machine); DiskCrash is now sticky on both DataNode and ConfigNode until restart. DiskCheckerTest trimmed to drop testWrite-specific cases and to assert that NORMAL no longer recovers DiskCrash; 14 cases pass.
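The two pure functions named in this commit can be approximated as follows. The priority values (0, -1, -2) and the exception types (IOError, FileSystemException) are taken from the commit message. Everything else is assumed: the class and enum names, the reason strings, the use of OptionalInt for "left empty", and the decision to walk the whole cause chain rather than inspect a single cause.

```java
import java.io.IOError;
import java.nio.file.FileSystemException;
import java.util.OptionalInt;

final class RaftPrioritySketch {
  enum Status { RUNNING, READ_ONLY, UNKNOWN, REMOVING }

  /** Desired peer priority, or empty when the status should not touch the group config. */
  static OptionalInt priorityForStatus(Status status, String reason) {
    if (status == Status.RUNNING) {
      return OptionalInt.of(0);
    }
    if (status == Status.READ_ONLY) {
      if ("DiskFull".equals(reason)) {   // assumed reason string
        return OptionalInt.of(-1);
      }
      if ("DiskCrash".equals(reason)) {  // assumed reason string
        return OptionalInt.of(-2);
      }
    }
    // Unknown, Removing and manual ReadOnly: no opinion.
    return OptionalInt.empty();
  }

  /** True if the throwable or one of its causes looks like a disk failure. */
  static boolean isDiskFailure(Throwable t) {
    int depth = 0;
    // Bounded walk guards against pathological cause cycles.
    for (Throwable c = t; c != null && depth < 32; c = c.getCause(), depth++) {
      if (c instanceof IOError || c instanceof FileSystemException) {
        return true;
      }
    }
    return false;
  }
}
```

A plain IOException does not match in this sketch, which is consistent with the commit tagging log-write IOExceptions separately inside RatisConsensus.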
…rtbeatService

- TConfigNodeHeartbeatReq carries needSamplingLoad; ConfigNodeRPCServiceProcessor gates its disk-health check on that flag instead of a local counter, matching the DataNode/AINode heartbeat shape.
- HeartbeatService now bumps heartbeatCounter exactly once at the top of heartbeatLoopBody and threads the iterationIndex into genHeartbeatReq / genConfigNodeHeartbeatReq / genAIHeartbeatReq / addConfigNodeLocationsToReq; the gen methods no longer touch the counter.
- Cross-peer priority reconciliation moved out of HeartbeatService into a new ConfigRegionPriorityBalancer that implements IClusterStatusSubscriber. It is registered on the EventService alongside RouteBalancer/TopologyService and only reacts when NodeStatisticsChangeEvent reports a transition whose priority bucket actually moved (filtered to ConfigNode peers, gated on isLeader).
- RatisConsensus.reconfigurePeerPriorities is now a mechanical merge: it rebuilds the peer list from the requested priorities and unconditionally calls sendReconfiguration. Decisions about "did the priority change" live in the balancer, not at the consensus layer.
…ndex threading

- HeartbeatService bumps heartbeatCounter once at the tail of heartbeatLoopBody; the gen* methods read heartbeatCounter.get() directly again instead of taking an iterationIndex parameter.
- Remove the standalone ConfigRegionPriorityBalancer. Priority reconciliation now lives in EventService, fired from checkAndBroadcastNodeStatisticsChangeEventIfNecessary exactly when ConfigNode statistics change. It is leader-gated, filtered to ConfigNode peers, and only pushes peers whose priority bucket actually moved. EventService now takes the IManager to reach the consensus impl.
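The "leader-gated, only pushes peers whose priority bucket actually moved" filter could look something like the sketch below. None of these names exist in the PR: PriorityReconcileSketch, the integer node ids, and the two maps are hypothetical, and the real code works on node statistics events rather than plain maps.

```java
import java.util.HashMap;
import java.util.Map;

final class PriorityReconcileSketch {
  /**
   * Returns the peers whose desired priority differs from the previous one.
   * An empty result means there is nothing to push to the consensus layer.
   */
  static Map<Integer, Integer> movedPeers(
      boolean isLeader, Map<Integer, Integer> previous, Map<Integer, Integer> desired) {
    Map<Integer, Integer> moved = new HashMap<>();
    if (!isLeader) {
      return moved; // followers never reconfigure the group
    }
    for (Map.Entry<Integer, Integer> e : desired.entrySet()) {
      Integer before = previous.get(e.getKey());
      // A peer with no previous entry counts as moved in this sketch.
      if (!e.getValue().equals(before)) {
        moved.put(e.getKey(), e.getValue());
      }
    }
    return moved;
  }
}
```

Keeping the change detection outside the consensus layer matches the split described in the previous commit, where reconfigurePeerPriorities became an unconditional merge.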



Description
Why
ConfigNode currently has no notion of being ReadOnly; only DataNode samples its disk and self-marks. The heartbeat from the leader to other ConfigNodes is effectively just a liveness ping, so a ConfigNode with a full or crashed disk would silently keep accepting writes from peers / clients. This PR extends ReadOnly tracking to ConfigNode and adds a new disk-failure reason (DiskCrash) on top of the existing DiskFull.
What changes
- NodeStatus.DISK_CRASH constant added next to DISK_FULL (both are still string identifiers stored in statusReason).
- org.apache.iotdb.commons.cluster.DiskChecker in node-commons: probes a list of directories via test-write + free-space ratio, then runs a single state-machine apply on CommonConfig with priority DiskCrash > DiskFull > Normal. Recovery to Running only fires when the prior reason was disk-related; other ReadOnly reasons (e.g. manual maintenance) are left untouched.
- Leader self-checks [systemDir, consensusDir] at the top of HeartbeatService#heartbeatLoopBody (before fanning out heartbeats).
- Follower self-checks on receive in ConfigNodeRPCServiceProcessor#getConfigNodeHeartBeat, and reports status + statusReason back via newly-added optional fields 4 and 5 on TConfigNodeHeartbeatResp (forward-compatible: old peers simply leave them unset).
- ConfigNodeHeartbeatCache#updateCurrentStatistics no longer short-circuits for CURRENT_NODE_ID; instead, the self entry mirrors CommonConfig, so show confignodes reflects the leader's own disk state.
- FolderManager now registers each instance into a weak-ref static list and exposes static boolean hasAnyAbnormalFolder().
- DataNodeInternalRPCServiceImpl#sampleDiskLoad consults that aggregator and, when any folder is ABNORMAL, maps to DiskCrash (which outranks the existing free-ratio DiskFull check). State-machine application is delegated to DiskChecker.apply so DataNode and ConfigNode follow identical transition rules.
Design notes
- DataNode keeps its existing free-ratio check (SystemMetric.SYS_DISK_AVAILABLE_SPACE) for DiskFull. The new DiskCrash signal is path-scoped: it just observes already-recorded write failures rather than probing IO itself. ConfigNode runs both checks per-directory through DiskChecker (File.getUsableSpace/getTotalSpace + a tiny Files.createTempFile/write/delete probe), giving symmetric behavior on the two nodes from the cluster's perspective.
- The new heartbeat response fields are optional to keep rolling upgrade safe: an older ConfigNode that doesn't populate them parses as Running with no reason.
- No auto-recovery from ReadOnly(DiskCrash) on DataNode (would require new ABNORMAL -> HEALTHY transitions inside FolderManager and is left as follow-up). ConfigNode does auto-recover, because testWrite reruns every heartbeat.
i18n
New disk health messages live in CommonMessages under both src/main/i18n/en and src/main/i18n/zh:
- DISK_FULL_SET_READ_ONLY
- DISK_CRASH_SET_READ_ONLY
- DISK_CRASH_PROBE_FAILED
- DISK_RECOVERED_SET_RUNNING
The existing inline English log in sampleDiskLoad is retained (just descriptive context); the state-change log itself is routed through CommonMessages so the Chinese build works out of the box.
This PR has:
Known follow-ups
Key changed/added classes in this PR
New
Modified
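For reference, the per-directory check named in the design notes (File.getUsableSpace/getTotalSpace plus a small Files.createTempFile/write/delete probe) can be sketched with standard JDK calls only. The class name DiskProbeSketch, the method names, the probe file prefix, and the choice to report a ratio of 0.0 when the total size is unknown are assumptions, not the PR's DiskChecker API.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class DiskProbeSketch {
  /** Usable/total space ratio of the partition holding dir, 0.0 if the size is unknown. */
  static double freeRatio(File dir) {
    long total = dir.getTotalSpace(); // 0 when the path does not name a partition
    return total <= 0 ? 0.0 : (double) dir.getUsableSpace() / total;
  }

  /** Returns true if a small file can be created, written and removed in dir. */
  static boolean testWrite(File dir) {
    Path probe = null;
    try {
      probe = Files.createTempFile(dir.toPath(), "disk-probe", ".tmp");
      Files.write(probe, new byte[] {1});
      return true;
    } catch (IOException | SecurityException e) {
      return false; // unwritable or missing directory
    } finally {
      if (probe != null) {
        try {
          Files.deleteIfExists(probe);
        } catch (IOException ignored) {
          // Best effort cleanup; a leftover probe file is harmless.
        }
      }
    }
  }
}
```

A caller would compare freeRatio against a configured threshold for the DiskFull path and treat a failed testWrite as DiskCrash.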