Add support for custom election tick by LakshimiRam-073 · Pull Request #9725 · dgraph-io/dgraph

LakshimiRam-073 · 2026-05-29T07:14:00Z

Description

This PR adds election-tick as a configurable sub-option to the existing --raft superflag for both Alpha and Zero nodes.

Currently, ElectionTick is hardcoded to 20 in conn/node.go:91 (equivalent to 2 seconds at the 100ms tick interval). This makes it impossible for operators to control leader election priority across nodes deployed in heterogeneous network topologies -- for example, nodes spread across different network segments with varying inter-node RTTs (1-2ms within the same segment vs. 10-50ms across segments), or nodes behind different network switches with asymmetric bandwidth, latency, and jitter characteristics.

Changes:

worker/server_state.go — Added election-tick=20 to RaftDefaults string (preserves backward compatibility)
conn/node.go — Read election-tick from x.WorkerConfig.Raft config instead of hardcoded value
Validation — Added minimum bound check ensuring election-tick >= 10 * HeartbeatTick per etcd/raft library requirements

Use case: In deployments where Alpha nodes span different network segments with asymmetric latency, operators can now bias leadership toward low-latency nodes:

# Low-latency segment — fast election timeout (1-2s)
dgraph alpha --raft="idx=1; group=1; election-tick=10"

# High-latency segment — conservative timeout (5-10s)
dgraph alpha --raft="idx=4; group=1; election-tick=50"

This ensures the Raft leader stays in the segment closest to application servers, avoiding unnecessary write latency degradation from suboptimal leader placement after elections.

Default behavior is unchanged — without specifying election-tick, nodes use the previous value of 20 (2 second timeout).

Checklist

The PR title follows the
Conventional Commits syntax, leading
with fix:, feat:, chore:, ci:, etc.
Code compiles correctly and linting (via trunk) passes locally
Tests added for new functionality, or regression tests for bug fixes added as applicable
For public APIs, new features, etc., a PR on the
docs repo staged and linked here.

LakshimiRam-073 · 2026-06-02T07:07:19Z

Hi, just checking in on this PR since it has been waiting for review for about 4 days. Could you please take a look when you get a chance? I would appreciate any feedback or approval so I can move it forward. Thank you.

@matthewmcneely

shiva-istari · 2026-06-09T03:35:11Z

+func NewNode(rc *pb.RaftContext, store *raftwal.DiskStorage, tlsConfig *tls.Config,
+	electionTick int) *Node {
+
+	if electionTick <= 0 {


The current guard only handles electionTick <= 0. But HeartbeatTick is hardcoded to 1 just below, and etcd raft requires ElectionTick > HeartbeatTick. If an operator sets election-tick=1, Config.validate()
returns "election tick must be greater than heartbeat tick" and newRaft panics during StartNode a cryptic crash on boot that never mentions the flag they set.

Since this is the only place a raft.Config is built (every Alpha node and Zero flow through NewNode), the heartbeat floor of 1 applies everywhere, so election-tick can never legally be 1. Better to fail fast with
a clear message than let raft panic

Good catch, this has been addressed in the latest commit.

Added validation at lines 91–94:

const heartbeatTick = 1 // 100ms per tick if electionTick <= 0 { electionTick = 20 } if electionTick <= heartbeatTick { glog.Fatalf( "election-tick=%d is invalid: must be greater than heartbeat-tick (%d). "+ "Recommended minimum is 10 (1s election timeout).", electionTick, heartbeatTick, ) }

Now, if a user sets:

--raft "election-tick=1"

the process fails fast during startup with a clear validation error instead of encountering a cryptic Raft panic later during initialization.

Thanks for reviewing, let me know if any changes/explaination req.

On second thought, do you think it would make sense to add a stricter fail-safe and reject election tick values below 10 altogether? Since the recommendation is already 10 (1s election timeout), allowing smaller values may not be particularly useful and could lead to unstable configurations.

Similarly, would it be worth introducing an upper bound as well (for example, around 24 hours) to prevent accidentally misconfigured values resulting in extremely long election timeouts?

I'd be interested in your thoughts on both of these. Do you prefer keeping the validation minimal (only ensuring it's greater than the heartbeat tick), or would you favor enforcing a practical operating range for election tick values?

Hi @LakshimiRam-073 , Thanks for adding the validation. The fail-fast check on electionTick <= heartbeatTick makes sense to me since that's a hard correctness requirement anyway (raft.Config.validate() will panic if it's violated).

For the two follow-up questions, I'd personally keep the validation fairly minimal and avoid enforcing a practical range.

For the lower bound of 10, I don't think we should reject smaller values. One of the main reasons for exposing this flag is to give operators control over election timing, and there are valid use cases for running below 10. For example, integration/CI tests often benefit from much faster failover, single-node development setups don't have the same stability concerns, and some low-latency deployments may intentionally optimize for sub-second failover. etcd itself only requires electionTick > heartbeatTick. Also, since we run with PreVote: true, we're already protected from one of the more problematic failure modes where a flaky follower can repeatedly disrupt a healthy leader.

I'd also avoid adding an upper bound. Any limit we pick—24 hours or otherwise is ultimately arbitrary. If someone configures an extremely large election timeout, they've effectively decided they want leader changes to be very rare or even handled manually. That's unusual, but it could still be intentional. Hard limits in cases like this often end up causing more frustration than value.

That said, I don't love silently accepting very small values either. Since raft randomizes the timeout in [electionTick, 2*electionTick), setting electionTick=2 means an election can start after missing only a couple of heartbeats. At that point, something as simple as a GC pause or brief network hiccup could trigger unnecessary leader elections.

Instead of rejecting those values, I'd lean toward logging a warning:

if electionTick < 10 { glog.Warningf("election-tick=%d gives a %dms election timeout. Values below 10 (1s) "+ "may cause spurious leader elections under GC pauses or network jitter.", electionTick, electionTick*100) }

That way we preserve flexibility while still making operators aware of the tradeoffs.

Updated in commit fix(raft): warn on negative and low election-tick values (view changes).

I kept the hard correctness check on electionTick <= heartbeatTick, and preserved flexibility for smaller valid values by logging a warning instead of rejecting them. Negative values now warn and fall back to the default, while 0 continues to mean unset/default.

panic with a cryptic. ( election tick > heartbeat issue)

matthewmcneely · 2026-06-11T17:49:21Z


 on:
-  pull_request:
+  pull_request_target:


A change to this file is scope-creep. Please revert, open a different PR with rationale for this change if you want.

Reverted. Will open a separate PR with rationale if we want to pursue that change.

- Split negative and zero handling: negative values log a warning and default to 20; zero silently defaults to 20 (unset/zero-value case) - Return warning string from normalizeElectionTick instead of two bools - Warn (not reject) when election-tick < 10 per reviewer guidance

Reverting scope-creep changes to .github/workflows/labeler.yml. Will open a separate PR with rationale if needed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

matthewmcneely · 2026-06-17T16:35:37Z

+	var warning string
+	electionTick, warning = normalizeElectionTick(electionTick)
+	if warning != "" {
+		glog.Warningf(warning)


The formatted version is not needed (and should be failing govet pass)... please just use glog.Warning

matthewmcneely · 2026-06-17T16:40:12Z

+	recommendedElectionTick = 10
+)
+
+func normalizeElectionTick(electionTick int) (tick int, warning string) {


The election-tick=1 handling in normalizeElectionTick is inconsistent with how the function treats every other invalid value, and I'd pull it into line.

Right now a negative tick coerces to the default 20 and logs a warning, but electionTick == 1 calls glog.Fatalf and takes the process down. Those are both equally-invalid inputs, so it's odd that -100 gets a friendly default while a single fat-fingered 1 is fatal. An operator who typos election-tick=1 shouldn't lose the node over it.

The Fatalf also blocks the test. The table covers 0, -1, -100, 2, 10, 20, 100 but skips 1, and it has to: exercising that path would os.Exit the test binary. So the one genuinely-fatal branch is the one branch with no coverage.

The intent behind it is right. Failing with a clear message beats letting etcd/raft panic with something cryptic. But coercing 1 to the default and warning, exactly like the negative case, gets you that clear message while keeping the behavior consistent, fully unit-testable, and crash-free on a typo.

Suggest dropping the glog.Fatalf branch and folding 1 into the same coerce-and-warn path as the negatives.

Add support for custom election tick

377228d

LakshimiRam-073 requested a review from a team as a code owner May 29, 2026 07:14

matthewmcneely self-requested a review May 29, 2026 16:04

LakshimiRam-073 and others added 2 commits June 2, 2026 21:07

Merge branch 'main' into feature/custom-election-tick

40ac0d3

fix: add issues:write permission to labeler workflow

ee4d6ff

shiva-istari reviewed Jun 9, 2026

View reviewed changes

shiva-istari requested changes Jun 9, 2026

View reviewed changes

Fail fast with a clear error message instead of letting etcd raft

f26e195

panic with a cryptic. ( election tick > heartbeat issue)

LakshimiRam-073 requested a review from shiva-istari June 9, 2026 15:25

matthewmcneely requested changes Jun 11, 2026

View reviewed changes

LakshimiRam-073 requested a review from matthewmcneely June 15, 2026 07:38

revert: labeler workflow changes

7071d3b

Reverting scope-creep changes to .github/workflows/labeler.yml. Will open a separate PR with rationale if needed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

LakshimiRam-073 force-pushed the feature/custom-election-tick branch from b71204e to 7071d3b Compare June 16, 2026 10:28

matthewmcneely requested changes Jun 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for custom election tick#9725

Add support for custom election tick#9725
LakshimiRam-073 wants to merge 6 commits into
dgraph-io:mainfrom
LakshimiRam-073:feature/custom-election-tick

LakshimiRam-073 commented May 29, 2026

Uh oh!

LakshimiRam-073 commented Jun 2, 2026 •

edited

Loading

Uh oh!

shiva-istari Jun 9, 2026

Uh oh!

LakshimiRam-073 Jun 9, 2026

Uh oh!

LakshimiRam-073 Jun 9, 2026

Uh oh!

shiva-istari Jun 11, 2026

Uh oh!

LakshimiRam-073 Jun 15, 2026 •

edited

Loading

Uh oh!

matthewmcneely Jun 11, 2026

Uh oh!

LakshimiRam-073 Jun 15, 2026

Uh oh!

matthewmcneely Jun 17, 2026

Uh oh!

matthewmcneely Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

Conversation

LakshimiRam-073 commented May 29, 2026

Description

Checklist

Uh oh!

LakshimiRam-073 commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

LakshimiRam-073 Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

LakshimiRam-073 commented Jun 2, 2026 •

edited

Loading

LakshimiRam-073 Jun 15, 2026 •

edited

Loading