The beacon chain is designed to automatically recover finality if more than 1/3 of validators become permanently inactive (commonly referred to as surviving World War 3). The balances of inactive validators are leaked to reduce their share of active stake and restore finality. To finalize again, the chain needs to satisfy this inequality:
participating_stake / active_stake >= 2/3
Assuming participating_stake is constant (no new deposits), it needs to reduce active_stake by:
1) Forcefully ejecting non-participating stake (ejection condition)
2) Reducing the balance of active non-participating validators (inactivity leak)
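As a worked example of the inequality above, the sketch below computes what fraction of the initial total stake must leave active_stake (through ejection or the leak) before finality can resume, when a fraction f of stake goes permanently offline. It assumes, as the text does, that participating stake stays constant; stake_to_remove is a hypothetical helper, not a spec function.

```python
from fractions import Fraction

# Justification requires participating / active >= 2/3
FINALITY_THRESHOLD = Fraction(2, 3)


def stake_to_remove(offline_fraction: Fraction) -> Fraction:
    """Hypothetical helper: fraction of the initial total stake that must be removed
    from active_stake so that participating / active >= 2/3, holding participating constant."""
    participating = 1 - offline_fraction
    max_active = participating / FINALITY_THRESHOLD  # largest active stake that still finalizes
    return max(Fraction(0), 1 - max_active)


for f in (Fraction(1, 4), Fraction(1, 2), Fraction(2, 3)):
    removed = stake_to_remove(f)
    print(f"offline={float(f):.0%}  remove={float(removed):.1%} of total stake "
          f"({float(removed / f):.0%} of the offline stake)")
```

With 50% of stake offline, a quarter of the total stake, i.e. half of the offline stake, has to disappear from active_stake before the chain can finalize again.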
To formalize 1), we must set a constant, EJECTION_BALANCE, such that a validator is ejected once its effective balance falls to or below this value.
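For reference, the consensus specs apply this check in process_registry_updates, calling initiate_validator_exit for any active validator whose effective_balance <= EJECTION_BALANCE. The sketch below mirrors only that predicate; the Validator dataclass is a hypothetical, stripped-down stand-in for the spec container, and the exit queue itself is left out.

```python
from dataclasses import dataclass

EJECTION_BALANCE = 16 * 10**9  # 16 ETH, in Gwei
FAR_FUTURE_EPOCH = 2**64 - 1


@dataclass
class Validator:
    """Hypothetical minimal stand-in for the spec's Validator container."""
    effective_balance: int  # Gwei
    activation_epoch: int = 0
    exit_epoch: int = FAR_FUTURE_EPOCH


def is_active_validator(v: Validator, epoch: int) -> bool:
    return v.activation_epoch <= epoch < v.exit_epoch


def should_eject(v: Validator, epoch: int) -> bool:
    """Ejection condition: active and effective balance at or below EJECTION_BALANCE.
    (In the spec, initiate_validator_exit then ignores validators already queued to exit.)"""
    return is_active_validator(v, epoch) and v.effective_balance <= EJECTION_BALANCE


print(should_eject(Validator(effective_balance=16 * 10**9), epoch=10))  # True
```

Because the check uses effective balance, which moves in 1 ETH steps with a 0.25 ETH downward hysteresis, a validator whose balance decreases gradually (as in a leak) is ejected once its actual balance drops below about 16.75 ETH.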
The issue with 1) is that ejections must go through the exit queue: if more than 1/3 of validators need to be ejected, the exit queue will take months to clear. So in practice, 2) dominates the restoration of finality. Because of this, most offline validators will leak past the ejection balance, by an amount that depends on the share of stake that is permanently offline.
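To put "months" into numbers, the sketch below drains an exit queue using the Bellatrix-era churn rule from get_validator_churn_limit, max(MIN_PER_EPOCH_CHURN_LIMIT, active_count // CHURN_LIMIT_QUOTIENT) with mainnet values 4 and 65536. It is a simplification under our own assumptions: exits are counted as soon as they are processed (no MAX_SEED_LOOKAHEAD delay) and EIP-7251's balance-based churn is ignored. days_to_drain_exit_queue is a hypothetical helper.

```python
MIN_PER_EPOCH_CHURN_LIMIT = 4
CHURN_LIMIT_QUOTIENT = 2**16
SECONDS_PER_EPOCH = 12 * 32  # SECONDS_PER_SLOT * SLOTS_PER_EPOCH


def churn_limit(active_validators: int) -> int:
    """Exits allowed per epoch, following get_validator_churn_limit."""
    return max(MIN_PER_EPOCH_CHURN_LIMIT, active_validators // CHURN_LIMIT_QUOTIENT)


def days_to_drain_exit_queue(active_validators: int, queued: int) -> float:
    """Hypothetical helper: days until `queued` validators have exited, recomputing
    the churn limit as the active set shrinks (simplified: exits take effect immediately)."""
    epochs = 0
    while queued > 0:
        exiting = min(churn_limit(active_validators), queued)
        queued -= exiting
        active_validators -= exiting
        epochs += 1
    return epochs * SECONDS_PER_EPOCH / 86400


print(f"{days_to_drain_exit_queue(1_000_000, 1_000_000 // 3):.0f} days")
```

For a 1e6-validator network, ejecting a third of the set this way takes about four months, and the churn limit shrinks as validators leave, so the tail of the queue drains more slowly than the head.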
Inactivity leak epoch-by-epoch timeline
The ejection condition dictated by EJECTION_BALANCE can be misunderstood as a lower bound for validator balances. However, during an inactivity leak, most validators will leak well past EJECTION_BALANCE. Let's run through an example step by step to illustrate this.
Note: the charts below were created with a Jupyter Notebook implementing the Bellatrix spec with faster config variables. For detailed numbers, refer to the next section, which runs a full inactivity leak simulation.
The following plots track three key variables through an inactivity leak in which 50% of the validators suddenly go offline.
- 🟥 Balance: Solid line: average of offline validators. Area: range of all values
- 🟦 Inactivity scores: Solid line: average of offline validators. Area: range of all values
- 🟩 Fraction of active validators
If finality is delayed by more than 4 epochs (MIN_EPOCHS_TO_INACTIVITY_PENALTY), the inactivity score of offline validators starts to increase linearly with time. Inactivity balance penalties are proportional to the inactivity score, so balance decreases roughly quadratically with time.
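A minimal single-validator sketch of this loop, assuming the Altair/Bellatrix rules as we read them: each leaking epoch an offline validator's score rises by INACTIVITY_SCORE_BIAS, and it then pays effective_balance * score // (INACTIVITY_SCORE_BIAS * INACTIVITY_PENALTY_QUOTIENT_BELLATRIX). It ignores the smaller missed-attestation penalties and models effective-balance hysteresis with the downward threshold only, since balances never go up here. simulate_offline_validator is a hypothetical helper.

```python
# Bellatrix mainnet constants (Gwei where applicable)
MAX_EFFECTIVE_BALANCE = 32 * 10**9
EFFECTIVE_BALANCE_INCREMENT = 10**9
HYSTERESIS_INCREMENT = EFFECTIVE_BALANCE_INCREMENT // 4  # HYSTERESIS_QUOTIENT = 4
INACTIVITY_SCORE_BIAS = 4
INACTIVITY_PENALTY_QUOTIENT_BELLATRIX = 2**24


def simulate_offline_validator(epochs: int, balance: int = MAX_EFFECTIVE_BALANCE):
    """Hypothetical helper: (epoch, inactivity_score, balance) for one validator that
    stays offline for `epochs` leaking epochs. Only inactivity penalties are modelled."""
    effective = min(balance - balance % EFFECTIVE_BALANCE_INCREMENT, MAX_EFFECTIVE_BALANCE)
    score = 0
    history = []
    for epoch in range(1, epochs + 1):
        # process_inactivity_updates: missed target vote -> score += bias (no recovery while leaking)
        score += INACTIVITY_SCORE_BIAS
        # Penalty is proportional to the score, so the cumulative loss grows ~quadratically
        penalty = effective * score // (INACTIVITY_SCORE_BIAS * INACTIVITY_PENALTY_QUOTIENT_BELLATRIX)
        balance = max(0, balance - penalty)
        # Effective balance only steps down once balance is 0.25 ETH below it (downward hysteresis)
        if balance + HYSTERESIS_INCREMENT < effective:
            effective = balance - balance % EFFECTIVE_BALANCE_INCREMENT
        history.append((epoch, score, balance))
    return history


history = simulate_offline_validator(2000)
for epoch, score, bal in history[::500]:
    print(f"epoch {epoch:5d}  score {score:5d}  balance {bal / 1e9:.3f} ETH")
```

Doubling the leak duration roughly quadruples the cumulative loss (slightly less than 4x, because the effective balance steps down along the way), which is the quadratic behaviour visible in the balance plot.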
Eventually, the balance of inactive validators reaches EJECTION_BALANCE and those validators are ejected at once. Note that this is an idealized example in which all validators start with the same balance. In a real scenario, there will be some offset between ejections. Also, validators will likely attempt to exit voluntarily as soon as possible, making exits more staggered. However, since this does not meaningfully alter the results, it is omitted for now.
Although all validators are ejected at once, the exit queue enforces a constant rate of exits per unit of time. The validators at the front of the queue exit first, with a balance close to EJECTION_BALANCE.
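The queueing itself can be sketched by mirroring the loop in the spec's initiate_validator_exit: each exit goes to the latest already-assigned exit epoch (or the earliest allowed one, compute_activation_exit_epoch), moving on to the next epoch once that epoch has churn_limit exits. The sketch below assumes a fixed churn limit for simplicity; assign_exit_epochs is a hypothetical helper.

```python
MAX_SEED_LOOKAHEAD = 4


def compute_activation_exit_epoch(epoch: int) -> int:
    """Earliest epoch at which an exit initiated in `epoch` can take effect."""
    return epoch + 1 + MAX_SEED_LOOKAHEAD


def assign_exit_epochs(num_ejected, current_epoch, churn_limit, existing_exit_epochs=()):
    """Hypothetical helper: exit epochs handed out, in queue order, to `num_ejected`
    validators ejected in the same epoch (fixed churn limit assumed)."""
    exit_epochs = list(existing_exit_epochs)
    queue_epoch = max(exit_epochs + [compute_activation_exit_epoch(current_epoch)])
    queue_churn = exit_epochs.count(queue_epoch)
    assigned = []
    for _ in range(num_ejected):
        if queue_churn >= churn_limit:  # this epoch is full: move to the next one
            queue_epoch += 1
            queue_churn = 0
        assigned.append(queue_epoch)
        queue_churn += 1
    return assigned


exits = assign_exit_epochs(num_ejected=45, current_epoch=0, churn_limit=15)
print(exits[0], exits[-1])  # 5 7
```

Queue position alone decides how many extra epochs a validator stays active, and every one of those epochs is spent leaking.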
However, the rest remain active and offline, so their inactivity scores and penalties keep increasing and they keep leaking balance.
Eventually, thanks to the leak, the share of participating stake reaches 2/3 and the network finalizes. Once the leak ends, the inactivity scores of still-active offline validators decrease at a higher rate than they grew, but they take some time to reach 0. During this time, validators that have not yet exited keep leaking.
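A sketch of that post-finality tail, assuming the Altair/Bellatrix process_inactivity_updates rules: in each leak-free epoch an offline validator's score still gains INACTIVITY_SCORE_BIAS (4) and then loses up to INACTIVITY_SCORE_RECOVERY_RATE (16), a net decrease of 12, versus 17 for a validator that comes back online. Inactivity penalties keep applying in proportion to the remaining score. Effective balance is held constant here for simplicity, and post_finality_tail is a hypothetical helper.

```python
INACTIVITY_SCORE_BIAS = 4
INACTIVITY_SCORE_RECOVERY_RATE = 16
INACTIVITY_PENALTY_QUOTIENT_BELLATRIX = 2**24


def post_finality_tail(score: int, effective_balance: int) -> tuple[int, int]:
    """Hypothetical helper: (epochs until the score reaches 0, Gwei lost meanwhile) for a
    validator that stays offline after the leak ends; effective balance held constant."""
    epochs, lost = 0, 0
    while score > 0:
        score += INACTIVITY_SCORE_BIAS  # still missing target votes
        score -= min(INACTIVITY_SCORE_RECOVERY_RATE, score)  # leak-free recovery
        lost += effective_balance * score // (
            INACTIVITY_SCORE_BIAS * INACTIVITY_PENALTY_QUOTIENT_BELLATRIX)
        epochs += 1
    return epochs, lost


epochs, lost = post_finality_tail(score=1200, effective_balance=16 * 10**9)
print(epochs, lost / 1e9)
```

The score a validator accumulated during the leak therefore translates into a fixed number of extra leaking epochs after finality (score / 12, rounded up), which is the "inertia" described below.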
The result is an incentive to race to the exit, as the validators that can front-run the queue will suffer the smallest penalties. However, the majority of offline validators will end up with a balance far below EJECTION_BALANCE.
The continued leak after finality is by design. If the network stopped leaking exactly when reaching 2/3 participation, it would be at risk of falling back into non-finality. This leaking "inertia" allows the network to overshoot into higher participation regimes.
What's the effect of EJECTION_BALANCE?
EJECTION_BALANCE moderates how early validators start to be placed in the exit queue during an inactivity leak. Does faster or slower ejection meaningfully affect finality recovery or bound validator losses?
The previous section shows an illustrative inactivity leak. However, to get exact numbers for a network of 1 million indexes, the Python spec is too slow: each run takes 10-30 minutes on moderate hardware 🫠. To iterate faster, I translated the simplified Bellatrix spec to a faster language and computed key metrics with different settings (source code).
To estimate the effects of the EJECTION_BALANCE constant, a new simulation was run to calculate two main variables as a function of the percentage of inactive validators:
- Inactivity Leak Stop: the time (in days) until the inactivity leak stops
- Total Balance Burned: the total network-wide balance burned by the end of the simulation, which ends when the max inactivity score of all active validators reaches 0
This simulation used Bellatrix constants, 1e6 initial active indexes, and an equal initial balance of 32 ETH for all validators. The results are shown in the following graphs.
Source code and tabulated results
Except for impractically high values, EJECTION_BALANCE does not significantly influence our two main variables.
In the simulated scenario, where validators start at 32 ETH (as most do on mainnet today), the difference between the current ejection condition (EJECTION_BALANCE = 16 ETH) and no ejection condition at all (EJECTION_BALANCE = 0 ETH) is minimal.
Ejection condition under MaxEB (EIP-7251)
EIP-7251 increases MAX_EFFECTIVE_BALANCE, which extends the range of possible active balances a validator can have. Since genesis, the beacon chain has targeted an active balance range for all validators between 16 ETH (EJECTION_BALANCE) and 32 ETH (MAX_EFFECTIVE_BALANCE).
As we have seen, EJECTION_BALANCE does not contribute meaningfully to finality recovery.
So, why was it added in the first place? The beacon chain relies on validators having a sufficiently
uniform balance to ensure that committees are majority honest. Since committee selection is not
balance-based, the lower bound of 16 ETH ensures that a random selection of indexes is majority
honest with very high probability.
With EIP-7251, the active balance range is extended to between 16 ETH (EJECTION_BALANCE) and 2048 ETH (MAX_EFFECTIVE_BALANCE_EIP7251).
Let's explore the implications of this extended range:
Does it affect finality recovery?
No. As shown above, reducing the ejection balance does not meaningfully affect the time to recovery, and increasing the active balance upper bound is conceptually equivalent to reducing the lower bound, i.e. reducing the ejection balance.
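One way to read this equivalence, offered as our own interpretation, is through the ratio between the largest and smallest active balance: keeping the 16 ETH floor and raising the ceiling to 2048 ETH gives the same 128:1 spread as keeping the 32 ETH ceiling and lowering EJECTION_BALANCE to 0.25 ETH.

```latex
\frac{\texttt{MAX\_EFFECTIVE\_BALANCE\_EIP7251}}{\texttt{EJECTION\_BALANCE}}
  = \frac{2048\ \text{ETH}}{16\ \text{ETH}}
  = 128
  = \frac{32\ \text{ETH}}{0.25\ \text{ETH}}
```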
Does it increase the balance leaked during non-finality?
During non-finality, the average balance leaked is not a function of the ejection balance, so the percentage of balance leaked will remain roughly the same as today. Refer to the previous section for exact numbers.
Does it increase the time to eject offline validators during finality?
Yes. With EIP-7251, a validator that is offline while the chain finalizes normally can theoretically lose more balance before it is ejected.
However, the penalties for missed attestations during finality are so small that it would take a validator decades to reach EJECTION_BALANCE, even starting from 32 ETH.
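To sanity-check "decades", the sketch below estimates how long an always-offline validator takes to decay from a starting balance to EJECTION_BALANCE while the chain finalizes normally. Our assumptions: Bellatrix constants; a constant total active balance of 32M ETH (the 1e6 × 32 ETH network used above); only the missed source and target penalties apply (inactivity scores stay at 0 under finality, and missing the head vote is not penalized); and a continuous approximation that ignores integer rounding, effective-balance hysteresis and sync committee duties. years_to_ejection is a hypothetical helper.

```python
import math

# Bellatrix mainnet constants
BASE_REWARD_FACTOR = 64
EFFECTIVE_BALANCE_INCREMENT = 10**9  # Gwei
TIMELY_SOURCE_WEIGHT = 14
TIMELY_TARGET_WEIGHT = 26
WEIGHT_DENOMINATOR = 64
SECONDS_PER_EPOCH = 12 * 32
EPOCHS_PER_YEAR = 365.25 * 24 * 3600 / SECONDS_PER_EPOCH


def years_to_ejection(start_eth: float, ejection_eth: float, total_active_eth: int) -> float:
    """Hypothetical helper: continuous approximation of the years an always-offline
    validator needs to decay from start_eth to ejection_eth under normal finality."""
    total_active_gwei = total_active_eth * EFFECTIVE_BALANCE_INCREMENT
    base_reward_per_increment = (EFFECTIVE_BALANCE_INCREMENT * BASE_REWARD_FACTOR
                                 // math.isqrt(total_active_gwei))
    # Missed source + target penalties, as a fraction of effective balance per epoch
    loss_per_epoch = (base_reward_per_increment * (TIMELY_SOURCE_WEIGHT + TIMELY_TARGET_WEIGHT)
                      / (WEIGHT_DENOMINATOR * EFFECTIVE_BALANCE_INCREMENT))
    # Penalty is proportional to balance, so the decay is exponential
    epochs = math.log(start_eth / ejection_eth) / loss_per_epoch
    return epochs / EPOCHS_PER_YEAR


print(f"{years_to_ejection(32, 16, 32_000_000):.1f} years from 32 ETH")
print(f"{years_to_ejection(2048, 16, 32_000_000):.1f} years from 2048 ETH")
```

Under these assumptions, 32 → 16 ETH takes a bit under 40 years, and because the decay is exponential, starting from 2048 ETH takes log(128)/log(2) = 7 times as long.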
Waiting 30 or 50 years to be ejected is hardly a good option. Instead, a perpetually offline validator (due to key loss, for example) should use EIP-7002 (Execution layer triggerable exits) to exit and suffer far smaller losses than it would by waiting for any safe EJECTION_BALANCE value. If chronically offline validators become a network-wide issue, there is plenty of time (years) to design and ship a solution.
Summary
Should the ejection mechanism be modified to accommodate EIP-7251?
We have established that:
- EJECTION_BALANCE does not meaningfully contribute to finality recovery
- EJECTION_BALANCE is not useful to clean up perpetually offline validators
- with MaxEB, the network can handle big ranges of active balances
Therefore, we should do nothing: leave the parameter at EJECTION_BALANCE = 16 ETH, ship EIP-7002, and promote client diversity so stakers don't have to worry about the inactivity leak.