Purity in the EVM
This document provides a definition of purity suitable for a signature validation contract. It provides resources for designing an on-chain purity-checking contract.
This document is not official advice. Errors may be present.
This document is available as a Git repository at github.com/sigp/opcode-purity.
Background
This document is the result of "reverse engineering" the following two contracts and the majority any credit attributed to this document is deserving of their authors:
- Serpent Purity Checker in ethereum/research by Vitalik Buterin.
- LLL Port of the above Serpent Purity Checker by @ralexstokes.
Vitalik's contract was ported to LLL by Alex with the intention that it would be used to verify the "purity" of signature validation (valsig) contracts in the (now deprecated) EIP-1011 proposal. Before we discuss the concept of purity, first we should understand the purpose of valsig contracts.
Valsig contracts were to be used to abstract the signature validation of vote and logout messages -- allowing validators to implement arbitrarily complex signature schemes instead of just relying upon transaction signatures (ECDSA).
Unfortunately, the use of arbitrary external valsig contracts opens the possibility for a damaging attack vector whereby a validator can, because they control signature validation, double-vote then prevent punishment by ensuring that the slash validation fails whilst the vote validation succeeds. Such an attack can be eliminated by reducing the space of possible valsig contracts to only those which are "pure": those which will always return the result given the same signature to validate.
Therefore, EIP-1011 required that each valsig contract must have its purity approved by an on-chain smart contract before it is permitted to be used for validation. The contract that scanned and determined valsig contact purity was called the purity checker and it is the focus of this document.
Definition of Purity
This contract provides the following definition of purity for an Ethereum smart-contract:
A contract is considered pure if it will always return the same result given sufficient gas for execution and the same transaction data and value. Specifically, it may read the data and value fields of a transaction but no other transaction information, it may not read block information and it must not read from or write to storage.
There is some room for subjectivity in the definition of purity, specifically in what can be considered "inputs" to the "function" that is the smart contract. This definition includes only transaction data and value ($T_d$, $T_v$ in the Yellow Paper) but given no concrete definition of what is transaction "context" (as opposed to transaction "input") a definition of purity can be conceived which permits the origin address (derived from $T_w$, $T_r$ and $T_s$) to be read as well. Such a definition is not compatible with signature validation contracts and is therefore excluded from this document.
Detecting Impurity On-Chain
This document assumes that detecting the purity of a contract is going to be performed on the contracts bytecode. This is more accurate than performing the action on source code because it eliminates any quirks which may be introduced during compilation. This also has the benefit of allowing on-chain verification of one contract by another as a contract may go and retrieve the bytecode of another contract and iterate through it inside the EVM.
The process of determining the purity of some bytecode will generally involve starting at the first byte (which must be an opcode), attempting to match it against the opcodes defined in the table, performing some action depending on the purity of that opcode (e.g., permit, deny or attempt address detection) and then repeating the process on the next opcode.
It is important to note that not each byte in some bytecode must be an opcode,
instead it may be a parameter supplied to a PUSH
opcode. The Serpent contract
provided in the Background section provides an example of how
one can keep track of opcodes and parameters throughout the iteration process
in order to allow for back-searching of opcodes and parameters as required for
address detection in call-type opcodes (see Address Detection
Techniques).
The rest of the document focuses on defining the purity categories for each opcode, outlining techniques that can be used to deal with call-type opcodes and then provides some detail as to why certain opcodes have been categorised as pure or potentially-impure.
Impurity Categories
There are three classifications for impure opcodes: always impure, potentially impure call-type and future impure opcodes. Each category is described below.
Always Impure
These opcodes have no use other than to mutate state, return mutable state or provide context about the execution environment. Any contract which includes an "always impure" opcode should be immediately considered impure.
Future Impure Opcodes
These opcodes are assumed to be reserved for future impure opcodes. At the time of writing, there is no formal declaration that this is the case and this judgement is solely based off the authors informal conversations with the Ethereum community.
Potentially Impure Call-Type
Call-type opcodes (see the table for a listing) may execute code at some other address. It is possible for an external call to be either pure or impure, depending on the address specified for the call. The use of a call-type opcode can only be considered pure if the address specified is:
- An address that has already been determined to be pure.
- Any of the precompile addresses within the range of
0x0000000000000000000000000000000000000001
to0x0000000000000000000000000000000000000008
. Note: the purity of these contracts is yet to be confirmed.
See the Address Detection Techniques section for some techniques for extracting the address supplied to a call-type opcode from bytecode.
Any call to an externally-owned (non-contract) address should be considered impure. This is because it can potentially have impure code deployed to it.
Impure Opcode Table
Opcode Value | Mnemonic | Impurity Category |
---|---|---|
0x31 |
BALANCE | Always Impure |
0x32 |
ORIGIN | Always Impure |
0x33 |
CALLER | Always Impure |
0x3a |
GASPRICE | Always Impure |
0x3b |
EXTCODESIZE | Always Impure |
0x3c |
EXTCODECOPY | Always Impure |
0x40 |
BLOCKHASH | Always Impure |
0x41 |
COINBASE | Always Impure |
0x42 |
TIMESTAMP | Always Impure |
0x43 |
NUMBER | Always Impure |
0x44 |
DIFFICULTY | Always Impure |
0x45 |
GASLIMIT | Always Impure |
0x46 - 0x4F |
Range of future impure opcodes | Future Impure Opcodes |
0x54 |
SLOAD | Always Impure |
0x55 |
SSTORE | Always Impure |
0xf0 |
CREATE | Always Impure |
0xff |
SELFDESTRUCT | Always Impure |
0xf1 |
CALL | Potentially Impure Call-Type |
0xf2 |
CALLCODE | Potentially Impure Call-Type |
0xf4 |
DELEGATECALL | Potentially Impure Call-Type |
0xfa |
STATICCALL | Potentially Impure Call-Type |
* 0xfb |
CREATE2 | Always Impure |
* Opcodes which were not implemented at the time of writing, but the author has an expectation they will be implemented in the future.
Address Detection Techniques
Call-type opcodes (see the table for a listing) can only be considered pure if they call a specific set of addresses (see Potentially Impure Call-Types). Therefore, in order to permit some call-type opcodes it is necessary to determine the called address from the bytecode. This section describes methods which may be used to find the address supplied to the call-type opcode with certainty.
The code which may place an address on the stack for call-type opcode can be arbitrarily complex and only discoverable by executing said code. To allow purity checking within a single Ethereum transaction the techniques here are simplistic and will provide false positives (indicating impurity). However, these techniques should never produce false negatives (indicating purity).
Techniques are provided in a Python-like pseudo-code and concrete examples can be found in the two contracts specified in the Background section.
Convenience Functions
First two convenience functions are declared; get_opcode(n)
and
get_last_opcode_param(n)
.
Convenience Function get_opcode(n)
Returns the n
'th opcode declared in the subject bytecode[]
.
If n
is out of bounds of bytecode[]
the function returns None
.
Example:
ADD = 0x01
PUSH2 = 0x61
bytecode = [PUSH2, 2, 1, ADD]
get_opcode(0)
# 3
get_opcode(2)
# None
Convenience Function get_last_opcode_param(n)
Returns the final parameter supplied to the n
'th opcode declared in
the subject bytecode[]
.
If n
is out of bounds of bytecode[]
or the n
'th opcode does not have
parameters the function returns None
.
Example:
ADD = 0x01
PUSH2 = 0x61
bytecode = [PUSH2, 2, 1, ADD]
get_last_opcode_param(0)
# 1
get_last_opcode_param(1)
# None
get_last_opcode_param(2)
# None
Address Detection Functions
Four functions are now declared which return an address if a specific pattern
of opcodes is found to precede a call-type opcode. If all of these functions
return None
, then the contract should be assumed to be impure.
Each function takes an input c
which is the index of the call-type opcode in
question. It is assumed that the on-chain purity checking contract is iterating
over the bytecode in question and each time it detects a call-type opcode it
runs these functions to attempt to detect the address being called.
Address Detection Function #1
PUSH1 = 0x60
PUSH32 = 0x7f
def address_detector_1(c):
if PUSH1 <= get_opcode(c-2) <= PUSH32:
return get_last_opcode_param(c-2)
else:
return None
Address Detection Function #2
SUB = 0x03
GAS = 0x5a
PUSH1 = 0x60
PUSH32 = 0x7f
def address_detector_2(c):
if (get_opcode(c-1) == SUB and
get_opcode(c-2) == GAS and
PUSH1 <= get_opcode(c-3) <= PUSH32):
return get_last_opcode_param(c-3)
else:
return None
Address Detection Function #3
GAS = 0x5a
SWAP1 = 0x90
def address_detector_3(c):
if (get_opcode(c-1) == GAS OR
get_opcode(c-1) == SWAP1):
return get_last_opcode_param(c-2)
else:
return None
Address Detection Function #4
DUP1 = 0x80
DUP16 = 0x8f
def address_detector_4(c):
if (DUP1 <= get_opcode(c-1) <= DUP16):
return get_last_opcode_param(c-2)
else:
return None
Opcode Listing
This section contains an opcode-by-opcode listing of each defined opcode. For each opcode the following is provided:
- Summary: a brief description of what the opcode does.
- Impurity Reasoning: a reference demonstrating impurity reasoning.
- Potential Attack: a scenario which assumes some attacker has deployed a contract and wishes to be able to have some pre-determined or ad hoc control of the return result of the contract. This section does not exhaustively list potential attacks, it simply provides an example for demonstrative purposes.
Specifications of opcodes can be found in Appendix H of the Ethereum Yellow Paper.
BALANCE
Summary: Returns the balance of some address.
References: py-evm/evm/vm/logic/context.py: balance()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract
call by altering the balance of some external account.
ORIGIN
Summary: Returns the address of the sender of the transaction which
triggered execution. In Solidity, this is tx.origin
.
References: py-evm/evm/vm/logic/context.py: origin()
Impurity Reasoning: reads illegal transaction context.
Potential Attack: An attacker may influence the return value of a contract
call by varying the private key with which a transaction is signed.
CALLER
Summary: Returns the address directly responsible for the execution. In
Solidity, this is msg.sender
.
References: py-evm/evm/vm/logic/context.py: caller()
Impurity Reasoning: reads illegal transaction context.
Potential Attack: An attacker may influence the return value of a contract
call by varying the private key with which a transaction is signed or using an
intermediary contract to alter the CALLER
value.
GASPRICE
Summary: Returns the current gas price.
References: py-evm/evm/vm/logic/context.py: gasprice()
Impurity Reasoning: reads illegal transaction context.
Potential Attack: An attacker may influence the return value of a contract
call by using some means to alter the gas price (e.g., directly controlling
block proposers).
EXTCODESIZE
Summary: Returns the size of the code held at some address.
References: py-evm/evm/vm/logic/context.py: extcodesize()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract.
call by deploying code to some pre-computed address.
EXTCODECOPY
Summary: Copies some amount of code at some address to some position in
memory.
References: py-evm/evm/vm/logic/context.py: extcodecopy()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract
call by deploying code to some pre-computed address.
BLOCKHASH
Summary: Returns the hash of some past block (within the previous 256
complete blocks).
References: py-evm/evm/vm/logic/block.py: blockhash()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract
call by controlling some portion of block proposers and selecting block hashes based
upon how they will influence the contract call.
COINBASE
Summary: Returns the beneficiary address of the block.
References: py-evm/evm/vm/logic/block.py: coinbase()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract
call by controlling some portion of block proposers and declaring the beneficiary
address based upon how it will influence the contract call.
TIMESTAMP
Summary: Returns the timestamp of the block.
References: py-evm/evm/vm/logic/block.py: timestamp()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract
call by controlling some portion of block proposers and declaring the timestamp
based upon how it will influence the contract call.
NUMBER
Summary: Returns the number of the block (count of blocks in the chain
since genesis).
References: py-evm/evm/vm/logic/block.py: number()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract
call by selecting in which block a transaction should be included.
DIFFICULTY
Summary: Returns the block difficulty.
References: py-evm/evm/vm/logic/block.py: difficulty()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract
call by assuming some control of the collective hash rate and modifying it
based upon how it will influence the contract call.
GASLIMIT
Summary: Returns the block gas limit.
References py-evm/evm/vm/logic/block.py: gaslimit()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract
call by using some means to alter the gas limit (e.g., directly controlling
block proposers or spamming the network).
SLOAD
Summary: Returns a word from storage.
References py-evm/evm/vm/logic/storage.py: sload()
Impurity Reasoning: reads state.
Potential Attack: At the time of writing the author is not aware of any
attack using SLOAD if all other purity directives are followed. However,
attacks could be imagined if combined with the SLOAD
opcodes (other
attacks may be possible).
SSTORE
Summary: Saves some word to storage.
References: py-evm/evm/vm/logic/storage.py: sstore()
Impurity Reasoning: reads and mutates state.
Potential Attack: At the time of writing the author is not aware of any
attack using SSTORE if all other purity directives are followed. However,
attacks could be imagined if combined with the SSTORE
or GAS
opcodes (other
attacks may be possible).
CREATE
Summary: Creates a new account given some code.
References: py-evm/evm/vm/logic/system.py: Create.__call__()
Impurity Reasoning: reads and mutates state.
Potential Attack: At the time of writing the author is not aware of any
attack using CREATE if all other purity directives are followed. However,
attacks could be imagined if combined with the EXTCODESIZE
opcode (other
attacks may be possible).
SELFDESTRUCT
Summary: Registers the account for deletion, sending remaining Ether to
some address.
References: py-evm/evm/vm/logic/system.py: _selfdestruct()
Impurity Reasoning: reads and mutates state.
Potential Attack: An attacker may self-destruct a contract, causing all
future calls to it to fail.
CALL
Summary: Message-calls to some address.
References: py-evm/evm/vm/logic/call.py: Call()
Potential Impurity Reasoning: Executes code from another account.
Potential Attack: An attacker may call an impure contract and use its
return data.
CALLCODE
Summary: Execute the code of some other account using the state of this
account.
References: py-evm/evm/vm/logic/call.py: CallCode()
Potential Impurity Reasoning: Executes code from another account.
Potential Attack: An attacker may callcode an impure contract and read or
mutate state.
DELEGATECALL
Summary: Execute the code of some other account using the state of this
account whilst retaining the same values for sender
and value
.
References: py-evm/evm/vm/logic/call.py: DelegateCall()
Potential Impurity Reasoning: Executes code from another account.
Potential Attack: An attacker may delegate an impure contract and read or
mutate state.
STATICCALL
Summary: Message-calls to some address without persisting state
modifications.
References: py-evm/evm/vm/logic/call.py: StaticCall()
Potential Impurity Reasoning: Executes code from another account.
Potential Attack: An attacker may call an impure contract and use its
return data.
CREATE2
This opcode has not been implemented at the time of writing.
Summary: Creates a new account given some code and some nonce (as opposed
to CREATE
which uses the current account nonce).
References: EIP86.
Impurity Reasoning: Reads and mutates state.
Potential Attack: An attacker could craft a contract which succeeds the
first time it is called, but fails all other times.