Purity in the EVM

Category Analysis

Purity in the EVM

This document provides a definition of purity suitable for a signature validation contract. It provides resources for designing an on-chain purity-checking contract.

This document is not official advice. Errors may be present.

This document is available as a Git repository at github.com/sigp/opcode-purity.

Background

This document is the result of "reverse engineering" the following two contracts and the majority any credit attributed to this document is deserving of their authors:

Vitalik's contract was ported to LLL by Alex with the intention that it would be used to verify the "purity" of signature validation (valsig) contracts in the (now deprecated) EIP-1011 proposal. Before we discuss the concept of purity, first we should understand the purpose of valsig contracts.

Valsig contracts were to be used to abstract the signature validation of vote and logout messages -- allowing validators to implement arbitrarily complex signature schemes instead of just relying upon transaction signatures (ECDSA).

Unfortunately, the use of arbitrary external valsig contracts opens the possibility for a damaging attack vector whereby a validator can, because they control signature validation, double-vote then prevent punishment by ensuring that the slash validation fails whilst the vote validation succeeds. Such an attack can be eliminated by reducing the space of possible valsig contracts to only those which are "pure": those which will always return the result given the same signature to validate.

Therefore, EIP-1011 required that each valsig contract must have its purity approved by an on-chain smart contract before it is permitted to be used for validation. The contract that scanned and determined valsig contact purity was called the purity checker and it is the focus of this document.

Definition of Purity

This contract provides the following definition of purity for an Ethereum smart-contract:

A contract is considered pure if it will always return the same result given sufficient gas for execution and the same transaction data and value. Specifically, it may read the data and value fields of a transaction but no other transaction information, it may not read block information and it must not read from or write to storage.

There is some room for subjectivity in the definition of purity, specifically in what can be considered "inputs" to the "function" that is the smart contract. This definition includes only transaction data and value ($T_d$, $T_v$ in the Yellow Paper) but given no concrete definition of what is transaction "context" (as opposed to transaction "input") a definition of purity can be conceived which permits the origin address (derived from $T_w$, $T_r$ and $T_s$) to be read as well. Such a definition is not compatible with signature validation contracts and is therefore excluded from this document.

Detecting Impurity On-Chain

This document assumes that detecting the purity of a contract is going to be performed on the contracts bytecode. This is more accurate than performing the action on source code because it eliminates any quirks which may be introduced during compilation. This also has the benefit of allowing on-chain verification of one contract by another as a contract may go and retrieve the bytecode of another contract and iterate through it inside the EVM.

The process of determining the purity of some bytecode will generally involve starting at the first byte (which must be an opcode), attempting to match it against the opcodes defined in the table, performing some action depending on the purity of that opcode (e.g., permit, deny or attempt address detection) and then repeating the process on the next opcode.

It is important to note that not each byte in some bytecode must be an opcode, instead it may be a parameter supplied to a PUSH opcode. The Serpent contract provided in the Background section provides an example of how one can keep track of opcodes and parameters throughout the iteration process in order to allow for back-searching of opcodes and parameters as required for address detection in call-type opcodes (see Address Detection Techniques).

The rest of the document focuses on defining the purity categories for each opcode, outlining techniques that can be used to deal with call-type opcodes and then provides some detail as to why certain opcodes have been categorised as pure or potentially-impure.

Impurity Categories

There are three classifications for impure opcodes: always impure, potentially impure call-type and future impure opcodes. Each category is described below.

Always Impure

These opcodes have no use other than to mutate state, return mutable state or provide context about the execution environment. Any contract which includes an "always impure" opcode should be immediately considered impure.

Future Impure Opcodes

These opcodes are assumed to be reserved for future impure opcodes. At the time of writing, there is no formal declaration that this is the case and this judgement is solely based off the authors informal conversations with the Ethereum community.

Potentially Impure Call-Type

Call-type opcodes (see the table for a listing) may execute code at some other address. It is possible for an external call to be either pure or impure, depending on the address specified for the call. The use of a call-type opcode can only be considered pure if the address specified is:

  • An address that has already been determined to be pure.
  • Any of the precompile addresses within the range of 0x0000000000000000000000000000000000000001 to 0x0000000000000000000000000000000000000008. Note: the purity of these contracts is yet to be confirmed.

See the Address Detection Techniques section for some techniques for extracting the address supplied to a call-type opcode from bytecode.

Any call to an externally-owned (non-contract) address should be considered impure. This is because it can potentially have impure code deployed to it.

Impure Opcode Table

Opcode Value Mnemonic Impurity Category
0x31 BALANCE Always Impure
0x32 ORIGIN Always Impure
0x33 CALLER Always Impure
0x3a GASPRICE Always Impure
0x3b EXTCODESIZE Always Impure
0x3c EXTCODECOPY Always Impure
0x40 BLOCKHASH Always Impure
0x41 COINBASE Always Impure
0x42 TIMESTAMP Always Impure
0x43 NUMBER Always Impure
0x44 DIFFICULTY Always Impure
0x45 GASLIMIT Always Impure
0x46 - 0x4F Range of future impure opcodes Future Impure Opcodes
0x54 SLOAD Always Impure
0x55 SSTORE Always Impure
0xf0 CREATE Always Impure
0xff SELFDESTRUCT Always Impure
0xf1 CALL Potentially Impure Call-Type
0xf2 CALLCODE Potentially Impure Call-Type
0xf4 DELEGATECALL Potentially Impure Call-Type
0xfa STATICCALL Potentially Impure Call-Type
* 0xfb CREATE2 Always Impure

* Opcodes which were not implemented at the time of writing, but the author has an expectation they will be implemented in the future.

Address Detection Techniques

Call-type opcodes (see the table for a listing) can only be considered pure if they call a specific set of addresses (see Potentially Impure Call-Types). Therefore, in order to permit some call-type opcodes it is necessary to determine the called address from the bytecode. This section describes methods which may be used to find the address supplied to the call-type opcode with certainty.

The code which may place an address on the stack for call-type opcode can be arbitrarily complex and only discoverable by executing said code. To allow purity checking within a single Ethereum transaction the techniques here are simplistic and will provide false positives (indicating impurity). However, these techniques should never produce false negatives (indicating purity).

Techniques are provided in a Python-like pseudo-code and concrete examples can be found in the two contracts specified in the Background section.

Convenience Functions

First two convenience functions are declared; get_opcode(n) and get_last_opcode_param(n).

Convenience Function get_opcode(n)

Returns the n'th opcode declared in the subject bytecode[].

If n is out of bounds of bytecode[] the function returns None.

Example:

ADD = 0x01
PUSH2 = 0x61

bytecode = [PUSH2, 2, 1, ADD]
get_opcode(0)
# 3
get_opcode(2)
# None

Convenience Function get_last_opcode_param(n)

Returns the final parameter supplied to the n'th opcode declared in the subject bytecode[].

If n is out of bounds of bytecode[] or the n'th opcode does not have parameters the function returns None.

Example:

ADD = 0x01
PUSH2 = 0x61

bytecode = [PUSH2, 2, 1, ADD]
get_last_opcode_param(0)
# 1
get_last_opcode_param(1)
# None
get_last_opcode_param(2)
# None

Address Detection Functions

Four functions are now declared which return an address if a specific pattern of opcodes is found to precede a call-type opcode. If all of these functions return None, then the contract should be assumed to be impure.

Each function takes an input c which is the index of the call-type opcode in question. It is assumed that the on-chain purity checking contract is iterating over the bytecode in question and each time it detects a call-type opcode it runs these functions to attempt to detect the address being called.

Address Detection Function #1

PUSH1 = 0x60
PUSH32 = 0x7f

def address_detector_1(c):
    if PUSH1 <= get_opcode(c-2) <= PUSH32:
        return get_last_opcode_param(c-2)
    else:
        return None

Address Detection Function #2

SUB = 0x03
GAS = 0x5a
PUSH1 = 0x60
PUSH32 = 0x7f

def address_detector_2(c):
    if (get_opcode(c-1) == SUB and
       get_opcode(c-2) == GAS and
       PUSH1 <= get_opcode(c-3) <= PUSH32):
        return get_last_opcode_param(c-3)
    else:
        return None

Address Detection Function #3

GAS = 0x5a
SWAP1 = 0x90

def address_detector_3(c):
    if (get_opcode(c-1) == GAS OR
       get_opcode(c-1) == SWAP1):
        return get_last_opcode_param(c-2)
    else:
        return None

Address Detection Function #4

DUP1 = 0x80
DUP16 = 0x8f

def address_detector_4(c):
    if (DUP1 <= get_opcode(c-1) <= DUP16):
        return get_last_opcode_param(c-2)
    else:
        return None

Opcode Listing

This section contains an opcode-by-opcode listing of each defined opcode. For each opcode the following is provided:

  • Summary: a brief description of what the opcode does.
  • Impurity Reasoning: a reference demonstrating impurity reasoning.
  • Potential Attack: a scenario which assumes some attacker has deployed a contract and wishes to be able to have some pre-determined or ad hoc control of the return result of the contract. This section does not exhaustively list potential attacks, it simply provides an example for demonstrative purposes.

Specifications of opcodes can be found in Appendix H of the Ethereum Yellow Paper.

BALANCE

Summary: Returns the balance of some address.
References: py-evm/evm/vm/logic/context.py: balance()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by altering the balance of some external account.

ORIGIN

Summary: Returns the address of the sender of the transaction which triggered execution. In Solidity, this is tx.origin.
References: py-evm/evm/vm/logic/context.py: origin()
Impurity Reasoning: reads illegal transaction context.
Potential Attack: An attacker may influence the return value of a contract call by varying the private key with which a transaction is signed.

CALLER

Summary: Returns the address directly responsible for the execution. In Solidity, this is msg.sender.
References: py-evm/evm/vm/logic/context.py: caller()
Impurity Reasoning: reads illegal transaction context.
Potential Attack: An attacker may influence the return value of a contract call by varying the private key with which a transaction is signed or using an intermediary contract to alter the CALLER value.

GASPRICE

Summary: Returns the current gas price.
References: py-evm/evm/vm/logic/context.py: gasprice()
Impurity Reasoning: reads illegal transaction context.
Potential Attack: An attacker may influence the return value of a contract call by using some means to alter the gas price (e.g., directly controlling block proposers).

EXTCODESIZE

Summary: Returns the size of the code held at some address.
References: py-evm/evm/vm/logic/context.py: extcodesize()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract.
call by deploying code to some pre-computed address.

EXTCODECOPY

Summary: Copies some amount of code at some address to some position in memory.
References: py-evm/evm/vm/logic/context.py: extcodecopy()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by deploying code to some pre-computed address.

BLOCKHASH

Summary: Returns the hash of some past block (within the previous 256 complete blocks).
References: py-evm/evm/vm/logic/block.py: blockhash()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by controlling some portion of block proposers and selecting block hashes based upon how they will influence the contract call.

COINBASE

Summary: Returns the beneficiary address of the block.
References: py-evm/evm/vm/logic/block.py: coinbase()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by controlling some portion of block proposers and declaring the beneficiary address based upon how it will influence the contract call.

TIMESTAMP

Summary: Returns the timestamp of the block.
References: py-evm/evm/vm/logic/block.py: timestamp()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by controlling some portion of block proposers and declaring the timestamp based upon how it will influence the contract call.

NUMBER

Summary: Returns the number of the block (count of blocks in the chain since genesis).
References: py-evm/evm/vm/logic/block.py: number()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by selecting in which block a transaction should be included.

DIFFICULTY

Summary: Returns the block difficulty.
References: py-evm/evm/vm/logic/block.py: difficulty()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by assuming some control of the collective hash rate and modifying it based upon how it will influence the contract call.

GASLIMIT

Summary: Returns the block gas limit.
References py-evm/evm/vm/logic/block.py: gaslimit()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by using some means to alter the gas limit (e.g., directly controlling block proposers or spamming the network).

SLOAD

Summary: Returns a word from storage.
References py-evm/evm/vm/logic/storage.py: sload()
Impurity Reasoning: reads state.
Potential Attack: At the time of writing the author is not aware of any attack using SLOAD if all other purity directives are followed. However, attacks could be imagined if combined with the SLOAD opcodes (other attacks may be possible).

SSTORE

Summary: Saves some word to storage.
References: py-evm/evm/vm/logic/storage.py: sstore()
Impurity Reasoning: reads and mutates state.
Potential Attack: At the time of writing the author is not aware of any attack using SSTORE if all other purity directives are followed. However, attacks could be imagined if combined with the SSTORE or GAS opcodes (other attacks may be possible).

CREATE

Summary: Creates a new account given some code.
References: py-evm/evm/vm/logic/system.py: Create.__call__()
Impurity Reasoning: reads and mutates state.
Potential Attack: At the time of writing the author is not aware of any attack using CREATE if all other purity directives are followed. However, attacks could be imagined if combined with the EXTCODESIZE opcode (other attacks may be possible).

SELFDESTRUCT

Summary: Registers the account for deletion, sending remaining Ether to some address.
References: py-evm/evm/vm/logic/system.py: _selfdestruct()
Impurity Reasoning: reads and mutates state.
Potential Attack: An attacker may self-destruct a contract, causing all future calls to it to fail.

CALL

Summary: Message-calls to some address.
References: py-evm/evm/vm/logic/call.py: Call()
Potential Impurity Reasoning: Executes code from another account.
Potential Attack: An attacker may call an impure contract and use its return data.

CALLCODE

Summary: Execute the code of some other account using the state of this account.
References: py-evm/evm/vm/logic/call.py: CallCode()
Potential Impurity Reasoning: Executes code from another account.
Potential Attack: An attacker may callcode an impure contract and read or mutate state.

DELEGATECALL

Summary: Execute the code of some other account using the state of this account whilst retaining the same values for sender and value.
References: py-evm/evm/vm/logic/call.py: DelegateCall()
Potential Impurity Reasoning: Executes code from another account.
Potential Attack: An attacker may delegate an impure contract and read or mutate state.

STATICCALL

Summary: Message-calls to some address without persisting state modifications.
References: py-evm/evm/vm/logic/call.py: StaticCall()
Potential Impurity Reasoning: Executes code from another account.
Potential Attack: An attacker may call an impure contract and use its return data.

CREATE2

This opcode has not been implemented at the time of writing.

Summary: Creates a new account given some code and some nonce (as opposed to CREATE which uses the current account nonce).
References: EIP86.
Impurity Reasoning: Reads and mutates state.
Potential Attack: An attacker could craft a contract which succeeds the first time it is called, but fails all other times.